Closed SimonGreenhill closed 3 years ago
for the record, the file I noticed this in had a damn smart quote intended as the end quote, which of course is invalid, but damn hard to debug:
1,"i am an unclosed comment”
I'd still say we do the right thing, i.e. falling back to what Python's csv module does. So rather than introducing dubious heuristics to infer that something fishy is going on (like pandas or R might do :) ), I'd keep this as is.
I think that problem is inherent in storing data in text files (same with NEXUS, etc.): as soon as you start to use the format for non-traditional tabular content as well, such as full gene sequences or images serialized as data URLs, you violate the assumption that table cells will be small, and the advantage of text, namely that it can be inspected by looking at it, breaks down.
yeah, I can only think of very brittle ways to fix this, so let's wontfix for now.
I think the proper way to fix this is csvw (or similar): add metadata to your csv to inform the parsing.
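To illustrate the idea of metadata informing the parser, here is a minimal sketch using only Python's standard library. The metadata document, the file name `data.csv` and the helper `reader_kwargs` are hypothetical and are not the csvw package's API. The sketch assumes the file really uses no quoting at all, which `"quoteChar": null` declares in a CSVW dialect description:

```python
import csv
import io
import json

# Hypothetical CSVW-style metadata; "quoteChar": null declares that
# the file does not use quoting, so a stray '"' is ordinary text.
metadata = json.loads("""
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "data.csv",
  "dialect": {"delimiter": ",", "quoteChar": null, "header": true}
}
""")


def reader_kwargs(dialect):
    """Translate a few CSVW dialect properties into csv.reader keyword arguments.

    Illustrative helper only; it covers just delimiter and quoteChar.
    """
    kwargs = {"delimiter": dialect.get("delimiter", ",")}
    quote_char = dialect.get("quoteChar", '"')
    if quote_char is None:
        # No quote character: disable quote handling entirely.
        kwargs["quoting"] = csv.QUOTE_NONE
    else:
        kwargs["quotechar"] = quote_char
    return kwargs


# Made-up sample data with an unclosed quote on row 1.
data = 'ID,Content\n1,"i am an unclosed comment\n2,Lorem ipsum dolor\n'
rows = list(
    csv.reader(io.StringIO(data, newline=""), **reader_kwargs(metadata["dialect"]))
)
for row in rows:
    print(row)
```

With quoting switched off by the metadata, the stray quote stays in the cell as a literal character and the following row is not swallowed. This only helps for files that never need quoted fields.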
Simon J Greenhill notifications@github.com wrote on Fri, 5 Feb 2021, 09:29:
Closed #52 https://github.com/cldf/csvw/issues/52.
If a CSV file contains a quoted field with an unclosed quote, e.g.:
...the reader will combine all subsequent content until the end of the file or the field size limit, whichever comes first. That is, row 1, column 'Content' becomes
"i am an unclosed comment\n2,Lorem ipsum dol...
either generating a very large field, or raising:
This is an issue with a malformed CSV file, and I don't see an easy way to solve it, so this issue is primarily here to document the problem for anyone else who comes across it.
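For anyone wanting to reproduce this, here is a minimal sketch using Python's standard csv module. The sample data and the helper `read_rows` are made up for illustration. The `strict=True` variant at the end is an addition that was not discussed in this thread: it is a standard csv dialect parameter that makes the reader raise on bad input instead of silently continuing.

```python
import csv
import io

# Made-up sample: the quote opened on row 1 is never closed.
DATA = 'ID,Content\n1,"i am an unclosed comment\n2,Lorem ipsum dolor\n3,sit amet\n'


def read_rows(text, **kwargs):
    """Parse CSV text with the standard library reader (illustrative helper)."""
    return list(csv.reader(io.StringIO(text, newline=""), **kwargs))


# Default behaviour: rows 2 and 3 are swallowed into the open field of row 1.
rows = read_rows(DATA)
print(len(rows))
print(repr(rows[1][1]))

# With a small field size limit, the runaway field raises csv.Error instead.
old_limit = csv.field_size_limit(20)
try:
    read_rows(DATA)
    limit_error = None
except csv.Error as exc:
    limit_error = exc
finally:
    # The limit is module-global, so restore it.
    csv.field_size_limit(old_limit)
print(limit_error)

# strict=True makes the reader raise csv.Error when the input ends
# inside a quoted field, rather than returning the oversized field.
try:
    read_rows(DATA, strict=True)
    strict_error = None
except csv.Error as exc:
    strict_error = exc
print(strict_error)
```

Lowering the field size limit or enabling `strict` does not repair the file, but either one turns the silent merge into an error that points at the problem.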