Open hjacobs opened 2 years ago
OK, apparently the problem is with only sniffing truncated data ([:1024]), which can break the CSV sniffer algorithm: it tries to detect the delimiter by counting the occurrences on each line, so truncating in the middle of a line corrupts the data for the sniffer.
Changing the logic to sniff the first N lines instead of first 1024 characters would solve this issue.
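A minimal sketch of that idea, using only the standard library. The function name read_rows and the max_lines default are hypothetical, not taken from the project; the point is that the sample handed to csv.Sniffer always ends on a line boundary, and that the lines already consumed are replayed so the approach also works for non-seekable input such as stdin.

```python
import csv
import io
from itertools import chain, islice


def read_rows(stream, max_lines=20):
    """Sniff the dialect from the first max_lines complete lines, then parse.

    Hypothetical helper: the name and the default of 20 lines are assumptions.
    """
    # Take whole lines only, so the sample never ends in the middle of a row.
    head = list(islice(stream, max_lines))
    # Raises csv.Error ("Could not determine delimiter") if sniffing fails.
    dialect = csv.Sniffer().sniff("".join(head))
    # Replay the buffered lines before the rest of the stream, because a
    # pipe such as stdin cannot be rewound with seek(0).
    return list(csv.reader(chain(head, stream), dialect))


# Small in-memory TSV stream standing in for stdin.
sample = io.StringIO("id\tname\tcity\n1\talice\tberlin\n2\tbob\tparis\n")
rows = read_rows(sample)
```

Sniffing can still fail on unusual data, so a manual override remains useful as a fallback.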
hi,
and/or adding a --delim option (or something similar) on the command line to force the delimiter (for example, when detection fails)...
regards.
OK, apparently the problem is with only sniffing truncated data ([:1024]) which can break the CSV sniffer algorithm as it tries to detect the delimiter by counting the occurrences on each line (and truncating in the middle of a line will therefore corrupt the data for the sniffer).
Wow, is that the reason why?!? 😲 I've been wondering for years why the code example in the official Python csv.Sniffer docs does not seem to work. I never realized it is because it breaks in the middle of a line. 🤨
Seems to me this should be fixed in the official Python docs as well, since I've never managed to get it to work...
Anyway, thanks for this gem!
Is there a --delim or similar option to force the delimiter when detection fails?
I just submitted a pull request to add --csv-format, which lets you set the dialect to use.
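For readers wondering what such an override could look like, here is a rough sketch with argparse. It is not the code from the pull request: only the flag name --csv-format comes from this thread, while build_parser, pick_dialect, and the choice of csv.list_dialects() as the allowed values are assumptions made for illustration.

```python
import argparse
import csv


def build_parser():
    """Hypothetical CLI parser; only the flag name comes from the thread."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--csv-format",
        choices=csv.list_dialects(),  # registered dialect names, e.g. excel-tab
        default=None,
        help="force a CSV dialect instead of sniffing it",
    )
    return parser


def pick_dialect(args, sample):
    """Use the forced dialect if one was given, otherwise fall back to sniffing."""
    if args.csv_format:
        return csv.get_dialect(args.csv_format)
    return csv.Sniffer().sniff(sample)


# Forcing the tab-separated dialect skips the sniffer entirely.
args = build_parser().parse_args(["--csv-format", "excel-tab"])
forced = pick_dialect(args, sample="")
```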
Rendering TSV (tab-separated values) works when passing a file name.
But it fails for the same file when passing it via stdin (-), with the error "Could not determine delimiter".
Apparently the CSV/TSV sniffer does not work correctly. Detection via the file extension (.tsv) makes it work when passing the file name (excel-tab dialect of the csv parser), but not when passing the same data via stdin (-).
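The difference between the two code paths can be illustrated with a small sketch. The helper name dialect_from_name and the mapping table are hypothetical; the only pairing confirmed in this report is .tsv with the excel-tab dialect. With a file name there is an extension to go on, whereas "-" carries no such hint and everything depends on the sniffer.

```python
import os

# Assumed mapping; only the .tsv entry is mentioned in this issue.
EXTENSION_DIALECTS = {".tsv": "excel-tab"}


def dialect_from_name(filename):
    """Return a csv dialect name based on the extension, or None if unknown."""
    if filename == "-":
        # stdin has no extension, so the caller must sniff (or be told) the dialect.
        return None
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_DIALECTS.get(extension)
```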