Open boydkelly opened 2 months ago
Parsing and error detection in this utility are handled by https://pkg.go.dev/encoding/csv#Reader, which seems to complain if a quote is not the first or last character in the field.
In your sample text, is the double-quoted field delimited by tabs, as in dondon ko\t"ken ken kileri kɛ".\tdyu?
Or is there whitespace before the leading quote, as in dondon ko\t "ken ken kileri kɛ".\tdyu?
Only the second case throws the error for me.
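For anyone who wants to reproduce this outside the utility, here is a minimal sketch that feeds the second variant (a space before the opening quote) to encoding/csv with the delimiter set to a tab. The helper name parseLine is made up for illustration, and the sketch assumes the utility leaves the Reader in its default strict quote mode, which the thread does not confirm.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
)

// parseLine (hypothetical helper) reads one tab-delimited record with
// encoding/csv in its default, strict quote mode.
func parseLine(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = '\t' // use a tab instead of a comma as the field delimiter
	return r.Read()
}

func main() {
	// Second variant from the comment above: whitespace before the quote.
	line := "dondon ko\t \"ken ken kileri kɛ\".\tdyu\n"

	fields, err := parseLine(line)
	fmt.Println("fields:", fields)
	fmt.Println("error:", err)

	// csv.ParseError unwraps to a sentinel error, so the cause can be checked.
	fmt.Println("bare quote:", errors.Is(err, csv.ErrBareQuote))
}
```

The same helper can be pointed at the first variant to compare the two cases directly; the period after the closing quote may be worth watching there.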
It certainly could be the second case. Since this is foreign-language prose and not 'clean' text, the expectation is that once the file is defined as tab delimited, it should not matter whether or where a quote occurs. So in your second example the text should lint 'properly', shown here with \t replaced by line feed:
dondon "ken ken kileri kɛ". dyu
So it looks like the bug is with csv#Reader?
I'm really just checking that the number of columns is accurate. For now awk will do the job, but it would be great to see tsv handled correctly here.
So it looks like the bug is with csv#Reader?
I'm not certain whether it's a bug or not, because the Reader docs are not explicit about tab-delimited data.
-lazyquotes may be an option in this case.
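As a rough illustration of what that option would do, here is a sketch that sets the Reader's LazyQuotes field directly. It assumes the -lazyquotes flag simply maps onto that field, which is a guess about the utility; parseLazy is a hypothetical helper name.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseLazy (hypothetical helper) reads one tab-delimited record with
// LazyQuotes enabled, so a quote inside an unquoted field is kept as-is.
func parseLazy(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = '\t'      // tab as the field delimiter
	r.LazyQuotes = true // tolerate quotes in unquoted fields
	return r.Read()
}

func main() {
	// The variant with whitespace before the opening quote.
	line := "dondon ko\t \"ken ken kileri kɛ\".\tdyu\n"

	fields, err := parseLazy(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("columns:", len(fields))
	for i, f := range fields {
		fmt.Printf("%d: %q\n", i, f)
	}
}
```

One caveat, as far as I understand the Reader: even in lazy mode, a field that starts with a quote is still parsed as a quoted field, so this is not quite the same as ignoring quotes altogether.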
I'll just use awk. The whole point of tab delimiters is to avoid the numerous problems of quote delimiters. In a tab-delimited file, quotes should be treated as nothing more than ordinary characters. I guess csv#Reader is true to its name: comma separated. It does not handle tabs the way I would expect.
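For comparison, the awk-style check described here (quotes as plain characters, only the column count matters) can be sketched in Go without encoding/csv at all. This is only an illustration of the idea, not how the utility works; columnMismatches is a made-up name, and the sketch assumes every line should have the same number of fields as the first one.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// columnMismatches (hypothetical helper) splits each line on tabs only,
// treating quotes as ordinary characters. It returns the 1-based numbers
// of lines whose field count differs from that of the first line.
func columnMismatches(input string) ([]int, error) {
	var bad []int
	want := -1 // field count of the first line, unknown until it is read

	sc := bufio.NewScanner(strings.NewReader(input))
	for n := 1; sc.Scan(); n++ {
		got := len(strings.Split(sc.Text(), "\t"))
		if want < 0 {
			want = got
		}
		if got != want {
			bad = append(bad, n)
		}
	}
	// Scanner errors (for example, an over-long line) are passed through.
	return bad, sc.Err()
}

func main() {
	data := "dondon ko\t \"ken ken kileri kɛ\".\tdyu\n" +
		"a\tb\tc\n" +
		"only\ttwo\n"

	bad, err := columnMismatches(data)
	fmt.Println("lines with a different column count:", bad, "error:", err)
}
```

Note that bufio.Scanner has a default limit on line length, so very long records would need a larger buffer.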
When linting tsv files, I get:
Record 1035 is as follows. But since this is tsv (chosen for this very reason), shouldn't any quoting be ignored entirely rather than flagged as an error?