Clever / csvlint

library and command line tool that validates a CSV file
Apache License 2.0
186 stars 18 forks

UTF-8 CSV files with BOM aren't parsed correctly if the first header field contains quotes #37

Open datatraveller1 opened 1 year ago

datatraveller1 commented 1 year ago

I have a CSV file encoded as UTF-8 with a BOM:

"first_column","second_column"
"Hello","how are you"

This is a valid CSV file, but csvlint reports:

Record #0 has error: bare " in non-quoted-field

The issue occurs with a file encoded as UTF-8 with a BOM when the first header field is surrounded by quotes.

Suggestion: this could be solved by stripping the UTF-8 BOM from the start of the header line before parsing. Pseudocode: if (line_number == 1) { sub(/^\xef\xbb\xbf/, "", line) }
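In Go, the suggestion might look roughly like the sketch below. It assumes the records are parsed with the standard library's encoding/csv package, whose ErrBareQuote error has the same text as the message above. The helpers skipBOM and parseCSV are hypothetical names used for illustration and are not taken from the csvlint source.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// utf8BOM is the three-byte UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// skipBOM (hypothetical helper) returns a reader positioned just after a
// leading UTF-8 BOM if one is present; otherwise the data is left untouched.
func skipBOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	// Peek does not consume input. An error here means fewer than three
	// bytes are available, so there cannot be a BOM.
	if head, err := br.Peek(len(utf8BOM)); err == nil && bytes.Equal(head, utf8BOM) {
		_, _ = br.Discard(len(utf8BOM))
	}
	return br
}

// parseCSV (hypothetical helper) reads all records, skipping any leading BOM.
func parseCSV(input string) ([][]string, error) {
	return csv.NewReader(skipBOM(strings.NewReader(input))).ReadAll()
}

func main() {
	// The file from this report: BOM followed by a quoted header field.
	input := "\xef\xbb\xbf\"first_column\",\"second_column\"\n\"Hello\",\"how are you\"\n"

	records, err := parseCSV(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(records)
}
```

Because the BOM is removed at the reader level rather than line by line, the CSV parser sees the opening quote as the first byte of the field, and files without a BOM pass through unchanged.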

datatraveller1 commented 1 year ago

I have just noticed that someone else posted nearly the same issue (https://github.com/Clever/csvlint/issues/21), but this simple fix (removing the UTF-8 BOM in the csvlint code) would allow such files to pass the csvlint check.