aswinkarthik / csvdiff

A fast diff tool for comparing csv files
https://aswinkarthik.github.io/csvdiff/
MIT License

Wrong number of fields #55

Open filipecatraia opened 2 years ago

filipecatraia commented 2 years ago
"01dd672d-e078-46c5-ae4a-6cc284125664", "zoe", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36", "TtYuBV9fFy6HTlVl7AOVhuoZmH9hhxRa", "20be9326b2a53b98773c227fe1f745d8", NULL, "2022-06-20 08:54:14.751721+00", "2022-09-20 08:54:14.751721+00", "2022-06-20 08:54:14.751721+00", "2022-06-20 08:54:14.751721+00"
"02c79b4a-d753-4397-9074-124b54fa7a47", "znhrbvqnxx", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0", "wu0IV4VwCtkJJ07TMEMp5bH6XdIq09MS", "4f01ab7686f5d680e256721cd53d2d45", NULL, "2022-09-05 10:03:25.568272+00", "2022-12-05 10:03:25.568272+00", "2022-09-05 10:03:25.568272+00", "2022-09-05 10:03:25.568272+00"

Taking the 2 rows above, and writing them both to demo1.csv and demo2.csv, the output of:

csvdiff dist/demo1.csv dist/demo2.csv --lazyquotes

is csvdiff: command failed - error processing base file: record on line 2: wrong number of fields.

(Note I'm writing the same data to both files.)

Without --lazyquotes the error is csvdiff: command failed - error in base-file: parse error on line 1, column 41: bare " in non-quoted-field.

Any tips on what I'm doing wrong? This seems like valid, quoted CSV data.

Thanks a lot!

datatraveller1 commented 1 year ago

This isn't valid CSV: there must be no space between the comma and the opening quote of the next field. Instead of "01dd672d-e078-46c5-ae4a-6cc284125664", "zoe", the correct form is "01dd672d-e078-46c5-ae4a-6cc284125664","zoe". So, this will work:

"01dd672d-e078-46c5-ae4a-6cc284125664","zoe","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36","TtYuBV9fFy6HTlVl7AOVhuoZmH9hhxRa","20be9326b2a53b98773c227fe1f745d8",NULL,"2022-06-20 08:54:14.751721+00","2022-09-20 08:54:14.751721+00","2022-06-20 08:54:14.751721+00","2022-06-20 08:54:14.751721+00"
"02c79b4a-d753-4397-9074-124b54fa7a47","znhrbvqnxx","Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0","wu0IV4VwCtkJJ07TMEMp5bH6XdIq09MS","4f01ab7686f5d680e256721cd53d2d45",NULL,"2022-09-05 10:03:25.568272+00","2022-12-05 10:03:25.568272+00","2022-09-05 10:03:25.568272+00","2022-09-05 10:03:25.568272+00"

To test for valid CSV according to RFC 4180, you can use https://github.com/Clever/csvlint, another great Go CSV tool.
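
The spacing rule can be checked directly against Go's standard encoding/csv reader. This is only a minimal sketch: it assumes csvdiff parses its input with that package (the wording of the error messages in the report suggests so), and the `parse` helper and the shortened sample rows are made up for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parse reads all records from s with Go's encoding/csv reader.
// trim toggles Reader.TrimLeadingSpace, which ignores leading
// white space at the start of each field.
func parse(s string, trim bool) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(s))
	r.TrimLeadingSpace = trim
	return r.ReadAll()
}

func main() {
	spaced := "\"id-1\", \"zoe\"\n" // space after the comma
	tight := "\"id-1\",\"zoe\"\n"   // no space after the comma

	// The second field starts with a space, so it is treated as
	// unquoted, and the quote inside it is rejected.
	_, err := parse(spaced, false)
	fmt.Println("spaced, strict:", err)

	// Without the space, both fields are read as quoted fields.
	recs, err := parse(tight, false)
	fmt.Println("tight, strict:", recs, err)

	// TrimLeadingSpace skips the space before the opening quote.
	recs, err = parse(spaced, true)
	fmt.Println("spaced, trimmed:", recs, err)
}
```

Whether csvdiff exposes an equivalent of TrimLeadingSpace is not something this thread establishes, so removing the spaces from the data remains the safe fix.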

BTW, the lazyquotes option (I don't know Go, but I think this is part of the Go standard library) is extremely buggy. I wouldn't recommend using it; even the source code examples fail with this option. I think I'll write a bug report about lazyquotes soon.