frictionlessdata / frictionless-js

A lightweight, standardized library for accessing files and datasets, especially tabular ones (CSV, Excel).
https://frictionlessdata.io

Setting escapechar and quotechar to double quote behaves unexpectedly #31

Closed zelima closed 6 years ago

zelima commented 6 years ago

@anuveyatsu what is the reason for using the double quote as escapechar, besides csv-parser using it as its default? https://github.com/datahq/data.js/commit/398235d71302edffc9467a462f039c5c7bffeef0

In Python, the default escapechar is None: https://docs.python.org/2/library/csv.html#csv.Dialect.escapechar
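For reference, here is a minimal sketch of that default using Python's standard csv module (Python 3 shown; the sample row is made up for illustration and is not taken from the dataset in question). With escapechar left as None, a simply quoted value containing a comma still parses as a single column:

```python
import csv
import io

# Default "excel" dialect: no escapechar, and the quotechar is the double quote.
default_dialect = csv.excel
escapechar = default_dialect.escapechar  # None
quotechar = default_dialect.quotechar    # '"'

# Hypothetical sample: a header row and one row with a quoted value containing a comma.
sample = 'id,value\n1,"one, two"\n'
rows = list(csv.reader(io.StringIO(sample)))

print(escapechar, quotechar, rows)
```

This only shows the Python side; it says nothing about how csv-parser treats the same input.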

The parser behaves really strangely if both escapechar and quotechar are set to ". E.g., take a look at this validation report: https://pkgstore.datahub.io/81429cbbddcfb180f54c142fac32f83b/schema/validation_report/data/874d49bd554630f5b536216ce390d4d9/validation_report.json It thinks that everything after "{\"one\"... is one column. The same happens even if a value is quoted in a simple way, like "one, two"

I've just deleted that line from data.js and pushed, and the dataset is now processed successfully: https://datahub.io/zelima/schema/v/20

Do you think we can remove it?

Answer from @anuveyatsu

@zelima the reason IMO is that it is very common. E.g., go to Google Sheets and create a table with " in the values. Then export it as CSV and you'd see that " was used as the escape char. I haven't tried the same operation with Excel, but I read on the web that it behaves the same way.
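As a rough illustration of that export style (quotes inside a value written as two quotes), the sketch below uses Python's csv module rather than an actual spreadsheet export, and the sample values are made up. In Python this is modelled by the doublequote setting, which is on by default, so no escapechar has to be defined to read such a file:

```python
import csv
import io

# Write a value that contains double quotes using the default dialect.
buffer = io.StringIO()
csv.writer(buffer).writerow(['say "hi"', '2'])
encoded = buffer.getvalue()  # the quotes inside the value are doubled

# Read it back with the defaults: escapechar=None, doublequote=True.
decoded = next(csv.reader(io.StringIO(encoded)))

print(repr(encoded), decoded)
```

The round trip works without setting escapechar, which is consistent with not hardcoding it.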

Update

So I think we don't really need to define it (and especially not hardcode it to be ").

zelima commented 6 years ago

FIXED. For now, we leave guessing the escapechar to the processing libraries (tabulator) in Python. The issue for guessing it here is #33.