datopian / datahub-qa

:package: Bugs, issues and suggestions for datahub.io
https://datahub.io/

[Push] Cannot publish tabular data with values that have double quotes #66

Closed anuveyatsu closed 6 years ago

anuveyatsu commented 6 years ago

@Mikanebu commented on Thu Feb 01 2018

Steps to reproduce

Output

Got the error "Invalid opening quote". This happens because no escape character is set up.

Expected behaviour

anuveyatsu commented 6 years ago

This is now INVALID: if you try to reproduce it, it no longer fails as before. In general, if users want double quotes " in values, e.g., when a value is a hash such as {"a":1}, the value should be enclosed in single quotes.

AcckiyGerman commented 6 years ago

user@pc:~$ data-linux push https://github.com/frictionlessdata/test-data/blob/master/files/csv/all-schema-types.csv
> Error! Invalid opening quote at line 8
user@pc:~$ node work/datahq/data-cli/bin/data.js push https://github.com/frictionlessdata/test-data/blob/master/files/csv/all-schema-types.csv
> Error! Invalid opening quote at line 8
user@pc:~$ node work/datahq/data-cli/bin/data.js -v
0.6.7

data-cli is up to date with github master branch, npm i is also done.

AcckiyGerman commented 6 years ago

@anuveyatsu ^^^

anuveyatsu commented 6 years ago

@AcckiyGerman have you pulled the latest "test-data"?

anuveyatsu commented 6 years ago

Just realised that double quotes in values should be used with an escape character (and the escape character should by default also be a double quote), so, e.g.:

{"a": 1}

should become:

"{""a"": 1}"
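
The transformation above can be reproduced with a minimal sketch in Python. It assumes Python's standard `csv` module, not data-cli or the csv-parse library, and the column names `id` and `payload` are made up for illustration. With its default settings (`quotechar='"'`, `doublequote=True`) the writer wraps the field in quotes and doubles each inner quote.

```python
import csv
import io
import json

# Serialize a hash to a JSON string; this is the raw cell value.
value = json.dumps({"a": 1})  # '{"a": 1}'

# Write one header row and one data row (hypothetical column names).
buffer = io.StringIO()
writer = csv.writer(buffer)  # defaults: quotechar='"', doublequote=True
writer.writerow(["id", "payload"])
writer.writerow([1, value])
encoded = buffer.getvalue()
print(encoded)

# Read the CSV text back and recover the original hash.
rows = list(csv.reader(io.StringIO(encoded)))
decoded = json.loads(rows[1][1])
print(decoded)
```

The data row is written as `1,"{""a"": 1}"`, and reading it back yields the original hash, so the doubled-quote form round-trips without loss.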

AcckiyGerman commented 6 years ago

This is all becoming complicated. I suggest you make a list of rules like

AcckiyGerman commented 6 years ago

because otherwise it can become very complex. Have you ever seen a JSON file encoded in a URL string? :smile:

AcckiyGerman commented 6 years ago

Or you can probably use json.dumps() for that.

anuveyatsu commented 6 years ago

@AcckiyGerman I think it's a common situation to have double quotes in values. By default, " is used as the escape character, e.g., in this library http://csv.adaltas.com/parse/#parser-options (we're currently using it). Also, if you export data from Excel or Google Spreadsheets as CSV, you get the same result. Considering these points, I think we should have " as the default escape character in the dialect for tabular resources.

zelima commented 6 years ago

Agreed. You can escape the quote as \" or set a different escape character and push that way.
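
As an illustration of the alternative, here is a minimal sketch of parsing a file that uses a backslash as the escape character instead of doubled quotes. It assumes Python's standard `csv` module rather than data-cli, and the sample data and column names are hypothetical.

```python
import csv
import io

# Sample CSV text in which inner quotes are escaped with a backslash.
raw = 'id,payload\n1,"{\\"a\\": 1}"\n'

# Tell the reader that "\" is the escape character and that
# doubled quotes are not used for escaping.
reader = csv.reader(
    io.StringIO(raw),
    quotechar='"',
    escapechar="\\",
    doublequote=False,
)
rows = list(reader)
print(rows[1][1])
```

The parsed cell is `{"a": 1}`, the same value a doubled-quote file would produce. The two conventions differ only in how the file is written, so the reader has to be told which one to expect.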

AcckiyGerman commented 6 years ago

TESTED: FAILED data push all-schema-types.csv cause:

AcckiyGerman commented 6 years ago

@anuveyatsu The push is OK, so we could close this issue, but first create an issue about PUBLISH FAIL

zelima commented 6 years ago

@AcckiyGerman Just a tip: posting links to the failed revisions does not really help, as we cannot see them unless logged in. Could we switch to posting screenshots instead in cases like this?

AcckiyGerman commented 6 years ago

@zelima Sure. I don't know why I was so sure that you could read the related logs from the backend. But even so, it will be easier to read logs in the message than to go looking for them on the backend.

zelima commented 6 years ago

@AcckiyGerman Although you won't be able to push this exact file due to #98, this one will be fixed in data 0.7.7.

I created a Gist and removed the yearmonth and geopoint types from it, leaving the column with double quotes as is. You can try:

data push https://gist.githubusercontent.com/zelima/d9a3d99b7ca41e632c8b3d7853d543df/raw/be2058d74248e219e430ba72ed5de0cbeb005aaf/types.csv

Or take a look at the already published package https://datahub.io/zelima/schema/v/27

AcckiyGerman commented 6 years ago

TESTED & FIXED