datahubio / datahub-v2-pm

Project management (issues only)

[import] guess csv dialect #87

Closed: rufuspollock closed this issue 6 years ago

rufuspollock commented 6 years ago

As a Publisher, I want to be able to provide a CSV with semicolon delimiters or other CSV variations and have the app handle this automatically, so that I don't have to set this by hand in a datapackage.json.

See e.g. https://github.com/datahubio/qa/issues/35

Acceptance criteria

Tasks

Analysis

All cases failed except the comma-separated file. See https://datahub.io/Mikanebu/test-data-for-different-separators/v/2.

Sample is located here: https://github.com/Mikanebu/qa-test-datasets

Other samples: https://github.com/frictionlessdata/test-data/tree/master/data-files/csv/separators

Related issue: https://github.com/datahq/datahub-qa/issues/35

Also, if there is a datapackage.json that describes all the files, the following error occurs:

Error! The column header names do not match the field names in the schema
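One plausible reading of this error, offered as an assumption and not a confirmed diagnosis: if the header row is split on the default comma while the file actually uses another delimiter, the whole row collapses into a single header that cannot match the schema. A minimal sketch, where `headersMatchSchema` and the sample field names are hypothetical and not part of any datahub tool:

```javascript
// Hypothetical check: compare a CSV header row against schema field names.
// Splitting is naive (no quote handling); it only illustrates the mismatch.
function headersMatchSchema(headerLine, fieldNames, delimiter = ",") {
  const headers = headerLine.split(delimiter).map((h) => h.trim());
  return (
    headers.length === fieldNames.length &&
    headers.every((h, i) => h === fieldNames[i])
  );
}

// Made-up schema fields and header row for a semicolon-separated file.
const fieldNames = ["id", "name", "value"];
const headerLine = "id;name;value";

// Split on the default comma: one header "id;name;value" vs. three fields.
console.log(headersMatchSchema(headerLine, fieldNames)); // false

// Split on the real delimiter: headers line up with the schema.
console.log(headersMatchSchema(headerLine, fieldNames, ";")); // true
```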

Variations

We want the following delimiters to be supported: [',', ';', ':', '|', '\t', '^', '*', '&']

We also want to guess the quote character along with the delimiter, but still default to double quote.

As for line endings, we don't need to do anything, as the CSV parser library we're using handles them internally.
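To make the requirement concrete, here is a rough sketch of one way to guess a dialect from a text sample. It is not the data.js implementation: `guessDialect`, `countOutsideQuotes`, and the single-quote candidate are assumptions for illustration. The idea is to pick the delimiter whose count (outside quotes) is the same non-zero number on every sampled line, trying double quote first so it stays the default.

```javascript
// Candidate delimiters from the list above.
const DELIMITERS = [",", ";", ":", "|", "\t", "^", "*", "&"];
// Quote candidates (assumption): double quote first so it wins ties.
const QUOTES = ['"', "'"];

// Count `delimiter` in `line`, ignoring anything between `quote` pairs.
function countOutsideQuotes(line, delimiter, quote) {
  let count = 0;
  let inQuotes = false;
  for (const ch of line) {
    if (ch === quote) inQuotes = !inQuotes;
    else if (ch === delimiter && !inQuotes) count += 1;
  }
  return count;
}

// Hypothetical helper: returns { delimiter, quoteChar } for a text sample.
// Falls back to comma and double quote when nothing is consistent.
function guessDialect(sample, maxLines = 20) {
  const lines = sample
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxLines);

  for (const quoteChar of QUOTES) {
    let best = null;
    for (const delimiter of DELIMITERS) {
      const counts = lines.map((line) =>
        countOutsideQuotes(line, delimiter, quoteChar)
      );
      // Consistent: same non-zero number of delimiters on every line.
      const consistent =
        counts.length > 0 &&
        counts[0] > 0 &&
        counts.every((c) => c === counts[0]);
      if (consistent && (best === null || counts[0] > best.count)) {
        best = { delimiter, count: counts[0] };
      }
    }
    if (best !== null) return { delimiter: best.delimiter, quoteChar };
  }
  return { delimiter: ",", quoteChar: '"' };
}

console.log(guessDialect("a;b;c\n1;2;3\n"));
console.log(guessDialect("name;note\n'a;b';x\n'c';'d;e'\n"));
```

The sketch splits the sample on newlines before looking at quotes, so quoted fields that span several lines would confuse it; a real implementation would need to handle that case.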

Identify relevant place for its use

This should be implemented in the data.js library:

anuveyatsu commented 6 years ago

Working delimiters:

anuveyatsu commented 6 years ago

FIXED, available from v0.7.0 of the CLI tool. Try the latest version here: http://datahub.io/download

AcckiyGerman commented 6 years ago

Manual test with the multi-delimiter dataset (from the test-data repo) FAILED

Maybe the test dataset is not valid; either way, we need to investigate.

AcckiyGerman commented 6 years ago

Now works; the problem was an invalid descriptor in the test dataset.
FIXED: users can push datasets and files with different delimiters.