datahubio / datahub-v2-pm

Project management (issues only)
8 stars 2 forks source link

Error when pipeline infers resource schema automatically #99

Closed AcckiyGerman closed 6 years ago

AcckiyGerman commented 6 years ago

Acceptance criteria

Analysis

We implemened allow infer schema for resources feature, which is very cool and reduces manual work when automating a dataset.

But when the pipeline infers resource schema automatically, this schema does not pass the validator.

Examples

Auto source schema infer in the 'GDP' dataset: https://github.com/datasets/gdp/blob/2e41b0b79005f51269826eb7f29df8636dbbbc9f/.datahub/flow.yaml#L7

inputs:
  -
    kind: datapackage
    parameters:
      resource-mapping:
        gdp: http://api.worldbank.org/indicator/NY.GDP.MKTP.CD?format=csv

The source csv header has a redundant comma.

"2015","2016","2017",

Result: https://testing.datahub.io/AcckiyGerman/gdp image

zelima commented 6 years ago

This is not a bug related to us, but the source data is just invalid according to the schema (it should not have a blank header).

Question to answer here would be:

AcckiyGerman commented 6 years ago

Yes, the source csv header

"2015","2016","2017",

has redundant comma

zelima commented 6 years ago

WONTFIX this is not related to us