frictionlessdata / data-quality-spec

A spec for reporting errors in data quality.
MIT License

Reason to use numeric keys and nesting? #3

Closed (roll closed this issue 7 years ago)

roll commented 8 years ago

@pwalsh I'm starting to work on goodtables.next more closely and this spec is a great source for me. But I have a few questions.

First, why use numbered keys instead of a unique identifier for each error? I don't fully understand where these numbers would be used, because showing the user something like error: structure-3 is, in my opinion, less usable than something like error: structure-duplicate-row. We have experience working with linters, and I suppose nobody likes it when they start answering with strange numeric codes (fine if you have 200 different errors; then it could be justified).

Second is the format/structure/schema nesting. I see room for simplification here too. We have around 30 errors, so I suppose we could write them in the simplest dict form, like:

unknown-format:
    type: format
    ...
unknown-encoding:
    type: format
    ...
blank-header:
    type: structure
    ...
missing-headers:
    type: schema
    ...
pattern-constraint:
    type: schema

It would also allow us to introduce a unified language between users and implementations, e.g. there would be check_missing_headers or checkPatternConstraint (in JS). Or, in some validator, validate(checks={'missing-headers': True}).
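
To illustrate the naming idea, here is a minimal Python sketch. It assumes the flat dict form from the example above, with only the `type` field filled in. The names `ERRORS`, `python_check_name`, `js_check_name` and `select_checks` are hypothetical and are not part of goodtables or this spec.

```python
# Hypothetical flat registry keyed by error ID. The entries mirror the
# example above; fields other than `type` are left out.
ERRORS = {
    "unknown-format": {"type": "format"},
    "unknown-encoding": {"type": "format"},
    "blank-header": {"type": "structure"},
    "missing-headers": {"type": "schema"},
    "pattern-constraint": {"type": "schema"},
}


def python_check_name(error_id):
    """Derive a snake_case check name, e.g. 'check_missing_headers'."""
    return "check_" + error_id.replace("-", "_")


def js_check_name(error_id):
    """Derive a camelCase check name, e.g. 'checkPatternConstraint'."""
    return "check" + "".join(part.capitalize() for part in error_id.split("-"))


def select_checks(checks):
    """Return the enabled error IDs, rejecting IDs missing from the registry."""
    unknown = sorted(set(checks) - set(ERRORS))
    if unknown:
        raise KeyError(f"unknown error ids: {unknown}")
    return [error_id for error_id in ERRORS if checks.get(error_id)]
```

Because every name is derived from the same identifier, a user who sees missing-headers in a report can guess both the function name and the option key without looking up a numeric code.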

Also, do you plan for tools to use this spec programmatically? If so, my comments are even more relevant, I suppose.

pwalsh commented 8 years ago

@roll all sounds fine. I did a PR and had no comments. I based it on React's recent error codes spec: https://github.com/facebook/react/blob/master/scripts/error-codes/codes.json

pwalsh commented 8 years ago

btw i also had a version without the nesting, and i guess i changed it for readability only, which is probably not a good decision considering we really only need this for programmatic use.
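
A grouped, more readable view can be derived from a flat structure whenever it is needed, so nothing is lost by storing the spec flat. A minimal Python sketch, assuming the flat dict form with a `type` field proposed earlier in this thread; `ERRORS` and `group_by_type` are hypothetical names, not part of the spec:

```python
from collections import defaultdict

# Hypothetical flat registry, mirroring the flat form proposed in this thread.
ERRORS = {
    "unknown-format": {"type": "format"},
    "unknown-encoding": {"type": "format"},
    "blank-header": {"type": "structure"},
    "missing-headers": {"type": "schema"},
    "pattern-constraint": {"type": "schema"},
}


def group_by_type(errors):
    """Rebuild a per-type listing of error IDs from the flat registry."""
    grouped = defaultdict(list)
    for error_id, spec in errors.items():
        grouped[spec["type"]].append(error_id)
    return dict(grouped)
```

Calling `group_by_type(ERRORS)` returns a dict mapping format, structure and schema to their error IDs, which is enough to render a nested listing for documentation.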

roll commented 8 years ago

@pwalsh Sorry, I just wasn't involved enough to comment earlier. I suppose we could use a modified local copy of this spec in the goodtables.next codebase, discuss it, check how it works, and then merge that experience back into this spec if needed.

About React: I think they just don't have the luxury we have of using short unique identifiers to name errors (they have 100+). So I suppose they were simply pushed into using numeric keys.

pwalsh commented 8 years ago

I can imagine another 10-20 error types are possible ..... but yes, we will not get to 100.

pwalsh commented 8 years ago

For anyone following, @roll is testing out a different data structure for the spec here