frictionlessdata / data-quality-spec

A spec for reporting errors in data quality.
MIT License

Release version-1 #15

Closed roll closed 7 years ago

roll commented 7 years ago

Overview

The spec - https://github.com/frictionlessdata/data-quality-spec/blob/master/spec.json

In my opinion, from the goodtables.io and goodtables.py side there are no more open "questions" about the Data Quality Spec, so it's pretty much ready to be released as version 1 (a release means no breaking changes before version 2). We have also now filled in all parts of the spec, i.e. names, messages and descriptions.

Late call breaking changes

Right now any breaking change costs only a few hours of work at the goodtables.py/goodtables.io level. So let's make such changes now or "never" (for version 1). We're starting to collect goodtables reports. For now the number of reports is pretty small, but it will grow, and the spec has to stay compatible with all of them.
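
To illustrate the compatibility concern, here is a minimal sketch of how stored reports could be checked against the spec's error codes. The function name `unknown_codes` and the sample data are hypothetical, and the sketch assumes that each report can be reduced to a flat list of error codes. It is not based on the actual goodtables report format.

```python
def unknown_codes(spec_codes, report_codes):
    """Return the report error codes that the spec does not define, sorted."""
    return sorted(set(report_codes) - set(spec_codes))


# Illustrative data only: a spec in which `non-castable-value` has been
# renamed to `type-error`, and an older report that still uses the old code.
spec_codes = {"blank-row", "type-error"}
report_codes = ["blank-row", "non-castable-value", "blank-row"]

missing = unknown_codes(spec_codes, report_codes)
print(missing)
```

Any code returned here marks a stored report that a rename would leave incompatible with the spec, which is why such renames are cheapest before reports accumulate.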

For example, I agree with @Stephen-Gates on #14: non-castable-value was bothering me too, so maybe it could be just type-error. As I said, now is the last chance to make this change.

So please think about what we must change now, before locking the spec.

Editorial review

Before the release, and especially before the announcement, we have to clean up the language of the error names, messages and descriptions. This has been started in https://github.com/frictionlessdata/data-quality-spec/pull/13. If you can, please review the spec.

Joining FrictionlessData specs family

This is an aside, but I have a strong feeling that this spec should be part of the FrictionlessData specs family and be published on http://specs.frictionlessdata.io/ (so the actual release of the Data Quality Spec could consist of putting it on that site). I think it fits very well and covers a vital aspect of the FrictionlessData project's work: data quality. WDYT?

cc @pwalsh @akariv @danfowler @Stephen-Gates

Stephen-Gates commented 7 years ago

I notice there are no messages for Foreign Key checks. I assume this will come later?

If not, something like:

  "foreign-key": {
    "name": "Foreign Key",
    "type": "schema",
    "context": "body?",
    "weight": ?,
    "message": "The value {value} in row {row_number} and column {column_number} does not match a primary key in the {table} table",
    "description": "This field value should be equal to one of the primary key values in the related table.\n\n How it could be resolved:\n - If this value is not correct, update the value.\n - If value is correct, then add or correct the value in a related table.\n - If this error should be ignored disable `foreign-key` check in {validator}."
  }
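
A minimal sketch of how a check producing this proposed message might look. The function name `check_foreign_key`, the row layout (a list of lists with the header in row 1), and the shape of the returned error dicts are all assumptions made for illustration. This is not the goodtables implementation.

```python
# Message template copied from the proposed `foreign-key` entry above.
MESSAGE = (
    "The value {value} in row {row_number} and column {column_number} "
    "does not match a primary key in the {table} table"
)


def check_foreign_key(rows, column_number, primary_keys, table):
    """Return one error dict per cell whose value is not in primary_keys.

    rows: data rows as lists (header excluded)
    column_number: 1-based index of the foreign-key column
    primary_keys: set of primary key values from the related table
    table: name of the related table, used only in the message
    """
    errors = []
    # Assumption: the header occupies row 1, so data rows start at row 2.
    for row_number, row in enumerate(rows, start=2):
        value = row[column_number - 1]
        if value not in primary_keys:
            errors.append({
                "code": "foreign-key",
                "row-number": row_number,
                "column-number": column_number,
                "message": MESSAGE.format(
                    value=value,
                    row_number=row_number,
                    column_number=column_number,
                    table=table,
                ),
            })
    return errors


rows = [["1", "london"], ["2", "paris"], ["3", "rome"]]
errors = check_foreign_key(rows, 2, {"london", "rome"}, "cities")
for error in errors:
    print(error["message"])
```

With this sample data only the `paris` cell is reported, since it is the only value absent from the related table's keys.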
roll commented 7 years ago

@pwalsh The main specs are released. I think the Data Quality Spec should follow. We have to decide on https://github.com/frictionlessdata/data-quality-spec/issues/14 and then it's ready.