datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License
193 stars 39 forks

An ability to add goodtables checks to the validate processor #142

Open roll opened 4 years ago

roll commented 4 years ago

Overview

From goodtables@3 (now in early alpha), there will be a function like system.create_check('baseline/integrity/etc') returning a check object with check.validate_headers/row/table available. The number of checks has been drastically reduced (only 2 core checks and 6 advanced ones).
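To make the shape of this interface concrete, here is a minimal mock of what an atomic check object might look like — the class and error-dict layout are assumptions for illustration, not the actual goodtables@3 API:

```python
# Hypothetical mock of the atomic-check interface described above.
# The real goodtables@3 API may differ; names here are illustrative.

class BlacklistedValueCheck:
    """Row-level check: flag rows where `field_name` holds a forbidden value."""

    def __init__(self, field_name, blacklist):
        self.field_name = field_name
        self.blacklist = set(blacklist)

    def validate_headers(self, headers):
        # Error if the configured field is missing from the headers entirely.
        if self.field_name not in headers:
            return [{'code': 'missing-field', 'field': self.field_name}]
        return []

    def validate_row(self, row):
        # Error if the row's value for the field is blacklisted.
        if row.get(self.field_name) in self.blacklist:
            return [{'code': 'blacklisted-value',
                     'field': self.field_name,
                     'value': row[self.field_name]}]
        return []


def create_check(name, **options):
    # Stand-in for the described system.create_check('baseline/integrity/etc').
    registry = {'blacklisted-value': BlacklistedValueCheck}
    return registry[name](**options)


check = create_check('blacklisted-value',
                     field_name='name', blacklist=['N/A'])
header_errors = check.validate_headers(['id', 'name'])
row_errors = check.validate_row({'id': 1, 'name': 'N/A'})
```

Because each check validates one row at a time, it can be driven by any row stream rather than requiring the validator to consume the whole table itself.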

So we will be able to integrate it into the validate processor like (maybe adding core checks baseline/integrity by default):

Flow(
    validate(extra_checks=['baseline', 'integrity', ('blacklisted-value', {'fieldName': 'name'}), ...])
)

As for output, I think we can just add the list of found errors to the resource descriptor under an errors property.
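The proposal above can be sketched as a row processor that runs each check per row and attaches the accumulated errors to the resource descriptor. Everything here — the step factory, the toy check, and the descriptor shape — is an illustrative assumption, not the actual dataflows or goodtables API:

```python
# Minimal sketch of applying an atomic row-level check inside a
# dataflows-style row processor, collecting errors on the descriptor.
# All names are illustrative assumptions.

def make_validate_step(checks):
    errors = []  # accumulated across the whole stream

    def step(row):
        for check in checks:
            errors.extend(check.validate_row(row))
        return row  # rows pass through unchanged

    return step, errors


# A toy check mirroring ('blacklisted-value', {'fieldName': 'name'}):
class BlacklistedValue:
    def __init__(self, field_name, blacklist):
        self.field_name = field_name
        self.blacklist = set(blacklist)

    def validate_row(self, row):
        if row.get(self.field_name) in self.blacklist:
            return [{'code': 'blacklisted-value',
                     'field': self.field_name,
                     'value': row[self.field_name]}]
        return []


step, errors = make_validate_step([BlacklistedValue('name', ['forbidden'])])
rows = [{'name': 'ok'}, {'name': 'forbidden'}]
processed = [step(r) for r in rows]

# After the stream is consumed, record the found errors on the descriptor.
descriptor = {'name': 'my-resource', 'errors': errors}
```

Since the checks only see one row at a time, the stream never has to be consumed twice — which is the property that makes the integration fit dataflows' streaming model.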

The dataflows-goodtables integration was originally @cschloer's idea, but, for now, I don't see how to reconcile the two, given dataflows' streaming nature, without going to a lower level (the level of an individual check).

cschloer commented 3 years ago

Hey, is our best bet on this just to wait until frictionless-py is fully implemented and dataflows has migrated?

roll commented 3 years ago

Yeah, I think so. Now that the atomic-checks architecture works in Frictionless, we can apply checks to streams (as a dataflows processor, without consuming the stream separately).