Open chriszs opened 3 months ago
Great Expectations is apparently only compatible with Python 3.8 and up, so I removed 3.7 from the CI matrix for demonstration purposes.
Also, I believe updating Pipfile.lock when I added GE also upgraded some non-pinned deps. Flake8 is now at 7.0, which has at least one incompatibility with the version currently configured in pre-commit, so we should probably either upgrade the version in pre-commit or pin the version in the Pipfile.
There's a 1.0 version of GE now in pre-release, which looks like it will move things around (but isn't well documented yet), so I locked GE to the current point release.
This PR is a work-in-progress draft of a potential command to validate raw data using Great Expectations. It creates an expectation suite that checks if each raw CSV has three or more rows and then opens an HTML report listing the results.
This is very much a first effort, and we would probably want to factor it a little differently if we decided to use it.
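To make the check concrete, here is a minimal standard-library sketch of the "three or more rows" rule described above. It is not the code in this PR and does not use Great Expectations; the PR's suite presumably relies on something like GE's `expect_table_row_count_to_be_between` expectation instead. The function names `count_data_rows` and `check_raw_dir` are hypothetical, and the sketch assumes each CSV has a header line that does not count toward the threshold.

```python
import csv
from pathlib import Path
from typing import Dict

MIN_ROWS = 3  # threshold taken from the PR description


def count_data_rows(csv_path: Path) -> int:
    """Count the rows after the header line (assumes every file has a header)."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header; tolerate an empty file
        return sum(1 for _ in reader)


def check_raw_dir(raw_dir: Path, min_rows: int = MIN_ROWS) -> Dict[str, bool]:
    """Map each CSV file name in raw_dir to True if it meets the row threshold."""
    return {
        path.name: count_data_rows(path) >= min_rows
        for path in sorted(raw_dir.glob("*.csv"))
    }
```

Any file mapped to `False` would correspond to a failed expectation in the report. What GE adds on top of a check like this is the persisted suite and the rendered HTML results.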
Usage
The following should validate the CSVs in the default raw directory used by warn-transformer, verifying that each has three or more rows. It creates a data quality report in a temporary directory and opens it in a browser (in production we'd obviously want to persist the report somewhere and/or alert off of it):
Screenshots
In this example, I hand-edited ak.csv to fail the check:
Detail on the failure:
Related to #236