maxachis closed this issue 3 years ago.
Added a branch to start work on this: https://github.com/CodeForPittsburgh/food-access-map-data/tree/add_data_integrity_checker_2021_07_18
This will likely require not just a new script but also some modifications to run.sh. Specifically, if the data integrity checker says, for whatever reason, "Yo, this data ain't integritatious," we need to stop the rest of the script from running.
This would just require the data integrity checker to report a pass/fail result (a boolean inside the checker, surfaced to the shell as the script's exit status, where 0 means success), plus an if condition in run.sh that stops the rest of the script from running when the check fails.
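A minimal sketch of that boolean-to-exit-status pattern. The file name `integrity_check.py` and the placeholder rule (each source file must exist and have at least two lines, i.e. a header plus one data row) are assumptions for illustration, not the project's actual checks.

```python
import sys
from pathlib import Path
from typing import Iterable, Sequence


def check_integrity(paths: Iterable[str]) -> bool:
    """Return True only if every source file passes the checks.

    Placeholder rule (an assumption, not the project's real rules):
    each file must exist and contain at least two lines (header + one data row).
    """
    ok = True
    for path in paths:
        p = Path(path)
        if not p.is_file():
            print(f"FAIL: {path} does not exist", file=sys.stderr)
            ok = False  # keep going so every problem is reported in one run
            continue
        with p.open(encoding="utf-8") as f:
            line_count = sum(1 for _ in f)
        if line_count < 2:
            print(f"FAIL: {path} has no data rows", file=sys.stderr)
            ok = False
    return ok


def main(argv: Sequence[str]) -> int:
    """Map the boolean result onto a shell exit status (0 = pass, 1 = fail, 2 = usage error)."""
    if not argv:
        print("usage: integrity_check.py FILE [FILE ...]", file=sys.stderr)
        return 2
    return 0 if check_integrity(argv) else 1
```

As a standalone script this would end with `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))`. run.sh could then stop early with something like `python integrity_check.py path/to/*.csv || exit 1`, or by running under `set -e` so any nonzero exit status aborts the remaining steps.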
From my understanding so far, it would probably look something like this:
Data integrity checker added in Pull Request #158.
Add a new script, run within run.sh after source_R_scripts and source_py_scripts but before agg_clean_data, that checks the integrity of our source datasets before the run.sh workflow merges them.
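A minimal sketch of a per-dataset check, assuming the sources are CSV files with hypothetical `name`, `latitude` and `longitude` columns. The checks themselves (required columns present, no blank required fields, coordinates numeric and within ±90/±180) are illustrative assumptions. Returning a list of problems rather than a bare boolean lets the run.sh step print why a dataset failed; the caller can still reduce it to pass/fail with `not problems`.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, Sequence

# Hypothetical schema: the real source datasets may use different column names.
REQUIRED_COLUMNS = ("name", "latitude", "longitude")
COORD_LIMITS = {"latitude": 90.0, "longitude": 180.0}


def validate_rows(
    header: Sequence[str],
    rows: Iterable[Dict[str, Optional[str]]],
    required: Sequence[str] = REQUIRED_COLUMNS,
) -> List[str]:
    """Return human-readable problems; an empty list means the dataset passed."""
    missing = [c for c in required if c not in header]
    if missing:
        # Row-level checks are meaningless without the required columns.
        return [f"missing columns: {', '.join(missing)}"]

    problems: List[str] = []
    for i, row in enumerate(rows, start=1):  # i counts data rows, not file lines
        for col in required:
            if not (row.get(col) or "").strip():
                problems.append(f"row {i}: empty {col}")
        for col, limit in COORD_LIMITS.items():
            raw = (row.get(col) or "").strip()
            if not raw:
                continue  # absent or blank; blanks in required columns were reported above
            try:
                value = float(raw)
            except ValueError:
                problems.append(f"row {i}: {col} is not a number: {raw!r}")
                continue
            if not (-limit <= value <= limit):  # NaN also fails this comparison
                problems.append(f"row {i}: {col} out of range: {value}")
    return problems


def validate_csv_text(text: str) -> List[str]:
    """Parse CSV text (header on the first line) and validate its rows."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return validate_rows(header, reader)
```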
Things to check for include:
This one we can also make unit tests for.
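A minimal sketch of such unit tests using the standard library's `unittest`. The helper `check_coordinate` is hypothetical, written here only so the tests have something concrete to exercise; a real test file would import the checker's own functions instead.

```python
import unittest
from typing import Optional


def check_coordinate(raw: str, limit: float) -> Optional[str]:
    """Return an error message for a bad coordinate, or None if it looks valid.

    Hypothetical helper: pass limit=90 for latitude and limit=180 for longitude.
    """
    try:
        value = float(raw)
    except ValueError:
        return f"not a number: {raw!r}"
    if not (-limit <= value <= limit):  # also rejects NaN, since NaN comparisons are False
        return f"out of range: {value}"
    return None


class CheckCoordinateTests(unittest.TestCase):
    def test_accepts_valid_latitude(self):
        self.assertIsNone(check_coordinate("40.4406", 90))

    def test_rejects_non_numeric(self):
        self.assertEqual(check_coordinate("abc", 90), "not a number: 'abc'")

    def test_rejects_out_of_range_longitude(self):
        self.assertEqual(check_coordinate("-200", 180), "out of range: -200.0")

    def test_rejects_nan(self):
        self.assertIsNotNone(check_coordinate("nan", 90))

    def test_boundary_is_inclusive(self):
        self.assertIsNone(check_coordinate("-180", 180))
```

Saved as, say, `test_integrity_checks.py` (a hypothetical name), this runs with `python -m unittest test_integrity_checks`.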