CodeForPittsburgh / food-access-map-data

Data for the food access map
MIT License

Add data integrity checker to workflow #139

Closed: maxachis closed this issue 3 years ago

maxachis commented 3 years ago

Add a new script that checks the integrity of our source datasets before they are merged by the run.sh workflow. It should run within run.sh after source_R_scripts and source_py_scripts but before agg_clean_data.

Things to check for include:

We can also write unit tests for this one.
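The specific checks are not spelled out here, so the sketch below uses hypothetical ones: required columns present, a non-empty name, and latitude/longitude that parse as numbers and fall in range. It assumes each source dataset is a CSV with hypothetical `name`, `latitude`, and `longitude` columns; the real schema and checks would replace these. Keeping the checks in a pure function that returns a list of problems makes them easy to unit test, and `main` turns the result into an exit status that run.sh can act on.

```python
"""Sketch of a data integrity checker for the source datasets.

Assumptions (hypothetical, not the project's actual schema): each source
dataset is a CSV with `name`, `latitude`, and `longitude` columns, and the
checks below stand in for whatever checks the project settles on.
"""
import csv
import sys
import unittest
from typing import Iterable, List, Mapping, Optional

# Hypothetical required columns; replace with the real schema.
REQUIRED_COLUMNS = ("name", "latitude", "longitude")


def check_rows(columns: Iterable[str],
               rows: Iterable[Mapping[str, Optional[str]]]) -> List[str]:
    """Return a list of problems found; an empty list means the data passed."""
    present = set(columns)
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    if missing:
        # Row-level checks are meaningless without the required columns.
        return [f"missing columns: {', '.join(missing)}"]
    problems: List[str] = []
    for line, row in enumerate(rows, start=2):  # line 1 is the CSV header
        if not (row.get("name") or "").strip():
            problems.append(f"line {line}: empty name")
        try:
            lat = float(row.get("latitude"))
            lon = float(row.get("longitude"))
        except (TypeError, ValueError):  # None or non-numeric text
            problems.append(f"line {line}: non-numeric coordinates")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"line {line}: coordinates out of range")
    return problems


def main(paths: List[str]) -> int:
    """Check each CSV; return 0 if every file passes, 1 otherwise (an exit status)."""
    all_ok = True
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            problems = check_rows(reader.fieldnames or [], reader)
        for problem in problems:
            print(f"{path}: {problem}", file=sys.stderr)
        if problems:
            all_ok = False
    return 0 if all_ok else 1


class CheckRowsTest(unittest.TestCase):
    """Example unit tests for the hypothetical checks above."""

    def test_valid_row_passes(self):
        rows = [{"name": "Pantry", "latitude": "40.44", "longitude": "-79.99"}]
        self.assertEqual(check_rows(REQUIRED_COLUMNS, rows), [])

    def test_bad_coordinates_flagged(self):
        rows = [{"name": "Pantry", "latitude": "abc", "longitude": "-79.99"}]
        self.assertEqual(check_rows(REQUIRED_COLUMNS, rows),
                         ["line 2: non-numeric coordinates"])
```

run.sh could then call it through a small entry point such as `sys.exit(main(sys.argv[1:]))`, passing the source CSVs as arguments, with a non-zero exit status signaling that the data failed the checks.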

maxachis commented 3 years ago

Added a branch to start work on this: https://github.com/CodeForPittsburgh/food-access-map-data/tree/add_data_integrity_checker_2021_07_18

maxachis commented 3 years ago

This will likely require not just a new script but also some changes to run.sh. Specifically, if the data integrity checker says, for whatever reason, "Yo, this data ain't integritatious", we need to stop the rest of run.sh from running.

This would just require that the data integrity checker return a boolean, and that run.sh add an if condition that stops the rest of the script from running when that boolean is FALSE.

From my understanding so far, it would probably look something like this:

[Image not preserved]
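The control flow described above can be sketched as follows. In a shell script, the usual way for a step to "return a boolean" is its exit status (0 for success, non-zero for failure), which run.sh can test with an `if` or stop on automatically with `set -e`. This Python sketch mirrors that flow with `subprocess`: it runs each step in order and stops at the first one that exits non-zero. `run_steps` and the placeholder commands are hypothetical, not the project's actual scripts.

```python
import subprocess
import sys
from typing import List, Sequence


def run_steps(steps: Sequence[Sequence[str]]) -> List[int]:
    """Run each command in order and stop after the first non-zero exit status.

    Returns the exit statuses of the steps that actually ran; a failing
    integrity check therefore appears as a non-zero last entry, and every
    step after it (such as the merge) is skipped.
    """
    statuses: List[int] = []
    for cmd in steps:
        result = subprocess.run(list(cmd))
        statuses.append(result.returncode)
        if result.returncode != 0:
            # Similar in spirit to `if ! checker; then exit 1; fi` in a shell script.
            break
    return statuses


# Usage with placeholder steps (not the project's real scripts): the middle
# step stands in for a failing integrity check, so the last step never runs.
example = [
    [sys.executable, "-c", "print('source scripts ran')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "print('merge ran')"],
]
print(run_steps(example))  # prints [0, 1]
```

Using exit statuses rather than printing TRUE/FALSE keeps the check compatible with ordinary shell constructs; an R step can signal failure the same way with `quit(status = 1)`.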

maxachis commented 3 years ago

Data Integrity Checker added in Pull Request #158.