CodeForPittsburgh / food-access-map-data

Data for the food access map
MIT License

Add post-merged dataset sanity checking script #153

Closed: maxachis closed this issue 2 years ago.

maxachis commented 3 years ago

Basically, check to make sure the merged_dataset.csv file:

And if those conditions aren't met, throw an error and don't add/commit.

Have this run within run.sh, after all the data prep scripts have run but before the final add/commit.
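Here is a minimal sketch of that control flow. It assumes the checks are written in Python and that run.sh treats a nonzero exit status as a failure (for example via `set -e`), so the add/commit step never runs when a check fails. The names `Check`, `run_checks`, and `main` are hypothetical and do not come from the repository.

```python
import sys
from typing import Callable, List, Optional

# Hypothetical type: each check returns None on success, or a message
# describing what is wrong with merged_dataset.csv.
Check = Callable[[], Optional[str]]


def run_checks(checks: List[Check]) -> List[str]:
    """Run every check and collect all failure messages (don't stop at the first)."""
    failures = []
    for check in checks:
        message = check()
        if message is not None:
            failures.append(message)
    return failures


def main(checks: List[Check]) -> int:
    """Report failures on stderr and return an exit status for run.sh."""
    failures = run_checks(checks)
    for message in failures:
        print(f"Sanity check failed: {message}", file=sys.stderr)
    # Nonzero status signals run.sh to skip the final git add/commit.
    return 1 if failures else 0


# Hypothetical entry point: sys.exit(main([...checks that read merged_dataset.csv...]))
```

Collecting every failure before exiting, rather than stopping at the first one, means a single run reports all of the problems with the merged dataset at once.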

hellonewman commented 3 years ago

Mark Z and Max will take a look at this.

maxachis commented 2 years ago

Before I go any further with this script, I want to confirm that merged_dataset.csv isn't currently missing any source orgs or source files. If nothing is missing, my script will check that the dataset contains the following:

source_org:
1. Grow Pittsburgh
2. USDA Food and Nutrition Service
3. PA WIC
4. Allegheny County
5. FMNP Markets
6. Greater Pittsburgh Community Food Bank
7. Just Harvest

source_file:
1. GP_garden_directory_listing-20210322.csv
2. https://services1.arcgis.com/RLQu0rK7h4kbsBq5/arcgis/rest/services/Store_Locations/FeatureServer
3. wicresults.json
4. https://services1.arcgis.com/vdNDkVykv9vEWFX4/arcgis/rest/services/Child_Nutrition/FeatureServer
5. https://services5.arcgis.com/n3KaqXoFYDuIhfyz/ArcGIS/rest/services/FMNPMarkets/FeatureServer
6. https://services1.arcgis.com/vdNDkVykv9vEWFX4/arcgis/rest/services/COVID19_Food_Access_(PUBLIC)/FeatureServer
7. Just Harvest Google Sheets
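A sketch of how the source check could work, assuming merged_dataset.csv has `source_org` and `source_file` columns (as the lists above suggest) and that values must match these strings exactly. The function names `missing_values` and `check_sources` are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Set

# Expected values, copied from the lists in this comment.
EXPECTED_SOURCE_ORGS = {
    "Grow Pittsburgh",
    "USDA Food and Nutrition Service",
    "PA WIC",
    "Allegheny County",
    "FMNP Markets",
    "Greater Pittsburgh Community Food Bank",
    "Just Harvest",
}

EXPECTED_SOURCE_FILES = {
    "GP_garden_directory_listing-20210322.csv",
    "https://services1.arcgis.com/RLQu0rK7h4kbsBq5/arcgis/rest/services/Store_Locations/FeatureServer",
    "wicresults.json",
    "https://services1.arcgis.com/vdNDkVykv9vEWFX4/arcgis/rest/services/Child_Nutrition/FeatureServer",
    "https://services5.arcgis.com/n3KaqXoFYDuIhfyz/ArcGIS/rest/services/FMNPMarkets/FeatureServer",
    "https://services1.arcgis.com/vdNDkVykv9vEWFX4/arcgis/rest/services/COVID19_Food_Access_(PUBLIC)/FeatureServer",
    "Just Harvest Google Sheets",
}


def missing_values(rows: Iterable[Dict[str, str]], column: str, expected: Set[str]) -> Set[str]:
    """Return the expected values that never appear in `column` (exact match)."""
    seen = {row.get(column, "") for row in rows}
    return expected - seen


def check_sources(path: str = "merged_dataset.csv") -> List[str]:
    """Read the merged CSV and describe any missing source_org/source_file values."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    problems = []
    for column, expected in (
        ("source_org", EXPECTED_SOURCE_ORGS),
        ("source_file", EXPECTED_SOURCE_FILES),
    ):
        missing = missing_values(rows, column, expected)
        if missing:
            problems.append(f"{column} is missing: {sorted(missing)}")
    return problems
```

Because the garden directory file name carries a date stamp, an exact-match check like this would presumably need its expected list updated whenever that source is refreshed.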

maxachis commented 2 years ago

I may also want to add a sanity check that each flag column contains both 0s and 1s. Not every value has to be a 0 or a 1; I can see some scenarios where a value of "NA" is fine. But we'd probably avoid problems by at least ensuring that no flag column is entirely NA.
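A sketch of the flag column check, assuming missing values appear in the CSV as "NA" or as blank cells, and that the caller supplies the list of flag column names (any names used with it are hypothetical). The `require_both` parameter is also an assumption: it switches between the stricter "both 0 and 1" rule and the minimal "not entirely NA" rule described in this comment.

```python
from typing import Dict, Iterable, List, Sequence

# Assumption: NA is written as "NA" or left blank in merged_dataset.csv.
NA_VALUES = {"", "NA"}


def check_flag_columns(
    rows: Sequence[Dict[str, str]],
    flag_columns: Iterable[str],
    require_both: bool = True,
) -> List[str]:
    """Flag columns that are entirely NA, or (optionally) lack a 0 or a 1."""
    problems = []
    for column in flag_columns:
        # A missing key is treated the same as an NA value.
        values = {row.get(column, "").strip() for row in rows}
        non_na = values - NA_VALUES
        if not non_na:
            problems.append(f"{column} is entirely NA")
        elif require_both and not {"0", "1"} <= non_na:
            problems.append(f"{column} does not contain both 0 and 1: {sorted(non_na)}")
    return problems
```

Which rule to enforce is a judgment call for the project: the stricter one catches a flag that was never populated for any source, while the looser one only catches a column that is completely empty.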

maxachis commented 2 years ago

Ellie approved all of the above, so Max, go ahead and put together this sanity checking script with the given parameters!

maxachis commented 2 years ago

I have created a pull request for this!

https://github.com/CodeForPittsburgh/food-access-map-data/pull/206

In addition to the sanity checking logic itself, I should note that the commands for adding and pushing to Git have been moved from the GitHub Actions YAML file for generate_merged_datasets into the run.sh shell script, so that the sanity checking script can control whether they run.

At any rate, have a look at it and let me know if it looks good for merging!