CodeForPittsburgh / food-access-map-data

Data for the food access map
MIT License
Break up Data Source Prep Scripts into individual scripts in new "Data Source Prep" folder #116

Closed maxachis closed 3 years ago

maxachis commented 3 years ago

Right now the scripts follow the naming convention "DATE_prep_sources_AUTHOR_NAME", which is confusing. This should probably be changed to "prep_sources_SOURCE_NAME".

See title and most recent issues for what needs to be done next. @cgmoreno and @hellonewman are offering to do this right now.

hellonewman commented 3 years ago

this is smart...why didn't I think of this

maxachis commented 3 years ago

Because we're a TEAM and being a part of a TEAM means we DIVIDE BRAINPOWER through HORRIFIC METHODS OF SCIENCE to ensure MAXIMAL EFFICIENCY TOWARDS OUR GRIM GOALS.

At any rate, I made the changes. That might be all we need to do, but I'd want to check in with folks like @cgmoreno to get a sense of whether this is all we would need to do.

May also be worth thinking about whether some QoL changes would be useful. Would we want to add a workflow for prepping data sources? Would we want to split up data prep sources so that we don't have multiple sources in the same script, or alternatively consolidate them so that they're all run from one script, or not touch it at all because god knows we don't want to fix what ain't broke?

maxachis commented 3 years ago

Per Ellie's recommendation: change the scripts so that each data prep script handles a single source. Archive the old ones and make new versions so that each script covers only one data source.

maxachis commented 3 years ago

So create folder for individual source prep scripts, and then we have a workflow that scans everything in that folder and preps the data sources from that. Language agnostic. Archive the old ones.
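
A minimal sketch of what that folder-scanning step could look like, not the workflow that was actually merged. The extension-to-interpreter mapping, the function names, and the assumption that python3 and Rscript are on the PATH are all hypothetical:

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from (lower-cased) file extension to interpreter command.
# Supporting another language means adding one entry here.
INTERPRETERS = {".py": ["python3"], ".r": ["Rscript"]}


def build_commands(folder, interpreters=INTERPRETERS):
    """Return one command (a list of arguments) per recognised script in folder."""
    commands = []
    for path in sorted(Path(folder).iterdir()):
        runner = interpreters.get(path.suffix.lower())
        # Skip sub-folders and files with no known interpreter (e.g. notes, data).
        if path.is_file() and runner is not None:
            commands.append(runner + [str(path)])
    return commands


def run_all(folder):
    """Run every recognised script in order; stop at the first failure."""
    for command in build_commands(folder):
        subprocess.run(command, check=True)
```

Calling run_all on the prep scripts folder would then run each script in alphabetical order, and files with other extensions are ignored.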

maxachis commented 3 years ago

Remaining thing, I believe, is to add a component to the workflow that runs the prep scripts from within the prep scripts folder.

maxachis commented 3 years ago

So add a component to run.sh that runs the prep scripts from within the prep scripts folder.

maxachis commented 3 years ago

We need two sub-scripts, one for sourcing the python scripts, and one for sourcing the R scripts.

@conorotompkins has generously/foolishly volunteered to assist with the R script side. @oscarsyu has also volunteered, for the Python script side.

conorotompkins commented 3 years ago

the R scripts I see in the main branch that need to be brought under this process are:

@hellonewman @maxachis is that accurate?

hellonewman commented 3 years ago

@conorotompkins yep I believe so.

conorotompkins commented 3 years ago

created this PR for this issue: https://github.com/CodeForPittsburgh/food-access-map-data/pull/132

maxachis commented 3 years ago

Current status on this:

Prep source scripts from Oscar and Conor integrated into run.sh, but errors currently popping up in execution. Right now, errors appear to be located in source_py_scripts.py:

[image: screenshot of the error]

(Note that the same thing happened with prep_FMNP.py, so we removed it from the script and reran. Once we resolve this issue, we should re-add prep_FMNP.py.)

oscarsyu commented 3 years ago

I don't know how GitHub handles relative file paths, but I noticed that the difference between Conor's script and mine is that I don't put data_prep_scripts/prep_source_scripts/ before each file name. It's possible the working directory isn't set correctly, so it can't find the scripts to run. What if we changed open(filename, "rb") as source_file: to open('data_prep_scripts/prep_source_scripts/' + filename, "rb") as source_file:?
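
One way to make this independent of the working directory is to build each path from a known base folder instead of relying on wherever the workflow happens to start. This is only a sketch under that assumption: resolve_script and run_script are hypothetical names, not functions from source_py_scripts.py.

```python
from pathlib import Path


def resolve_script(filename, base_dir):
    """Return an absolute path to a prep script, regardless of the current directory."""
    return (Path(base_dir) / filename).resolve()


def run_script(filename, base_dir):
    """Compile and execute one prep script, returning the namespace it ran in."""
    path = resolve_script(filename, base_dir)
    with open(path, "rb") as source_file:
        code = compile(source_file.read(), str(path), "exec")
    # Give the script the globals it would see if it were run directly.
    namespace = {"__name__": "__main__", "__file__": str(path)}
    exec(code, namespace)
    return namespace
```

If the runner sits in the same folder as the prep scripts, passing Path(__file__).resolve().parent as base_dir would avoid hard-coding data_prep_scripts/prep_source_scripts/ at all.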

hellonewman commented 3 years ago

Re-running to see if Conor fixed his grow_pgh.R script successfully. If so we can close this

maxachis commented 3 years ago

Did the final tweaks and uploaded the changes to Master. This is now done!