Reconfigure AllPrograms outputs

ericnost commented 3 years ago

Currently, we output up to 30 or so different CSVs for each state or congressional district (CD) in order to feed the RMarkdown script. That's unwieldy for one CD and a huge mess for 75+.

Ideally, we would output one CSV that contains all results, with one row per geography and one column per measure. It would look something like this:

geography	violations_per1000_CWA_district	violations_per1000_CAA_district	violations_per1000_CWA_state	violations_per1000_CAA_state
WA-02	8	15	10	20
IN	NA	NA	5	50
etc.

This would require two additional things:

Creating variables to store the data until a final "Create Your Output" cell. Essentially, we would just have a cell at the beginning of the notebook (or even in utilities.py) that looks something like this:

violations_per1000_CWA_district = None  # None stands in for NA until a value is calculated
violations_per1000_CAA_district = None
violations_per1000_CWA_state = None
violations_per1000_CAA_state = None
etc.

Then associate this with however many geographies are to be run:

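defaults = {"violations_per1000_CWA_district": violations_per1000_CWA_district, "violations_per1000_CWA_state": violations_per1000_CWA_state, "violations_per1000_CAA_district": violations_per1000_CAA_district, "violations_per1000_CAA_state": violations_per1000_CAA_state}  # etc.
output = {geography: dict(defaults) for geography in ["WA-02", "IN"]}  # etc.; each geography gets its own copy, keyed by variable name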

In the cells that currently do the calculations, instead of writing to CSV, just store the result:

for geography in ["WA-02", "IN", ...]:
    # do calculation
    output[geography]["violations_per1000_CWA_district"] = result

Finally, in the "Create Your Output" cell, print the results to CSV rows:

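import csv
columns = ["violations_per1000_CWA_district", "violations_per1000_CAA_district", "violations_per1000_CWA_state", "violations_per1000_CAA_state"]  # etc.
with open("all_results.csv", "w", newline="") as f:  # placeholder file name
    writer = csv.writer(f)
    writer.writerow(["geography"] + columns)  # write the header
    writer.writerows([geography] + [results[name] for name in columns] for geography, results in output.items())  # one row per geography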
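
The store-then-write steps above could be sketched end to end roughly as follows. This is a minimal sketch, not the notebook's actual code: the `COLUMNS` list, the `build_output` and `write_output` helpers, and the sample values are assumptions for illustration. It uses `csv.DictWriter` instead of `csv.writer` so that values are matched to columns by name rather than by position, and it writes unset values as `NA`, which R's `read.csv` treats as missing by default.

```python
import csv
import io

# Column names taken from the example above; extend with the remaining measures.
COLUMNS = [
    "violations_per1000_CWA_district",
    "violations_per1000_CAA_district",
    "violations_per1000_CWA_state",
    "violations_per1000_CAA_state",
]


def build_output(geographies):
    """Give every geography its own dict of results, all initially unset (None)."""
    return {geography: dict.fromkeys(COLUMNS) for geography in geographies}


def write_output(output, handle):
    """Write one header row, then one row per geography, to an open text file."""
    writer = csv.DictWriter(handle, fieldnames=["geography"] + COLUMNS, lineterminator="\n")
    writer.writeheader()
    for geography, results in output.items():
        # Replace unset results with "NA" so R reads them as missing values.
        row = {name: ("NA" if value is None else value) for name, value in results.items()}
        writer.writerow({"geography": geography, **row})


output = build_output(["WA-02", "IN"])
# In the calculation cells, store each result instead of writing a CSV.
output["WA-02"]["violations_per1000_CWA_district"] = 8  # sample value
output["IN"]["violations_per1000_CWA_state"] = 5  # sample value

buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_output(output, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Because every geography starts with the same keys, a calculation that is skipped for a given geography (for example, district-level measures for a whole state) simply stays `NA` in its row.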

Re-tooling the RMarkdown file to accept just one CSV input and use R's `$` operator to pull each column out of the loaded data by name.
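
On the consuming side, the point of a single CSV is that each value can be looked up by geography and column name, which is what `$` would give the RMarkdown. As a rough Python analogue for sanity-checking the file before handing it to R, `csv.DictReader` offers the same name-based access. The inline sample data and the `load_results` helper below are assumptions for illustration, not part of the project.

```python
import csv
import io

# Sample contents in the single-CSV layout proposed above (values are illustrative).
SAMPLE_CSV = """geography,violations_per1000_CWA_district,violations_per1000_CWA_state
WA-02,8,10
IN,NA,5
"""


def load_results(handle):
    """Return {geography: {column: value}}, mapping "NA" to None and numbers to float."""
    results = {}
    for row in csv.DictReader(handle):
        geography = row.pop("geography")
        results[geography] = {
            name: (None if value == "NA" else float(value)) for name, value in row.items()
        }
    return results


results = load_results(io.StringIO(SAMPLE_CSV))
print(results["WA-02"]["violations_per1000_CWA_district"])
print(results["IN"]["violations_per1000_CWA_district"])
```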

Frijol commented 2 years ago

Is this still needed given the new reports flow @shansen5 ?

shansen5 commented 2 years ago

This issue is no longer needed. The AllPrograms functionality is now in the EEW-ReportCard-Data repository.

edgi-govdata-archiving / ECHO-Cross-Program

Reconfigure AllPrograms outputs #113