Currently, we output up to 30 or so different CSVs for each state/CD in order to feed the R markdown script. That's unwieldy for 1 CD and a huge mess for 75+.
Ideally, we would output 1 CSV that contains all results. It would look something like this:
| geography | violations_per1000_CWA_district | violations_per1000_CAA_district | violations_per1000_CWA_state | violations_per1000_CAA_state | etc. |
| --- | --- | --- | --- | --- | --- |
| WA-02 | 8 | 15 | 10 | 20 | |
| IN | NA | NA | 5 | 50 | |
| etc. | | | | | |
This would require two additional things:
Creating variables to hold the results until a final "Create Your Output" cell. Essentially, we would just have a cell at the beginning of the notebook (or even in utilities.py) that looks something like this:
NA = "NA"  # placeholder string; R's read.csv treats "NA" as missing by default
violations_per1000_CWA_district = NA
violations_per1000_CAA_district = NA
violations_per1000_CWA_state = NA
violations_per1000_CAA_state = NA
# etc.
Then associate this with however many geographies are to be run:
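One possible way to do this, as a minimal sketch: keep the default values in one place and make a separate copy of them for every geography. The names `COLUMNS` and `GEOGRAPHIES` are assumptions made up for illustration, and the `"NA"` string placeholder is also an assumption; `output` is chosen to match the loop that follows.

```python
# Hypothetical names; only the column names come from the proposed table.
NA = "NA"  # assumed placeholder for "not calculated for this geography"

COLUMNS = [
    "violations_per1000_CWA_district",
    "violations_per1000_CAA_district",
    "violations_per1000_CWA_state",
    "violations_per1000_CAA_state",
]

GEOGRAPHIES = ["WA-02", "IN"]  # extend with every state/CD to be run

# Build one independent dict of defaults per geography, so storing a
# result for one geography never changes another geography's values.
output = {geography: dict.fromkeys(COLUMNS, NA) for geography in GEOGRAPHIES}
```

Because every geography starts with all columns set to the placeholder, a state-level run such as IN simply leaves its district columns untouched.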
In the cells that currently do the calculations, instead of writing to CSV, just store the result:
for geography in ["WA-02", "IN", ...]:
    # do calculation
    output[geography]["violations_per1000_CWA_district"] = result
Finally, in the "Create Your Output" cell, print the results to CSV rows:
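import csv
with open("output.csv", "w", newline="") as f:  # file name is a placeholder
    writer = csv.writer(f)
    writer.writerow(["geography", "violations_per1000_CWA_district", ...])  # write the header (remaining columns elided)
    for geography in output:
        writer.writerow([geography, *output[geography].values()])  # one row per geography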
Re-tooling the R Markdown file to accept just one CSV input and use R's field identifiers ($) to get the data.
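On the Python side, the "Create Your Output" step could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `write_results`, the file name `all_results.csv`, and the choice of `csv.DictWriter` are not from the existing notebook. The sample values are the ones from the example table above. `DictWriter` is used here because its `restval` argument fills in any column a geography never calculated, so state-level rows get NA in the district columns without extra bookkeeping.

```python
import csv


def write_results(output, columns, path):
    """Write one header row, then one row per geography, to a single CSV."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(
            f,
            fieldnames=["geography", *columns],
            restval="NA",  # written for any column missing from a row
        )
        writer.writeheader()
        for geography, results in output.items():
            writer.writerow({"geography": geography, **results})


# Column names and values taken from the example table; names are hypothetical.
columns = [
    "violations_per1000_CWA_district",
    "violations_per1000_CAA_district",
    "violations_per1000_CWA_state",
    "violations_per1000_CAA_state",
]
output = {
    "WA-02": {
        "violations_per1000_CWA_district": 8,
        "violations_per1000_CAA_district": 15,
        "violations_per1000_CWA_state": 10,
        "violations_per1000_CAA_state": 20,
    },
    # A state-level run only has state columns; the rest fall back to restval.
    "IN": {
        "violations_per1000_CWA_state": 5,
        "violations_per1000_CAA_state": 50,
    },
}

write_results(output, columns, "all_results.csv")
```

With the default `extrasaction="raise"`, `DictWriter` raises a `ValueError` if a cell stores a result under a column name that is not in `columns`, which would catch typos in the long variable names early.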