edgi-govdata-archiving / ECHO-Cross-Program

Jupyter Notebooks for ECHO that use data from multiple EPA programs
https://colab.research.google.com/github/edgi-govdata-archiving/ECHO-Cross-Program/blob/master/ECHO-Cross-Programs.ipynb
GNU General Public License v3.0

Reconfigure AllPrograms outputs #113

Closed: ericnost closed this issue 2 years ago

ericnost commented 3 years ago

Currently, we output as many as 30 different CSVs for each state/congressional district (CD) in order to feed the R Markdown script. That's unwieldy for 1 CD and a huge mess for 75+.

Ideally, we would output 1 CSV that contains all results. It would look something like this:

    geography  violations_per1000_CWA_district  violations_per1000_CAA_district  violations_per1000_CWA_state  violations_per1000_CAA_state  etc.
    WA-02      8                                15                               10                            20
    IN         NA                               NA                               5                             50
    etc.
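
A minimal sketch of that layout in Python, assuming each geography's results are held in a dict keyed by column name. The names `FIELDNAMES`, `rows`, and `render_csv` are made up for illustration and are not in the notebook; the numbers are just the example values above. `csv.DictWriter` fills any column a geography lacks with its `restval`, which is one way the state-only row could end up with `NA` in the district columns.

```python
import csv
import io

# Column names taken from the example above; the real list would be longer.
FIELDNAMES = [
    "geography",
    "violations_per1000_CWA_district",
    "violations_per1000_CAA_district",
    "violations_per1000_CWA_state",
    "violations_per1000_CAA_state",
]

# Example values copied from the table above (not real results).
# "IN" is a state, so it has no district-level keys at all.
rows = [
    {
        "geography": "WA-02",
        "violations_per1000_CWA_district": 8,
        "violations_per1000_CAA_district": 15,
        "violations_per1000_CWA_state": 10,
        "violations_per1000_CAA_state": 20,
    },
    {
        "geography": "IN",
        "violations_per1000_CWA_state": 5,
        "violations_per1000_CAA_state": 50,
    },
]


def render_csv(rows, fieldnames=FIELDNAMES):
    """Return the rows as CSV text, writing "NA" for any missing column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="NA", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = render_csv(rows)
print(csv_text)
```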

This would require two additional things:

  1. creating variables to store the data until a final "Create Your Output" cell. Essentially we would just have a cell at the beginning of the notebook (or even in the utilities.py) that looks something like this:
    violations_per1000_CWA_district = None  # placeholder until calculated; written out as NA
    violations_per1000_CAA_district = None
    violations_per1000_CWA_state = None
    violations_per1000_CAA_state = None
    etc.

    Then associate this with however many geographies are to be run:

    output = {"WA-02": {"violations_per1000_CWA_district": violations_per1000_CWA_district, "violations_per1000_CAA_district": violations_per1000_CAA_district, "violations_per1000_CWA_state": violations_per1000_CWA_state, "violations_per1000_CAA_state": violations_per1000_CAA_state, ...}, "IN": {....}, etc.}

    In the cells that currently do the calculations, instead of writing to CSV, just store the result:

    for geography in ["WA-02", "IN", ...]:
        # do calculation
        output[geography]["violations_per1000_CWA_district"] = result

    Finally, in the "Create Your Output" cell, print the results to CSV rows:

    writer = csv.writer(f) # f is the opened output file
    writer.writerow(["geography", "violations_per1000_CWA_district", etc.]) # write the header
    for geography in output:
        writer.writerow([geography] + list(output[geography].values())) # values follow the header order
  2. re-tooling the R Markdown file to accept just one CSV input and use R's column accessor ($) to get the data.

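
Putting step 1 together, here is a rough sketch of the store-then-write flow, not a drop-in for the notebook. `METRICS`, `init_output`, and `write_output` are hypothetical names, only the four metrics named in this issue are listed, and the stored numbers are the example values from above rather than real results. It writes to an in-memory buffer so it runs anywhere; in the notebook the same function could be handed an opened file instead.

```python
import csv
import io

# Hypothetical column list; only these four metrics are named in this issue.
METRICS = [
    "violations_per1000_CWA_district",
    "violations_per1000_CAA_district",
    "violations_per1000_CWA_state",
    "violations_per1000_CAA_state",
]


def init_output(geographies, metrics=METRICS):
    """Create one placeholder row per geography, with every metric set to None."""
    return {geography: dict.fromkeys(metrics) for geography in geographies}


def write_output(output, csv_file, metrics=METRICS):
    """Write the collected results as one CSV, one row per geography."""
    writer = csv.writer(csv_file)
    writer.writerow(["geography", *metrics])  # header
    for geography, results in output.items():
        # Metrics that were never calculated are still None; write them as "NA".
        row = ["NA" if results[m] is None else results[m] for m in metrics]
        writer.writerow([geography, *row])


# Beginning of the notebook: set up the placeholders.
output = init_output(["WA-02", "IN"])

# Calculation cells: store each result instead of writing a CSV.
output["WA-02"]["violations_per1000_CWA_district"] = 8
output["WA-02"]["violations_per1000_CWA_state"] = 10
output["IN"]["violations_per1000_CWA_state"] = 5

# "Create Your Output" cell: write everything at once.
buffer = io.StringIO()
write_output(output, buffer)
print(buffer.getvalue())
```

Keying each geography's results by metric name means the calculation cells can run in any order, and the column order is fixed in one place when the file is written.
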
Frijol commented 2 years ago

Is this still needed given the new reports flow, @shansen5?

shansen5 commented 2 years ago

This issue is no longer needed. The AllPrograms functionality is now in the EEW-ReportCard-Data repository.