edgi-govdata-archiving / ECHO_modules

ECHO_modules is a Python package for analyzing a copy of the US Environmental Protection Agency's (EPA) Enforcement and Compliance History Online (ECHO) database.
GNU General Public License v3.0

add 118th Congressional District information to ECHO_EXPORTER in SBU database #74

Open ericnost opened 7 months ago

ericnost commented 7 months ago

A problem with ECHO_EXPORTER's CD113 field has been that many facilities have invalid congressional districts (CDs).

This script produces only valid CDs for the 118th Congress, which is an improvement on ECHO's data:

https://colab.research.google.com/drive/1y73SNnrd--Qp5m5SAPsLlIwyBQMCapkp

To make it part of the SBU database workflow, the script would have to be adjusted to load the scraped ECHO_EXPORTER CSV. It would probably be called in between the scrape and the import here: https://github.com/sunggheel/edgipgdb/blob/main/edgi_postgis/echo_scripts/10_scrapeECHOEPA
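As a rough sketch of what that in-between step could look like (not the Colab script itself): the function below reads the scraped CSV, appends a district column, and writes a new CSV for the import to pick up. The column name `CD118`, the function name `add_cd118_column`, and the `lookup_cd118` callable are all hypothetical; `lookup_cd118` stands in for whatever logic the Colab script uses to assign a district to a facility row.

```python
import csv


def add_cd118_column(src_path, dst_path, lookup_cd118, column="CD118"):
    """Copy the CSV at src_path to dst_path, adding a district column.

    lookup_cd118 is a placeholder (assumption): a callable that takes one
    row as a dict and returns a district string, or None if none is found.
    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or [])
        if column not in fieldnames:
            fieldnames.append(column)  # hypothetical new field name
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        count = 0
        for row in reader:
            district = lookup_cd118(row)
            # Leave the field empty rather than writing an invalid value.
            row[column] = district if district is not None else ""
            writer.writerow(row)
            count += 1
    return count
```

Writing to a separate output file keeps the scraped CSV untouched, so the import step could fall back to it if the district step fails.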

Additionally, we'd have to adjust the schema (a one-time manual change?) and update the views->materialized script (https://github.com/sunggheel/edgipgdb/blob/6afa24d4b5ec8207619eb5db1936892e5179790b/edgi_postgis/echo_scripts/viewsToMaterialized.py#L38) to include the new field.
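For the one-time schema change, something along these lines might be enough. This is only a sketch: the table name `ECHO_EXPORTER`, the column name `CD118`, the `text` type, and the helper `add_column_sql` are assumptions, not taken from the edgipgdb repo. The helper just builds the statement; it would still need to be run against the PostgreSQL database by whatever client the workflow already uses.

```python
import re

# Conservative pattern for SQL identifiers we are willing to interpolate.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_ALLOWED_TYPES = {"text", "integer", "varchar"}


def add_column_sql(table="ECHO_EXPORTER", column="CD118", sql_type="text"):
    """Build an idempotent ALTER TABLE statement (names are assumptions)."""
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if sql_type not in _ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {sql_type!r}")
    # ADD COLUMN IF NOT EXISTS makes the change safe to re-run.
    return f'ALTER TABLE "{table}" ADD COLUMN IF NOT EXISTS "{column}" {sql_type};'
```

Because the statement uses `IF NOT EXISTS`, it could also be run on every scrape instead of manually, if that turns out to be simpler.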

ericnost commented 2 months ago

We've at least devised a way to make this information available for the report card process.