edgi-govdata-archiving / ECHO_modules

ECHO_modules is a Python package for analyzing a copy of the US Environmental Protection Agency's (EPA) Enforcement and Compliance History Online (ECHO) database
GNU General Public License v3.0

create an aggregate by geography function #67

Closed ericnost closed 3 months ago

ericnost commented 10 months ago

Currently, we have no way of aggregating information by geographic unit (e.g. ZIP code or watershed). We can retrieve information at those levels and display it, but we have no function that summarizes it by those units.

Here's what it might look like:

# Get attribute data
# (`zips` is assumed to be defined earlier as the NY ZIP codes of interest)
ds = make_data_sets(["CWA Violations"]) # Create a DataSet for handling the data
ny_zips_cwa_inspections = ds["CWA Violations"].store_results(region_type="Zip Code", region_value=zips, state="NY") # Store results for this DataSet as a DataSetResults object

# Aggregate attribute data
from ECHO_modules.geographies import region_field
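field = region_field[ny_zips_cwa_inspections.region_type]["field"] # Column that identifies the region
agg_col = ny_zips_cwa_inspections.dataset.agg_col # Column to sum within each region
ny_zips_aggregated = ny_zips_cwa_inspections.dataframe.groupby(by=field)[[agg_col]].sum()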
ny_zips_aggregated

Doesn't seem too bad. This enables us to map values (inspections, violations, etc.) by areal unit using choropleth().
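To make the idea concrete without needing the database, here is a minimal sketch of the same group-and-sum step on a toy pandas DataFrame. The function name `aggregate_by_geography`, the column names `zip_code` and `violations`, and the data are all made up for illustration. They are not part of ECHO_modules, and a real version would presumably look up the region column via `region_field` and the value column via `agg_col` as above.

```python
import pandas as pd


def aggregate_by_geography(df, region_col, agg_col):
    """Sum `agg_col` within each distinct value of `region_col`.

    Hypothetical helper, not an existing ECHO_modules function.
    """
    return df.groupby(by=region_col)[[agg_col]].sum()


# Toy stand-in for a DataSetResults dataframe (one row per facility).
# Column names and values are invented for this example.
facilities = pd.DataFrame(
    {
        "zip_code": ["14201", "14201", "14202", "14203"],
        "violations": [2, 3, 0, 5],
    }
)

by_zip = aggregate_by_geography(facilities, "zip_code", "violations")
print(by_zip)
```

The result is a DataFrame indexed by the region column with one row per ZIP code, which is the shape one would typically join to ZIP code geometries before mapping.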