catalyst-cooperative / rmi-energy-communities

Partnership between Catalyst and RMI to identify energy communities as defined by the Inflation Reduction Act
MIT License

Analyses on final dataframe #88

Closed katie-lamb closed 1 year ago

katie-lamb commented 1 year ago

This PR adds an output analysis module with a function that aggregates the qualifying records by county to get, for each county, the number of brownfields, the number of coal qualifying Census tracts, and the percentage of area that qualifies.

For use by DBCP.

katie-lamb commented 1 year ago

Summary of aggregations for @TrentonBush :

This analysis takes the output dataframe of qualifying areas and aggregates it so that there is one record for each county that contains a qualifying area. The input to this aggregation is a dataframe of all qualifying records: one record for each Census tract that qualifies via the brownfields or coal criteria, and one record for each county that qualifies via the employment criteria. This input dataframe is the result of energy_comms.coordinate.get_all_qualifying_areas(). The output columns are derived as follows:

Brownfields Aggregation

The input dataframe is grouped by county FIPS code, and the brownfield records within each county are counted to get the total number of brownfields in that county.
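A minimal sketch of this count in pandas. The column names `county_id_fips` and `criteria` and the toy rows are assumptions for illustration; the real output of energy_comms.coordinate.get_all_qualifying_areas() may use different names.

```python
import pandas as pd

# Toy stand-in for the qualifying-areas dataframe.
# Column names and values are hypothetical, not taken from the repo.
qualifying = pd.DataFrame(
    {
        "county_id_fips": ["01001", "01001", "01003", "01003", "01005"],
        "criteria": ["brownfield", "brownfield", "brownfield", "coal", "coal"],
    }
)


def count_brownfields(df: pd.DataFrame) -> pd.Series:
    """Count brownfield records per county FIPS code."""
    brownfields = df[df["criteria"] == "brownfield"]
    # size() counts rows in each group, i.e. brownfield records per county.
    return brownfields.groupby("county_id_fips").size().rename("n_brownfields")


n_brownfields = count_brownfields(qualifying)
```

Counties with no brownfield records do not appear in the result, so a later merge onto the county-level output would need to fill the missing counts with 0.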

Coal Aggregation

The input dataframe is grouped by county FIPS code, and the coal qualifying Census tracts within each county are counted to get the total number of coal qualifying tracts in that county.
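A sketch of the tract count, again with hypothetical column names (`county_id_fips`, `tract_id_fips`, `criteria`). It counts distinct tract IDs rather than rows, which is an assumption on my part to guard against a tract appearing more than once; the PR itself may simply count records.

```python
import pandas as pd

# Hypothetical column names and toy values; one tract is duplicated on purpose.
qualifying = pd.DataFrame(
    {
        "county_id_fips": ["01001", "01001", "01001", "01003"],
        "tract_id_fips": [
            "01001020100",
            "01001020200",
            "01001020200",
            "01003010100",
        ],
        "criteria": ["coal", "coal", "coal", "coal"],
    }
)

coal = qualifying[qualifying["criteria"] == "coal"]
# nunique() counts each tract once per county, even if it has several records.
n_coal_tracts = (
    coal.groupby("county_id_fips")["tract_id_fips"].nunique().rename("n_coal_tracts")
)
```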

To get the percentage of area that qualifies, the area of each qualifying Census tract is first calculated using geopandas and the Shapefile coordinates for each tract given by the Census DP1 geodatabase. The total area of each county is then calculated. Finally, the input dataframe is grouped by county FIPS code, the area of the qualifying tracts within each county is summed, and that sum is divided by the total area of the county to get the percentage of area that qualifies.
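The arithmetic can be sketched with plain pandas, assuming the tract and county areas have already been computed (for example from the `area` attribute of a geopandas GeoDataFrame in a projected CRS). The column names, the toy areas, and the multiplication by 100 are assumptions for illustration.

```python
import pandas as pd

# Precomputed areas in arbitrary but consistent units (hypothetical values).
qualifying_tracts = pd.DataFrame(
    {
        "county_id_fips": ["01001", "01001", "01003"],
        "tract_area": [10.0, 15.0, 20.0],
    }
)
counties = pd.DataFrame(
    {
        "county_id_fips": ["01001", "01003"],
        "county_area": [100.0, 40.0],
    }
)


def percent_area_qualifying(
    tracts: pd.DataFrame, county_areas: pd.DataFrame
) -> pd.DataFrame:
    """Sum qualifying tract area per county and divide by total county area."""
    qualifying_area = (
        tracts.groupby("county_id_fips", as_index=False)["tract_area"]
        .sum()
        .rename(columns={"tract_area": "qualifying_area"})
    )
    out = county_areas.merge(qualifying_area, on="county_id_fips", how="inner")
    out["percent_area_qualifying"] = (
        100 * out["qualifying_area"] / out["county_area"]
    )
    return out


result = percent_area_qualifying(qualifying_tracts, counties)
```

Both areas need to be measured in the same units and the same projection for the ratio to be meaningful.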

Employment Criteria Aggregation

The input dataframe already includes a record for each county that qualifies via the employment criteria. These employment qualifying counties are merged onto the output dataframe to create a boolean column identifying whether a county qualifies via the employment criteria.
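One way to build such a boolean column is a merge with `indicator=True`. This is a sketch under assumed column names (`county_id_fips`, `n_brownfields`, `qualifies_by_employment`), not necessarily how the PR implements it. An outer merge is used here so that counties qualifying only via employment still get a record.

```python
import pandas as pd

# County-level aggregates built so far (hypothetical columns and values).
county_output = pd.DataFrame(
    {
        "county_id_fips": ["01001", "01003", "01005"],
        "n_brownfields": [2, 1, 0],
    }
)
# Counties that qualify via the employment criteria (hypothetical values).
employment = pd.DataFrame({"county_id_fips": ["01003", "01007"]})

merged = county_output.merge(
    employment.drop_duplicates(),
    on="county_id_fips",
    how="outer",
    indicator=True,  # adds a "_merge" column: left_only / right_only / both
)
# A county qualifies via employment if it appeared in the employment frame.
merged["qualifies_by_employment"] = merged["_merge"] != "left_only"
merged = merged.drop(columns="_merge")
```

Counties that enter only through the employment frame have missing values in the other aggregate columns, which would need to be filled (for example with 0) afterwards.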

The output dataframe contains one record for each county that contains an energy community qualifying area, or that qualifies in its entirety, along with the columns described above.