Fix def get_num_facilities in cell 17 in AllPrograms

ericnost commented 4 years ago

Currently, to calculate the number of facilities, we do num_fac = len(df_pgm_year.index.unique())

This works, I think, for most data tables.

However, it turns out that not every facility in NPDES_QNCR_HISTORY actually has a violation to its name. Some rows look like this: YEARQTR HLRNC NUME90Q NUMCVDT NUMSVCD NUMPSCH NPDES_ID
IA234234 20191 C 0 0 0 0

So, we shouldn't really include these in a calculation of violating facilities.

These can be removed by something like water_violators = water_violations.loc[water_violations["Sum"]>0] where the Sum field is just water_violations["NUME90Q"] + water_violations["NUMCVDT"] + water_violations["NUMSVCD"] + water_violations["NUMPSCH"]

ericnost commented 4 years ago

For instance, for OR-2 there is the following rows:

NPDES_ID | Date | HLRNC | Count | NUMCVDT | NUMSVCD | NUMPSCH ORR10E933 | 2019 | U | 0 | 0 | 0 | 0 ORR10E933 | 2019 | U | 0 | 0 | 0 | 0 ORR10E933 | 2019 | U | 0 | 0 | 0 | 0

It turns out ORR10E933 never actually had any violations even though it is in this non-compliance table. We just need to remove rows that sum to 0 and then calculate the number of unique ids.

shansen5 commented 2 years ago

The AllPrograms notebook has been retired. The functionality is now in the ECHO-ReportCard-Data python routines.

edgi-govdata-archiving / ECHO-Cross-Program

Fix def get_num_facilities in cell 17 in AllPrograms #93