Closed ericnost closed 2 years ago
For instance, for OR-2 there is the following rows:
NPDES_ID | Date | HLRNC | Count | NUMCVDT | NUMSVCD | NUMPSCH ORR10E933 | 2019 | U | 0 | 0 | 0 | 0 ORR10E933 | 2019 | U | 0 | 0 | 0 | 0 ORR10E933 | 2019 | U | 0 | 0 | 0 | 0
It turns out ORR10E933 never actually had any violations even though it is in this non-compliance table. We just need to remove rows that sum to 0 and then calculate the number of unique ids.
The AllPrograms notebook has been retired. The functionality is now in the ECHO-ReportCard-Data python routines.
Currently, to calculate the number of facilities, we do
num_fac = len(df_pgm_year.index.unique())
This works, I think, for most data tables.
However, it turns out that not every facility in NPDES_QNCR_HISTORY actually has a violation to its name. Some rows look like this: YEARQTR HLRNC NUME90Q NUMCVDT NUMSVCD NUMPSCH NPDES_ID
IA234234 20191 C 0 0 0 0
So, we shouldn't really include these in a calculation of violating facilities.
These can be removed by something like
water_violators = water_violations.loc[water_violations["Sum"]>0]
where the Sum field is justwater_violations["NUME90Q"] + water_violations["NUMCVDT"] + water_violations["NUMSVCD"] + water_violations["NUMPSCH"]