Open Frijol opened 4 years ago
The industry-specific data: sic and NAICS does are often missing and sometimes wrong. so looking at in enforcement in a single industry is very very labor intensive.
We've found upwards of 12 FRS id numbers for a single prison. they aren't cross referenced, we just had to find them all and compile them --is that what you mean by disaggregations?
once we get the prison data retrieval from the SBU database working we'll be able to have validated lat/long next to EPA lat/long, so we could quantify the inaccuracies.
"1" is a valid zip code according to the database
Would be interesting to create a notebook just for identifying data errors
Adding this, from Leif:
"However, while increased scrutiny has been given to the quality of data in the national systems in recent years, the data quality before November 2000 has not been assessed and should be considered unknown." https://echo.epa.gov/tools/data-downloads <- We should really consider modifying the notebooks to start in 2001 ->
"State/Local penalties were not tracked until Docket was replaced by ICIS in 2002." https://echo.epa.gov/tools/data-downloads/icis-fec-download-summary
This is an important resource: https://echo.epa.gov/resources/echo-data/about-the-data
From the above ^^^ "Enforcement actions that have not been concluded are not displayed in ECHO"
See also: https://echo.epa.gov/resources/echo-data/known-data-problems Perhaps we are trying to add to this/make it more accessible.
Relevant to the CD LA-2 report: "In some circumstances, Clean Air Act enforcement actions taken by the state of Louisiana have been reported to EPA in duplicate (meaning that one enforcement action and penalty will appear twice on ECHO reports). Louisiana is working to correct this information. (Updated February 2018)" @calderEDJ
More on geocoding errors....
so I ran the facility-all-programs notebook that <@U4C7QT800> linked above for LA D2 and got some really weird results in the map: including one facility in texas and another in kentucky? View in Slack
There are 52 EPA facilities with FAC_COUNTY listed as METROPOLITAN BOSTON and tracked in the ECHO database, but there is no METROPOLITAN BOSTON county in reality.
This one made me LOL
cc @mraisle @leiffredrickson that this collection of data errors exists. There's also this document Principles for Open Environmental Data and I think some things are tracked in the EEW Tracker
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.
Still want to do this I think!
Would this be a place to list discrepancies between SBU Data and ECHO web interface or would that be somewhere else?
@shapironick I think here would be the best place - not sure there's anywhere else!
okie great! this is just the one i mentioned in meeting last week: FRS ID 110013272680. The formal action that is listed on the Detailed Facility Report (https://echo.epa.gov/detailed-facility-report?fid=110013272680) isn't listed in the enforcement actions that we were able to query via SBU database. I'm not sure why it's not listed.
@shapironick Hmm...interesting that that facility has no address too...
Ok, so I think I know what the problem is. Here we go:
First, I can confirm that when I do my own query of the SBU database, with a different notebook, I don't see this formal action either.
Note that the formal action was led by US EPA and is being pulled from the ICIS data system, not the SDWIS data system. So that's why it's not showing up in the notebook/database. Because it's not in the SDWA_Enforcements table in the SBU database, which is based on SDWIS.
Instead, I suspect the formal enforcement action record resides in the ICIS FEC tables: https://echo.epa.gov/tools/data-downloads/icis-fec-download-summary We have these in the database, but haven't done much with them.
I will check this table to confirm.
Another great example of the fragmented data infrastructure we have to work with.
Part of the challenge with working with the ICIS FEC tables - which are kind of like, a bunch of random other enforcement actions not described elsewhere - is that they are not organized by facility. There is no way to look at the CASE_ENFORCEMENTS table by facility. You have to query another table to get the enforcement action IDs associated with the facility, then ping the actual CASE_ENFORCEMENTS table for the detailed records.
Indeed, I found the formal action in the CASE_ENFORCEMENTS table:
ACTIVITY_ID | ACTIVITY_NAME | STATE_CODE | REGION_CODE | FISCAL_YEAR | CASE_NUMBER | CASE_NAME | ACTIVITY_TYPE_CODE | ACTIVITY_TYPE_DESC | ACTIVITY_STATUS_CODE | ACTIVITY_STATUS_DESC | ACTIVITY_STATUS_DATE | LEAD | CASE_STATUS_DATE | DOJ_DOCKET_NMBR | ENF_OUTCOME_CODE | ENF_OUTCOME_DESC | TOTAL_PENALTY_ASSESSED_AMT | TOTAL_COST_RECOVERY_AMT | TOTAL_COMP_ACTION_AMT | HQ_DIVISION | BRANCH | VOLUNTARY_SELF_DISCLOSURE_FLAG | MULTIMEDIA_FLAG | ENF_SUMMARY_TEXT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3600738616 | Texas Department of Criminal Justice | 6 | 2016 | 06-2016-1212 | Texas Department of Criminal Justice | AFR | Administrative - Formal | CLS | Closed | 10/25/17 | EPA | 10/25/17 | ECN | Final Order No Penalty | 6EN-W | N | On May 26, 2016, EPA Region 6 issued an Administrative Order(AO) for Plan Implementation (AO Plan Implementation) to the Texas Deparment of Criminal Justice (Respondent). Respondent has requested additional time to complete the construction schedule. The AO incorporates the new construction schedule deadline extension request submitted by the Respondent in their compliance plan. On July 19, 2011, EPA Region 6 issued the original AO under its authority pursuant to the Safe Drinking Water Act to Respondent for violations at its water system in Harlingen, TX. The Original AO was issued in response to violations of the arsenic maximum contaminant level (MCL). The Original AO required Respondent to comply immediately with the referenced MCL or to submit to EPA 1) a detailed plan to meet the arsenic MCL; 2) a cost analysis of the proposed plan; and 3) a construction schedule. This AO extends the compliance deadline to June 13, 2017 and supersedes the AO issued on July 19, 2011. |
Seems like we need to figure out ways of querying this table. I think it would involve pinging the CASE_FACILITIES table for the REGISTRY_ID field (which links to the main ECHO_EXPORTER table), and then grabbing the ACTIVITY_ID (s) or CASE_NUMBER (s)
I tested this approach:
sql = 'select * from "CASE_FACILITIES" where "REGISTRY_ID" = \'110013272680\''
This returns TWO actions, the one from 2016 and also one from 2011 (remember that ECHO.EPA.GOV only displays enforcement actions from the previous 5 years....our database has the full archive)
To get the full record then we grab the ACTIVITY_ID (s)
sql = 'select * from "CASE_ENFORCEMENTS" where "ACTIVITY_ID" = \'2600045351\'' #the one from 2011
ACTIVITY_ID | ACTIVITY_NAME | STATE_CODE | REGION_CODE | FISCAL_YEAR | CASE_NUMBER | CASE_NAME | ACTIVITY_TYPE_CODE | ACTIVITY_TYPE_DESC | ACTIVITY_STATUS_CODE | ACTIVITY_STATUS_DESC | ACTIVITY_STATUS_DATE | LEAD | CASE_STATUS_DATE | DOJ_DOCKET_NMBR | ENF_OUTCOME_CODE | ENF_OUTCOME_DESC | TOTAL_PENALTY_ASSESSED_AMT | TOTAL_COST_RECOVERY_AMT | TOTAL_COMP_ACTION_AMT | HQ_DIVISION | BRANCH | VOLUNTARY_SELF_DISCLOSURE_FLAG | MULTIMEDIA_FLAG | ENF_SUMMARY_TEXT | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2600045351 | TDCJW Pack Unit | 6 | 2011 | 06-2011-1355 | TDCJW Pack Unit | AFR | Administrative - Formal | CSU | Closed Superseded | 5/26/16 | EPA | 5/26/16 | ESU | Superseded by Another Enforcement Action | 6EN-W | N | Administrative order issued for non compliance with the requirements of the Act and implementing regualtions including the maximum Contaminant Levels (MCLs) for arsenic. |
@shapironick - I've lost the link to the notebook Steve wrote for you, but if you share it with me, I could try adding in a section that looks for these other enforcement actions.
Amazing!! thank you, Eric! This is so helpful.
Here is shane's notebook https://colab.research.google.com/github/shansen5/CA_Prisons/blob/master/CA_Prisons2.ipynb#scrollTo=MhTd2hLBpUJT
do you think that the ICIS FEC table is mutually exclusive with other violations? or would there be the possibility of duplication?
@shapironick Yes, I think there could be duplicates. According to the metadata: "ICIS may contain duplicates of actions and penalties shown in the program databases (i.e., ICIS-Air, ICIS-NPDES, and RCRAInfo)." https://echo.epa.gov/tools/data-downloads/icis-fec-download-summary
Ok, I've made some edits to @shansen5's notebook - the principles are there and it'll work for a small number of facilities at this point, but hopefully Steve can adjust for looking at a greater number
Thank you so much, Eric!
On Wed, Feb 3, 2021 at 2:44 PM Eric Nost notifications@github.com wrote:
Ok, I've made some edits to @shansen5 https://github.com/shansen5's notebook - the principles are there and it'll work for a small number of facilities at this point, but hopefully Steve can adjust for looking at a greater number
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/edgi-govdata-archiving/EEW_Planning/issues/96#issuecomment-772877288, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABZH33ZRB6NAX4TWD4LVSB3S5HGQVANCNFSM4NFHFRIA .
-- Nicholas Shapiro Assistant Professor UCLA Institute for Society and Genetics Office: (310) 206-2366 Book my office hours here https://calendly.com/n-s/office-hours-monday-1-30-3-30?
One thing I have tried to do is to look at the percent of facilities that have no available information (blank, null or NaN values in our database). I tested this on the summary Clean Water Act information available in the main ECHO Exporter table. I compared facilities in majority white neighbourhoods with those in majority minority neighborhoods: https://colab.research.google.com/drive/1Ej7d2ovZZx-mQz3wuPzrKWlPa5ZBY-l7 This shows that, across the US, the vast majority of CWA-regulated facilities have no recorded information about # of inspections, violations, enforcement actions, etc. It could be that's because they have no violations / enforcement actions. But reporting null is different from reporting a verified 0! This also shows that for the CWA, facilities located in majority minority neighbourhoods tend to have less certain information. This is especially true for inspections: Percent of facilities with no info on CWA_INSPECTION_COUNT 87.68 (minority) 80.79 (white) Percent of facilities with no info on CWA_DAYS_LAST_INSPECTION 77.87 (minority) 68.86 (white) (edited)
Incredible work, Eric!
Watershed errors:
The Niagara River HUC8 watershed is listed as 04270101 see here: https://www.sciencebase.gov/catalog/item/5d6f6573e4b0c4f70cf90848 It is also the 427 one here https://www.arcgis.com/home/webmap/viewer.html?useExisting=1&layers=4c08f2e2b13741da96ad4a8f6aa5e36a
This gets rendered as 4270101 in our database. But looking at the database, the only HUCs listed for facilities in Buffalo are 4120103, 4120104, nan, 4140101, and 2010008. For Tonawanda, it's just 4120104 and nan
The Niagara River HUC8 is listed as 04120104 here https://water.usgs.gov/wsc/cat/04120104.html
So, unfortunately, it appears there are at least two HUC8 classification schemes in circulation. Seems worth digging into....I think the 427 is more recent and EPA has failed to update yet.
Linking in @ericnost 's ideas for visualizing missing reports https://edgi.slack.com/archives/CKN6FMX42/p1615479261188300
We've mentioned cataloguing data errors before but I don't think we have a place to list them. For now, collecting here:
┆Issue is synchronized with this Trello card