edgi-govdata-archiving / ECHO-Cross-Program

Jupyter Notebooks for ECHO that use data from multiple EPA programs
https://colab.research.google.com/github/edgi-govdata-archiving/ECHO-Cross-Program/blob/master/ECHO-Cross-Programs.ipynb
GNU General Public License v3.0

Air inspections data does not load #31

Closed ericnost closed 4 years ago

ericnost commented 4 years ago

In the cell that begins program = data_sets[ data_set_widget.value ] ...., I am unable to load Air inspections data (for NY-1 at least). Surely there must be inspections....

ericnost commented 4 years ago

To clarify, the cell block resolves, but then in the "Chart This" block below, I receive the following message: "There's no data to chart for Air Inspections !" Again, there must be inspections!

ericnost commented 4 years ago

The CAA inspections notebook reports 3 violations in NY-1.

ericnost commented 4 years ago

In addition, I cannot load Air Violations or Air Formal Actions (for NY-1). For both of those datasets, I receive the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-21-9e09092fd9df> in <module>()
     18     program_data = program.get_data( ee_ids=registry_ids )
     19 else:
---> 20     program_data = program.get_data( ee_ids=ids )
     21 
     22 program_data

3 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4301 
   4302         if missing:
-> 4303             raise KeyError(f"None of {missing} are in the columns")
   4304 
   4305         if inplace:

KeyError: "None of ['PGM_SYS_ID'] are in the columns"

shansen5 commented 4 years ago

Working on this in the 'bug-fixes' branch. icis-air-inspections.ipynb finds these 3 facilities with 3 inspections:

110019538231 | BROOKHAVEN LANDFILL & RECYCLING AREA
110000825769 | NORTHVILLE HOLTSVILLE TERMINAL
110001585286 | IRS BROOKHAVEN SERVICE CENTER

These 3 are in the "select where in" queries in facility-all-programs, but the queries don't return any records:

select * from ICIS_FEC_EPA_INSPECTIONS where REGISTRY_ID in ( '110001585286', '110000825769', '110019538231')

http://apps.tlt.stonybrook.edu/echoepa/?query=select%20%2A%20from%20%60ICIS_FEC_EPA_INSPECTIONS%60%20where%20REGISTRY_ID%20in%20%28%20%27110001585286%27%2C%20%27110000825769%27%2C%20%27110019538231%27%29
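For anyone who wants to rerun that query with other facilities, here is a minimal sketch of how the URL above can be assembled. The base URL and the `query` parameter are copied from the link in this comment; `build_query_url` is a hypothetical helper written for illustration and is not a function from ECHO-Modules.

```python
from urllib.parse import quote

# Base URL copied from the link in the comment above.
BASE_URL = "http://apps.tlt.stonybrook.edu/echoepa/"


def build_query_url(table, registry_ids):
    """Hypothetical helper: build the query URL for a list of REGISTRY_IDs."""
    # Quote each ID as a SQL string literal, e.g. '110001585286'.
    id_list = ", ".join(f"'{rid}'" for rid in registry_ids)
    # Same query shape as the one in the comment (backticks around the table).
    sql = f"select * from `{table}` where REGISTRY_ID in ( {id_list})"
    # safe="" percent-encodes every reserved character, including * and quotes.
    return BASE_URL + "?query=" + quote(sql, safe="")


url = build_query_url(
    "ICIS_FEC_EPA_INSPECTIONS",
    ["110001585286", "110000825769", "110019538231"],
)
print(url)
```

With the three IDs listed above, this should reproduce the link in this comment character for character, which makes it easy to swap in a different table or ID list when checking why no records come back.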

The KeyError reported above for Air Violations and Air Formal Actions ("None of ['PGM_SYS_ID'] are in the columns") is fixed in an update to DataSet.py in ECHO-Modules.
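For context, a minimal sketch of how this kind of KeyError arises in pandas and one way to guard against it. This is not the actual change made to DataSet.py, which this thread does not show. It assumes the query came back without a PGM_SYS_ID column (for example, an empty result); `set_index_if_present` is a hypothetical helper name.

```python
import pandas as pd


def set_index_if_present(df, key):
    """Hypothetical guard: only index by `key` when that column exists."""
    if key not in df.columns:
        # Nothing to index on; hand the frame back unchanged.
        return df
    return df.set_index(key)


# Assumption: a query that returns nothing yields a frame with no columns.
empty = pd.DataFrame()

try:
    empty.set_index("PGM_SYS_ID")
    raised = False
except KeyError as err:
    # pandas raises KeyError when the requested index column is missing.
    raised = True
    print(err)

# The guarded version returns the empty frame instead of raising.
guarded = set_index_if_present(empty, "PGM_SYS_ID")

# When the column is present, the guard behaves like a plain set_index.
sample = pd.DataFrame({"PGM_SYS_ID": ["A1", "B2"], "N": [1, 2]})
indexed = set_index_if_present(sample, "PGM_SYS_ID")
```

A guard like this lets the notebook fall through to its "no data to chart" message rather than stopping on a traceback when a program has no records for the selected district.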