edgi-govdata-archiving / ECHO-Cross-Program

Jupyter Notebooks for ECHO that use data from multiple EPA programs
https://colab.research.google.com/github/edgi-govdata-archiving/ECHO-Cross-Program/blob/master/ECHO-Cross-Programs.ipynb
GNU General Public License v3.0
8 stars 5 forks

Combined Air Emissions and Greenhouse Gases are not working again #57

Closed shansen5 closed 4 years ago

shansen5 commented 4 years ago

I suspect this is an issue with the float/string type of REGISTRY_IDs again.

ericnost commented 4 years ago

Shoot. I've had some luck with this in the get_data function:

    # Read REGISTRY_ID as a nullable integer so missing values don't force the column to float
    if index_field == "REGISTRY_ID":
        ds = pd.read_csv(data_location, encoding='iso-8859-1', dtype={"REGISTRY_ID": "Int64"})

I suppose we could just remove the if and always treat REGISTRY_ID as an integer.
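To illustrate why pinning the dtype matters, here is a minimal sketch using a made-up CSV (the IDs and the FAC_NAME values are invented for the example, not taken from the ECHO data). Assuming the column has at least one blank entry, pandas' default inference falls back to float64, while the "Int64" dtype keeps whole numbers and marks the blank as missing:

```python
import io
import pandas as pd

# Hypothetical sample data: one row has no REGISTRY_ID.
csv_text = (
    "REGISTRY_ID,FAC_NAME\n"
    "110000000001,Plant A\n"
    ",Plant B\n"
    "110000000003,Plant C\n"
)

# Default inference: the blank becomes NaN, so the column is float64.
default = pd.read_csv(io.StringIO(csv_text))

# Explicit nullable integer dtype: the blank becomes <NA>, IDs stay integers.
forced = pd.read_csv(io.StringIO(csv_text), dtype={"REGISTRY_ID": "Int64"})

print(default["REGISTRY_ID"].dtype)
print(forced["REGISTRY_ID"].dtype)
```

If every table were read this way, the column type would no longer depend on whether a particular file happens to contain blanks.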

Actually, I think we could treat it as a string or a float, so long as we treat it the same way everywhere. The problem right now, I think, is that Registry ID is one type when we load the Air Emissions table and another type when we load ECHO_EXPORTER, so we can't join records across the two tables.
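A rough sketch of the "be consistent" idea, assuming the two tables arrive with different REGISTRY_ID types. The helper normalize_registry_id, the toy frames, and the EMISSIONS and FAC_NAME columns are hypothetical, not code or data from the notebook. Depending on the pandas version, merging a float column against a string column either raises an error or matches nothing, so the sketch coerces both sides to the same nullable integer type before joining:

```python
import pandas as pd

def normalize_registry_id(df, column="REGISTRY_ID"):
    """Return a copy of df with the ID column coerced to nullable Int64.

    Hypothetical helper: values that cannot be parsed become <NA>.
    """
    out = df.copy()
    out[column] = pd.to_numeric(out[column], errors="coerce").astype("Int64")
    return out

# Toy stand-in for the Air Emissions table: IDs were read as floats.
air = pd.DataFrame({
    "REGISTRY_ID": [110000000001.0, 110000000002.0],
    "EMISSIONS": [10.5, 3.2],
})

# Toy stand-in for ECHO_EXPORTER: IDs were read as strings.
exporter = pd.DataFrame({
    "REGISTRY_ID": ["110000000001", "110000000002", "110000000003"],
    "FAC_NAME": ["Plant A", "Plant B", "Plant C"],
})

# Normalize both sides to the same dtype, then join on the shared key.
merged = normalize_registry_id(air).merge(
    normalize_registry_id(exporter), on="REGISTRY_ID", how="inner"
)
print(merged)
```

Casting both sides to string would work just as well for the join; the point is only that both tables go through the same conversion.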