edgi-govdata-archiving / ECHO-Cross-Program

Jupyter Notebooks for ECHO that use data from multiple EPA programs
https://colab.research.google.com/github/edgi-govdata-archiving/ECHO-Cross-Program/blob/master/ECHO-Cross-Programs.ipynb
GNU General Public License v3.0

While fixing county names from ECHO_EXPORTER, also look for the word PARISH #65

Open shansen5 opened 3 years ago

shansen5 commented 3 years ago

ECHO_EXPORTER often has different names for the same county. It shows both WHATCOM and WHATCOM COUNTY, for example, and in Louisiana it shows both ORLEANS and ORLEANS PARISH. There is code that catches the COUNTY suffix and treats both as the same name, but we should also catch PARISH.
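
A minimal sketch of the idea, assuming county names arrive as plain strings. `strip_county_suffix` is a hypothetical helper written for illustration, not the code currently in the repo:

```python
import re

# Hypothetical helper: strip a trailing " COUNTY" or " PARISH" so that
# both spellings of a name collapse to one. Not the repo's existing code.
_SUFFIX = re.compile(r"\s+(COUNTY|PARISH)$")

def strip_county_suffix(name: str) -> str:
    """Return the uppercased name without a trailing COUNTY or PARISH."""
    return _SUFFIX.sub("", name.strip().upper())

names = ["WHATCOM", "WHATCOM COUNTY", "ORLEANS", "ORLEANS PARISH"]
unique = sorted({strip_county_suffix(n) for n in names})
print(unique)  # ['ORLEANS', 'WHATCOM']
```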

Frijol commented 2 years ago

@shansen5 is this specific to any one of the Notebooks in this repo?

ctsiagkalis commented 2 years ago

I could work on this, but I think there are more problems like this one. Maybe the data in the CSV should be cleaned instead of handling this programmatically. Some of the names that probably need fixing are:

AK,ALEUTIANS EAST
AK,ALEUTIANS EAST (B)
AK,ALEUTIANS EAST BOROUGH

AK,ALEUTIANS WEST
AK,ALEUTIANS WEST (CA)
AK,ALEUTIANS WEST CENSUS AREA

PA,CRAWFORD
PA,CRAWFORD CO. PA

PR,ADJUNTAS
PR,ADJUNTAS MUNICIPIO

OK,STEPHENS
OK,STEPHENSSTEPHENS

VA,EMPORIA
VA,EMPORIA (CITY)
VA,EMPORIA CITY

VA,PRINCE GEORGE
VA,PRINCE GEORGE'S

VI,SAINT CROIX
VI,SAINT CROIX ISLAND

If you still think we only need to catch PARISH, let me know and I will create a PR.

shansen5 commented 2 years ago

Since this is data we download from the EPA and load into our database, we can't really fix it in the CSV.
There is a fix_county_names() in ECHO_modules/utilities.py, but it only makes sure the name is displayed once in our dropdown list when ECHO_EXPORTER has both a SOMECOUNTY and a SOMECOUNTY COUNTY.
Open to ideas on how we can do this more generally.
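
One possible direction, sketched under assumptions: leave the downloaded data untouched and build a mapping from a canonical name to every raw spelling seen, so the dropdown can show one entry while a lookup can still match all the variants. The names `canonical_county` and `group_variants` are hypothetical, and the suffix list is taken only from the examples in this thread. CITY is left out on purpose, because Virginia has independent cities that share a name with a separate county (Fairfax, for example), so merging them automatically could be wrong.

```python
import re
from collections import defaultdict

# Assumed suffix list, drawn from the variants reported in this thread.
# CITY and ISLAND are deliberately not included (see the note above).
_SUFFIXES = re.compile(
    r"\s+(COUNTY|PARISH|BOROUGH|CENSUS AREA|MUNICIPIO"
    r"|\(B\)|\(CA\)|CO\.(\s+[A-Z]{2})?)$"
)

def canonical_county(name: str) -> str:
    """Hypothetical normalizer: uppercase, trim, drop one known suffix."""
    return _SUFFIXES.sub("", name.strip().upper())

def group_variants(rows):
    """Map (state, canonical name) -> set of raw names seen in the data."""
    groups = defaultdict(set)
    for state, county in rows:
        groups[(state, canonical_county(county))].add(county)
    return groups

rows = [
    ("AK", "ALEUTIANS EAST"), ("AK", "ALEUTIANS EAST (B)"),
    ("AK", "ALEUTIANS EAST BOROUGH"), ("PA", "CRAWFORD"),
    ("PA", "CRAWFORD CO. PA"), ("PR", "ADJUNTAS MUNICIPIO"),
]
groups = group_variants(rows)
for key in sorted(groups):
    print(key, sorted(groups[key]))
```

Cases like OK,STEPHENSSTEPHENS or VA,PRINCE GEORGE'S look like data entry errors rather than suffix variants, so they would probably need a small hand-maintained override table instead of a pattern.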