edgi-govdata-archiving / ECHO-Cross-Program

Jupyter Notebooks for ECHO that use data from multiple EPA programs
https://colab.research.google.com/github/edgi-govdata-archiving/ECHO-Cross-Program/blob/master/ECHO-Cross-Programs.ipynb
GNU General Public License v3.0

The map in the database_views branch shows multiple markers for the same facility #66

Closed: shansen5 closed this issue 3 years ago

shansen5 commented 4 years ago

Show the number of rows corresponding to each facility, but draw only one marker per facility.
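One possible way to get one marker per facility while keeping the row count is to collapse the rows with a pandas named aggregation. This is a minimal sketch. The column names (REGISTRY_ID, FAC_NAME, FAC_LAT, FAC_LONG), the sample data, and the popup text format are all hypothetical, not the notebook's actual schema.

```python
import pandas as pd

# Hypothetical program data: one row per violation/inspection/enforcement,
# so facility 110001 appears three times.
program_data = pd.DataFrame(
    {
        "REGISTRY_ID": ["110001", "110001", "110002", "110001"],
        "FAC_NAME": ["Plant A", "Plant A", "Plant B", "Plant A"],
        "FAC_LAT": [40.1, 40.1, 41.2, 40.1],
        "FAC_LONG": [-75.1, -75.1, -74.3, -75.1],
    }
)

# Collapse to one row per facility: keep the first name/location and
# count how many source rows each facility had.
df_to_map = program_data.groupby("REGISTRY_ID").agg(
    FAC_NAME=("FAC_NAME", "first"),
    FAC_LAT=("FAC_LAT", "first"),
    FAC_LONG=("FAC_LONG", "first"),
    NUM_ROWS=("FAC_NAME", "size"),
)

# Text a single marker's popup could display (format is an assumption).
df_to_map["POPUP"] = (
    df_to_map["FAC_NAME"] + ": " + df_to_map["NUM_ROWS"].astype(str) + " rows"
)
print(df_to_map)
```

Each row of df_to_map can then become exactly one marker, with the count carried along instead of being expressed as stacked markers.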

ericnost commented 4 years ago

df_to_map = program_data.loc[~program_data.index.duplicated(keep='first')]

This line is meant to take the program data (which can include multiple copies of the same facility if it has multiple violations, inspections, or enforcements) and de-duplicate it, keeping the first row for each facility.
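To illustrate what that line does when the dataframe is indexed on a facility ID, here is a small sketch. The REGISTRY_ID column name and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical program data indexed on a facility ID; facility 110001
# has three rows (one per inspection/violation/enforcement).
program_data = pd.DataFrame(
    {
        "REGISTRY_ID": ["110001", "110001", "110002", "110001"],
        "ACTION": ["inspection", "violation", "inspection", "enforcement"],
    }
).set_index("REGISTRY_ID")

# index.duplicated(keep="first") is True for every repeat of an index
# value after its first occurrence; negating it keeps one row per facility.
mask = ~program_data.index.duplicated(keep="first")
df_to_map = program_data.loc[mask]
print(df_to_map)
```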

It doesn’t always work (e.g. for CAA enforcements) because of how the dataframe is indexed. If it’s indexed on NPDES ID, Registry ID, ID NUMBER, etc., then each facility will be mapped only once. But if it’s not indexed on a facility ID (for example, it keeps pandas’ default integer index), every row has a unique index value, so no duplicates are removed.
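The failure case, and one possible workaround, can be sketched as follows. Again, REGISTRY_ID and the sample rows are hypothetical. The workaround de-duplicates on the ID column with drop_duplicates instead of relying on the index; it is a suggestion, not what the notebook currently does.

```python
import pandas as pd

# Same kind of data, but left with pandas' default integer index (0, 1, 2)
# instead of being indexed on the facility ID.
program_data = pd.DataFrame(
    {
        "REGISTRY_ID": ["110001", "110001", "110002"],
        "ACTION": ["inspection", "violation", "inspection"],
    }
)

# Every integer index value is unique, so nothing is flagged as a duplicate
# and all three rows survive.
unchanged = program_data.loc[~program_data.index.duplicated(keep="first")]
print(len(unchanged))

# De-duplicating on the ID column works regardless of how the frame is indexed.
df_to_map = program_data.drop_duplicates(subset="REGISTRY_ID", keep="first")
print(len(df_to_map))
```

Alternatively, calling set_index on the appropriate ID column before the existing line would make the original index-based de-duplication behave as intended.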