act-now-coalition / can-scrapers

MIT License

add duplicate row removal to _reshape_variables #354

Closed smcclure17 closed 3 years ago

smcclure17 commented 3 years ago

Some instances of `CDCCovidDataTracker` were failing on insertion into the database (`put()`) because the dataframe contained duplicated entries.
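To see which rows would collide before `put()` is called, pandas can flag duplicates directly. The sketch below is illustrative only. The column names (`dt`, `location`, `variable`, `value`) and the helper `find_duplicate_rows` are hypothetical and do not claim to match the actual can-scrapers schema.

```python
import pandas as pd


def find_duplicate_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return every row that appears more than once.

    keep=False marks all copies of a duplicated row (not just the later
    ones), so every offending row can be inspected.
    """
    return df[df.duplicated(keep=False)]


# Illustrative scraper output in long format (assumed columns).
df = pd.DataFrame(
    {
        "dt": ["2021-01-01", "2021-01-01", "2021-01-02"],
        "location": [36, 36, 36],
        "variable": ["cases", "cases", "cases"],
        "value": [100, 100, 120],
    }
)

dupes = find_duplicate_rows(df)
print(dupes)
```

Here the first two rows are identical, so both are returned. An insert that enforces a unique key over these columns would be expected to reject the second copy.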

There should never be duplicated rows in the scraper output data (as far as I can tell), so this adds a line to remove them. `_reshape_variables()` is a helper method that many scrapers (including `CDCCovidDataTracker`) use to format their data.
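The body of `_reshape_variables()` is not shown here, so the following is only a minimal sketch of the idea under stated assumptions. `reshape_variables` is a hypothetical stand-in that melts wide output into long format and then calls `DataFrame.drop_duplicates()`. It is not the project's actual implementation, and the column names are illustrative.

```python
from typing import List

import pandas as pd


def reshape_variables(wide: pd.DataFrame, id_vars: List[str]) -> pd.DataFrame:
    """Hypothetical stand-in for _reshape_variables.

    Melts wide scraper output (one column per variable) into long format,
    then drops rows that are identical in every column.
    """
    long = wide.melt(id_vars=id_vars, var_name="variable", value_name="value")
    # The extra step: remove exact duplicates before the data reaches put().
    return long.drop_duplicates().reset_index(drop=True)


# Illustrative wide input whose two rows are exact duplicates.
wide = pd.DataFrame(
    {
        "dt": ["2021-01-01", "2021-01-01"],
        "location": [36, 36],
        "cases": [100, 100],
        "deaths": [5, 5],
    }
)

out = reshape_variables(wide, id_vars=["dt", "location"])
print(out)
```

`drop_duplicates()` with no arguments removes only rows that match in every column. Rows that share a key but carry different values would survive. If that case matters, passing `subset=` (and choosing `keep=`) would deduplicate on the key columns instead, at the cost of silently picking one of the conflicting values.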