globaldothealth / monkeypox

Mpox 2022 repository

Automate WHO and CDC data reconciliation #153

Closed jim-sheldon closed 2 years ago

jim-sheldon commented 2 years ago

Every business day the CDC releases monkeypox data, and we manually reconcile our line list against it. We should automate this.

The process looks something like this:

1.) Download the most recent CDC CSV file.
2.) Run a Jupyter notebook to compare today's and yesterday's CDC counts.
3.) Identify regions where the case count went down. For each such region, change the status of as many of the most recent entries for that state (or D.C.) as the count dropped by to omit_error, and update their last_modified entries.
4.) Find the total number of new cases (excluding Puerto Rico and Non-US Resident) and add them to the line list, using the CDC site as the source.
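Steps 2 to 4 of the process could be sketched roughly as below. This is only a minimal illustration, not the notebook's actual logic: the column names `Location` and `Cases`, and the helpers `read_counts` and `compare_counts`, are hypothetical, and it assumes "new cases" means the sum of per-region increases.

```python
import csv
import io

# Regions left out of the new-case total, per the process description.
EXCLUDED_FROM_NEW = {"Puerto Rico", "Non-US Resident"}


def read_counts(csv_text, region_col="Location", count_col="Cases"):
    """Parse CDC-style CSV text into {region: case count}.

    The column names are assumptions; adjust them to the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[region_col].strip(): int(row[count_col]) for row in reader}


def compare_counts(yesterday, today):
    """Compare two {region: count} mappings.

    Returns (decreases, new_cases):
      decreases: {region: number of cases the count dropped by}
      new_cases: sum of per-region increases, skipping excluded regions
    """
    decreases = {}
    new_cases = 0
    for region in sorted(set(yesterday) | set(today)):
        delta = today.get(region, 0) - yesterday.get(region, 0)
        if delta < 0:
            decreases[region] = -delta
        elif delta > 0 and region not in EXCLUDED_FROM_NEW:
            new_cases += delta
    return decreases, new_cases
```

Each entry in `decreases` says how many of the most recent line-list entries for that region would be candidates for omit_error, and `new_cases` is the number of rows to add.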

We should also post a message to Slack about the changes made.

jim-sheldon commented 2 years ago

We should also automate a cross-check of the WHO's Global Trends report [Table 2.5. Cases and deaths by country] against G.h numbers, and post an update to Slack when the numbers differ so curators can investigate. We don't need to flag countries where G.h counts are higher than the WHO numbers, only those where ours are lower.
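A rough sketch of that one-sided check, assuming both sources have already been loaded into `{country: count}` dictionaries. The function names `find_shortfalls` and `format_slack_message` are made up for illustration, and actually posting to Slack is left out.

```python
def find_shortfalls(gh_counts, who_counts):
    """Return {country: (gh, who)} where the G.h count is below the WHO count.

    Countries missing from gh_counts are treated as 0 (an assumption).
    Countries where G.h is equal or higher are ignored.
    """
    shortfalls = {}
    for country, who in who_counts.items():
        gh = gh_counts.get(country, 0)
        if gh < who:
            shortfalls[country] = (gh, who)
    return shortfalls


def format_slack_message(shortfalls):
    """Build the message text for curators, or None if nothing to report."""
    if not shortfalls:
        return None
    lines = ["G.h counts below WHO counts:"]
    for country in sorted(shortfalls):
        gh, who = shortfalls[country]
        lines.append(f"{country}: G.h {gh}, WHO {who} (missing {who - gh})")
    return "\n".join(lines)
```

Returning `None` when there are no shortfalls lets the caller skip the Slack post entirely on days when nothing needs investigating.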

We should run the WHO comparison at start of day and the CDC comparison at end of day (EST).

jim-sheldon commented 2 years ago

The G.h counts that we use for comparison are the sum of confirmed and death cases in the "Cases by Country" pivot table.
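If the same totals were computed from line-list rows instead of read from the pivot table, it might look something like this. The field names `Country` and `Status` and the status labels `confirmed` and `death` are assumptions, as is the helper name `gh_counts_by_country`.

```python
from collections import Counter

# Statuses that count toward the comparison total (labels are assumed).
COUNTED_STATUSES = {"confirmed", "death"}


def gh_counts_by_country(rows):
    """Count rows per country whose status is confirmed or death.

    rows: iterable of dicts with "Country" and "Status" keys.
    """
    counts = Counter()
    for row in rows:
        if row["Status"] in COUNTED_STATUSES:
            counts[row["Country"]] += 1
    return dict(counts)
```

Rows with any other status, such as omit_error, are left out of the totals.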

jim-sheldon commented 2 years ago

We need better rules around omitting cases before the automation makes edits to the spreadsheet, so for now it does not make any edits.