ccodwg / CovidTimelineCanada

A definitive dataset for COVID-19 in Canada
https://opencovid.ca/

Postprocessing of raw data #1

Closed: jeanpaulrsoucy closed this 1 year ago

jeanpaulrsoucy commented 3 years ago

How should the raw data be processed into the final harmonized dataset?

Aside from the obvious issue of different types of dates being used in different datasets, there's also the question of impossible values, such as negative daily counts. These could be smoothed out relatively easily. For example, the current Ontario PHU time series contains a massive one-day drop in deaths, followed by a massive increase the next day. This is clearly some kind of error in the original dataset. We could eliminate all negative values by retroactively adjusting earlier cumulative values so that the daily differences are never less than 0.

Note, however, that cumulative values for "Not Reported / Unknown" health regions certainly CAN be negative.
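A minimal sketch of the retroactive smoothing idea, assuming each region's data is a plain list of cumulative counts ordered by date. It walks backward from the most recent value and caps each day's cumulative value at the next day's value (a running minimum from the end), so every daily difference ends up >= 0. The function names, the dict-of-lists layout, and the sample numbers are hypothetical; the "Not Reported / Unknown" bucket is skipped per the note above.

```python
from typing import Dict, List

# Hypothetical label for the unassigned bucket; its cumulative values may
# legitimately go negative, so it is left untouched.
UNKNOWN_REGION = "Not Reported / Unknown"


def smooth_cumulative(values: List[int]) -> List[int]:
    """Retroactively lower earlier cumulative values so that no daily
    difference is negative.

    Walks backward from the most recent value: each day's cumulative value
    is capped at the following day's value (a running minimum from the end).
    """
    smoothed = list(values)
    for i in range(len(smoothed) - 2, -1, -1):
        smoothed[i] = min(smoothed[i], smoothed[i + 1])
    return smoothed


def daily_differences(values: List[int]) -> List[int]:
    """Day-over-day change in a cumulative series."""
    return [b - a for a, b in zip(values, values[1:])]


def smooth_regions(series: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Apply smooth_cumulative to every region except the unknown bucket."""
    return {
        region: list(values) if region == UNKNOWN_REGION else smooth_cumulative(values)
        for region, values in series.items()
    }


if __name__ == "__main__":
    raw = [0, 5, 8, 7, 10]  # made-up series with one negative daily change
    fixed = smooth_cumulative(raw)
    print(fixed)                     # [0, 5, 7, 7, 10]
    print(daily_differences(fixed))  # [5, 2, 0, 3]
```

One trade-off worth weighing: with a one-day dip followed by a rebound (like the Ontario deaths example), this backward pass pulls every earlier value down to the dip. A forward running maximum, which raises the dip instead of lowering the history, is a different technique that may suit transient errors better.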