Shift active_cumul time series for deaths to match cases

ccodwg / CovidTimelineCanada

A definitive dataset for COVID-19 in Canada

https://opencovid.ca/

Shift active_cumul time series for deaths to match cases #49

Closed jeanpaulrsoucy closed 2 years ago

jeanpaulrsoucy commented 2 years ago

Using BC as an example, data are updated on Thursdays, with case data representing values up to the previous Saturday. However, the existing system still ascribes death data to the date of the update. For example, in today's update, the case time series goes up to May 14th but the death time series goes up to May 19th, even though the weekly dashboard reports deaths "as of May 14, 2022", the same as the cases.

The existing time series can probably be fixed using archived copies of the case CSV. For example, the time series could be constructed from each unique case CSV, using the max case date in that file as the date for the corresponding point in the time series. The only question is how this will be maintained, as it doesn't really line up with the existing active_cumul method.
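A minimal sketch of that reconstruction, assuming each archived case CSV has one row per case with an ISO date column and an outcome column that marks fatal cases. The column names `date_reported` and `outcome`, the value `Died`, and both function names are hypothetical placeholders, not the real file layout or anything in the existing codebase.

```python
import csv
import io
from datetime import date


def point_from_case_csv(case_csv_text, date_column="date_reported",
                        outcome_column="outcome", fatal_value="Died"):
    """Return (as_of_date, cumulative_deaths) for one archived case CSV.

    The column names and the fatal value are assumptions for illustration.
    """
    rows = list(csv.DictReader(io.StringIO(case_csv_text)))
    # The latest case date in the file is used as the date of this data point.
    as_of = max(date.fromisoformat(row[date_column]) for row in rows)
    # Cumulative deaths are the number of cases flagged as fatal.
    deaths = sum(1 for row in rows if row[outcome_column] == fatal_value)
    return as_of, deaths


def build_death_series(archived_csv_texts):
    """Build a date-sorted death time series from archived CSVs.

    If two files share the same max case date, the later one in the
    input order overwrites the earlier one.
    """
    series = {}
    for text in archived_csv_texts:
        as_of, deaths = point_from_case_csv(text)
        series[as_of] = deaths
    return sorted(series.items())
```

Keying the series on the max case date means that re-downloading an unchanged file adds no new point, which is what "each unique case CSV" implies.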

jeanpaulrsoucy commented 2 years ago

The above strategy should work equally well for AB and BC. It will, however, necessitate the creation of unique Google sheets and a new update workflow (i.e., a different "mode" for merge_sheets).

Edit: The BC case CSV doesn't actually indicate fatal cases, although the AB one does. So for BC, simply transforming the data from the point when they started reporting weekly should do the job (although, if not done carefully, this could miss out-of-sequence updates, such as when holidays delay reporting). For BC, we could use the cumulative values file but replace the dates with the latest date from the case CSV.
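A rough sketch of the BC variant, assuming the cumulative values file yields `(report_date, cumulative_deaths)` pairs and that the latest case date has already been extracted from the case CSV archived on each report date. The function name and both input shapes are hypothetical.

```python
from datetime import date


def redate_cumulative(cumulative_rows, latest_case_date_by_report):
    """Replace each report date with the latest case date seen that day.

    cumulative_rows: iterable of (report_date, cumulative_deaths).
    latest_case_date_by_report: dict mapping report_date to the max case
        date in the case CSV archived on that report date.

    Returns (series, unmatched): a date-sorted list of
    (as_of_date, cumulative_deaths), plus the report dates that had no
    archived case CSV and were therefore left out.
    """
    redated = {}
    unmatched = []
    for report_date, deaths in sorted(cumulative_rows):
        as_of = latest_case_date_by_report.get(report_date)
        if as_of is None:
            unmatched.append(report_date)
            continue
        # Later reports overwrite earlier ones for the same as-of date.
        redated[as_of] = deaths
    return sorted(redated.items()), unmatched
```

Because the as-of date comes from the case CSV rather than from a fixed offset, an update delayed by a holiday would still land on the correct Saturday. Returning the unmatched report dates makes any gaps in the archive visible instead of dropping them silently.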

jeanpaulrsoucy commented 2 years ago

BC switched to weekly reporting on Thursday, April 7, 2022:

Data will be updated Thursday afternoons and will provide information from the past full week, from the previous Sunday to Saturday.

The first reports and updates will include data up to the week of March 27 to April 2, 2022.

Thus, the cumulative values reported on 2022-04-07 should be shifted back to 2022-04-02. The original CCODWG data should therefore be used up to 2022-04-01, and the appropriately shifted active_cumul data after this point.
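The shift and splice described above could look roughly like this. It assumes the active_cumul data hold one value per weekly update, keyed by the date the update was published, and that both series fit in plain dicts. The function names and the dict layout are illustrative only and do not reflect the actual merge_sheets workflow.

```python
from datetime import date, timedelta

# First as-of date under weekly reporting, per the BC notice quoted above.
WEEKLY_START = date(2022, 4, 2)


def previous_saturday(report_date):
    """Return the most recent Saturday strictly before report_date."""
    # date.weekday(): Monday is 0, so Saturday is 5.
    offset = (report_date.weekday() - 5) % 7 or 7
    return report_date - timedelta(days=offset)


def splice_series(original, active_cumul):
    """Combine the original series with shifted weekly values.

    original: dict of date -> cumulative value (pre-weekly data).
    active_cumul: dict of report_date -> cumulative value, assumed to
        contain one entry per weekly update.
    """
    # Keep the original data only before the weekly reporting period.
    combined = {d: v for d, v in original.items() if d < WEEKLY_START}
    for report_date, value in sorted(active_cumul.items()):
        shifted = previous_saturday(report_date)
        if shifted >= WEEKLY_START:
            combined[shifted] = value
    return sorted(combined.items())
```

With this rule, a value published on Thursday 2022-04-07 lands on 2022-04-02, and an update pushed to the Friday by a holiday maps to the same Saturday. If active_cumul instead held a value for every calendar day, the weekday rule alone would assign some values to the wrong week, so those rows would need to be filtered to update days first.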

jeanpaulrsoucy commented 2 years ago

This problem is resolved for now, and there is now a framework for dealing with future issues (e.g., when hospitalization datasets are updated, see #31).