ccodwg / CovidTimelineCanada

A definitive dataset for COVID-19 in Canada
https://opencovid.ca/

Alberta individual-level case dataset has been retired #108

Closed by jeanpaulrsoucy 11 months ago

jeanpaulrsoucy commented 1 year ago

Both the case and mortality datasets for Alberta currently rely on this dataset, so a replacement source will be needed for both.

jeanpaulrsoucy commented 1 year ago

For AB deaths, I considered using the current AB province-level death time series to calculate the number of "Unknown" health region deaths throughout the entire current AB HR-level death time series (and exclusively after the end of the HR-level data on 2023-06-05). However, doing so results in negative "Unknown" health region deaths at many points in the time series. Instead, I will use the province-level time series only for deaths after 2023-06-05, subtracting the sum of non-"Unknown" health region deaths as of 2023-06-05 from the province-level time series after that date. There may be a slight artificial bump in deaths on 2023-06-06 due to incompatibilities between the two time series.
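To make the arithmetic concrete, here is a minimal Python sketch of both approaches (the actual pipeline is written in R; this is only an illustration). The function names, region labels, and counts are all made up, and the second function reflects one reading of the approach described above, not the code that was actually committed.

```python
from datetime import date


def unknown_hr_deaths(province_cumul, hr_cumul):
    """Approach 1: "Unknown" = province total minus the sum of known HR totals, per day."""
    unknown = {}
    for day, total in province_cumul.items():
        known = sum(series.get(day, 0) for series in hr_cumul.values())
        unknown[day] = total - known
    return unknown


def unknown_after_cutoff(province_cumul, hr_cumul, cutoff):
    """Approach 2: freeze the known HR total at the cutoff and subtract it
    from the province-level series for days after the cutoff only."""
    known_at_cutoff = sum(series[cutoff] for series in hr_cumul.values())
    return {
        day: total - known_at_cutoff
        for day, total in province_cumul.items()
        if day > cutoff
    }


# Made-up cumulative counts, for illustration only.
province = {date(2023, 6, 4): 100, date(2023, 6, 5): 101, date(2023, 6, 6): 104}
regions = {
    "HR A": {date(2023, 6, 4): 60, date(2023, 6, 5): 60, date(2023, 6, 6): 60},
    "HR B": {date(2023, 6, 4): 41, date(2023, 6, 5): 41, date(2023, 6, 6): 41},
}

unknown = unknown_hr_deaths(province, regions)
# Days where the two series disagree enough to produce a negative "Unknown" count.
negative_days = [day for day, value in unknown.items() if value < 0]
after_cutoff = unknown_after_cutoff(province, regions, date(2023, 6, 5))
```

In this toy data, the first approach yields a negative value on 2023-06-04 because the HR-level series is ahead of the province-level one on that day, which is the kind of incompatibility described above. The second approach only touches days after the cutoff, so earlier mismatches never surface.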

jeanpaulrsoucy commented 1 year ago

Looks like AB is still reporting deaths by health region here:

https://www.alberta.ca/covid-19-alberta-data.aspx

This table seems to be of slightly better quality than the individual case dataset, which had several deaths with missing health regions. Theoretically, the dataset could be reconstructed from the archived tables, with as-of dates added using the HR-level case dataset.

For now, we should just implement an update workflow for the AB active_cumul and worry about retroactive changes later.

jeanpaulrsoucy commented 1 year ago

If I want to redo the history of the HR death time series using the "case breakdown" page (d3b170a7-bb86-4bb0-b362-2adc5e6438c2), to get rid of the pesky unknown HR deaths that drop out when the new dataset is substituted on 2023-06-12, I will have to be careful about how the "case breakdown" table is processed, since its format has changed over time (see the note in ccodwg/Covid19CanadaDataProcess@2174def1ac98fe6a66a31d6a795cafe759ebda30).

jeanpaulrsoucy commented 1 year ago

For some reason, the new pipeline is failing to download the case breakdown page with the following error:

<subscriptOutOfBoundsError in ds[[1]]: subscript out of bounds>

It works just fine locally. It seems unlikely that GitHub is IP-blocked because the statistics app, also hosted on alberta.ca, works just fine. One thing to try would be removing the JS requirement for the case breakdown page, which seems to be no longer required.
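For context, an R error of the form "subscript out of bounds" on ds[[1]] typically means the first element was requested from an empty list. Assuming (this is not confirmed in the thread) that ds holds the tables parsed from the downloaded page, the failure would mean the page came back without any tables. A rough Python sketch of a guard that fails with a more descriptive message follows; count_tables and require_tables are hypothetical names, and the real pipeline is written in R.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.count += 1


def count_tables(html):
    parser = TableCounter()
    parser.feed(html)
    return parser.count


def require_tables(html, source):
    """Raise a descriptive error instead of failing later on an empty list."""
    n = count_tables(html)
    if n == 0:
        raise ValueError(
            f"No <table> found in {source}; the page may not have rendered as expected"
        )
    return n
```

A check like this would not explain why the page differs between GitHub and a local run, but it would make the failure easier to diagnose from the logs.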

jeanpaulrsoucy commented 1 year ago

The above change seems to have fixed things, or at the very least the error did not recur. I'm not sure whether something is wrong with the webdriver setup in the Docker image or whether this was specific to this dataset.

jeanpaulrsoucy commented 11 months ago

If I update the deaths time series using the archived case breakdown page, I could extract as-of dates for every snapshot, then use the actual date (or a date shift) during the period of daily updates and the as-of date once the updates shift to weekly. I would also have to use an extra day of the CCODWG dataset at the beginning of the time series.
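The date rule could be sketched in Python as below (the real pipeline is written in R). Everything here is hypothetical: the function name, the shift_days parameter, and especially the weekly_start value in the usage example, which is a placeholder and not a date taken from this thread.

```python
from datetime import date, timedelta


def effective_date(capture_date, as_of_date, weekly_start, shift_days=0):
    """Pick the date to assign to one archived snapshot.

    capture_date: the date the snapshot was archived (the "actual date")
    as_of_date:   the as-of date extracted from the page itself
    weekly_start: first capture date treated as a weekly update (assumed value)
    shift_days:   optional offset applied during the daily-update period
    """
    if capture_date < weekly_start:
        # Daily-update period: use the actual date, optionally shifted.
        return capture_date + timedelta(days=shift_days)
    # Weekly-update period: trust the as-of date stated on the page.
    return as_of_date


# Placeholder boundary, for illustration only.
WEEKLY_START = date(2023, 7, 1)

daily = effective_date(date(2023, 1, 10), date(2023, 1, 9), WEEKLY_START, shift_days=-1)
weekly = effective_date(date(2023, 10, 5), date(2023, 9, 30), WEEKLY_START)
```

Keeping the boundary as an explicit parameter makes it easy to adjust once the exact date of the switch to weekly updates is confirmed.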

jeanpaulrsoucy commented 11 months ago

The Alberta dashboard will be updated on Thursdays (https://www.alberta.ca/release.cfm?xID=8903776ADC74C-D039-20E1-88914186F8844F6D):

“Moving forward, the dashboard will be updated every Thursday throughout the respiratory virus season. This season occurs annually, beginning around the end of August.

“For the purposes of data tracking, the 2023-2024 season and data tracking began on Aug. 28 and will continue throughout the fall and into the new year. The data on the page is up to date as of Sept. 23 and will be updated on Oct. 5 with data from Sept. 30.”