ccodwg / CovidTimelineCanada

A definitive dataset for COVID-19 in Canada
https://opencovid.ca/

Update BC datasets (old BCCDC dashboard has been retired) #106

Closed jeanpaulrsoucy closed 1 year ago

jeanpaulrsoucy commented 1 year ago

The B.C. COVID-19 Dashboard was retired on April 20, 2023.

Data sources for BC will need to be switched over.

jeanpaulrsoucy commented 1 year ago

It seems the new BCCDC dashboard is nearly impossible to scrape, at least using conventional methods.

The changes in how BCCDC handles data will need to be noted, as well. Here is how case data definitions have changed over time:

January 2020 to March 31, 2022: Total COVID-19 cases include lab-confirmed, lab-probable and epi-linked cases. Cases included those reported by the health authorities for the first time and any individual with a first positive lab-confirmed COVID-19 test reported in the Provincial Laboratory Information Solution (PLIS) or Sunquest.

April 1, 2022 to April 22, 2023: Any individual with a first positive lab-confirmed COVID-19 test reported in PLIS or Sunquest. Subsequent positive lab-confirmed COVID-19 tests were not included.

April 23, 2023 to present: Positive lab-confirmed COVID-19 test(s) belonging to the same individual are grouped together and considered part of the same infection episode if they are within 30 days. Positive lab-confirmed COVID-19 tests that are 30 or more days apart (regardless of negative tests in between) are considered a separate infection episode, and therefore an individual may have more than one infection episode of COVID-19.
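To make the post-April 23, 2023 definition concrete, here is a minimal sketch of how positive tests for one individual might be grouped into infection episodes. It is not BCCDC's implementation. The function name and the `window_days` parameter are hypothetical, and it assumes the 30-day window is measured from the first positive test of the current episode, which the definition above does not state explicitly.

```python
from datetime import date


def count_infection_episodes(positive_test_dates, window_days=30):
    """Count infection episodes for a single individual.

    Assumption (not confirmed by the source): the window is measured from
    the first positive test of the current episode. A test fewer than
    `window_days` days after that date joins the same episode; a test
    `window_days` or more days after it starts a new episode.
    """
    episodes = 0
    episode_start = None
    for test_date in sorted(positive_test_dates):
        if episode_start is None or (test_date - episode_start).days >= window_days:
            episodes += 1
            episode_start = test_date
    return episodes


# Illustrative dates only, not real data.
tests = [date(2023, 5, 1), date(2023, 5, 10), date(2023, 5, 31), date(2023, 7, 15)]
print(count_infection_episodes(tests))
```

Under the earlier definition (April 1, 2022 to April 22, 2023), the same individual would have been counted once, which is one reason counts before and after the change are not directly comparable.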

Also this, regarding the location of cases:

From January 2020 to March 31, 2022, cases were reported by the health authority of residence or the reporting health authority; cases whose primary residence was outside of Canada were reported as “Out of Canada”. As of April 1, 2022, the health authority is based on the case’s address provided during laboratory testing and cases from outside of BC are not included. Please note that the health authority of residence and the health authority reporting the case do not necessarily indicate the location of exposure or transmission.

jeanpaulrsoucy commented 1 year ago

For now, using the weekly totals from the tables seems to be the best path forward. Using the cumulative totals and manipulating the date sliders is also a potential fix, but it's unclear whether these two sources of data will differ (and if so, by how much), since the cumulative table has a footnote saying "Among those with available age information only."

Another caveat, which is a reason to prefer time series (or at least cumulative) data:

Data in the most recent epi-weeks may not be complete (i.e. under-estimated) due to the timing of data systems and processes. Data are updated and become more complete over time.

Update: The cumulative data table ("Historical totals") doesn't seem to be an option for reconstructing the time series, as trying to view anything but the most recent data results in a dashboard error:

[Screenshot, 2023-05-23: COVID-19 Situation Report]

jeanpaulrsoucy commented 1 year ago

Closing for now as weekly data have been added for cases and deaths. Although these data are poor quality, owing to the issues mentioned above, they are our only real option at the moment. The 7-day average data can be extracted from the dashboard, but there is no way to perfectly recover the underlying weekly data, given historical data changes. It should perhaps be added as an alternative dataset, however.
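As a rough illustration of the 7-day average idea, the sketch below approximates weekly totals by multiplying the average on the last day of each reporting week by 7. The function name and inputs are hypothetical. It assumes the dashboard's average is a trailing average ending on the given day, which has not been verified. Because the averages reflect revised historical data, the results would not necessarily match the weekly figures as originally published.

```python
from datetime import date


def weekly_totals_from_rolling_average(avg_by_day, week_end_days):
    """Approximate weekly totals from a trailing 7-day average.

    avg_by_day: dict mapping a date to the 7-day average ending on that date
                (assumed trailing, not centred).
    week_end_days: dates on which each reporting week ends.
    Days missing from avg_by_day are skipped rather than guessed.
    """
    return {
        day: round(avg_by_day[day] * 7)
        for day in week_end_days
        if day in avg_by_day
    }


# Illustrative values only, not real data.
averages = {date(2023, 5, 6): 10.0, date(2023, 5, 13): 12.0}
week_ends = [date(2023, 5, 6), date(2023, 5, 13), date(2023, 5, 20)]
print(weekly_totals_from_rolling_average(averages, week_ends))
```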