ccodwg / CovidTimelineCanada

A definitive dataset for COVID-19 in Canada
https://opencovid.ca/

Testing data #94

Closed jeanpaulrsoucy closed 11 months ago

jeanpaulrsoucy commented 1 year ago

With respect to testing data, the best solution here might be to add provincial testing sources where available and retire testing data where not available. Perhaps I can add the percent positivity dataset from RVDSS as a separate dataset, but of course it will only go back to August 2022.

jeanpaulrsoucy commented 1 year ago

Some historical testing data can probably be extracted from Newfoundland's daily health region dataset (34f45670-34ed-415c-86a6-e14d77fcf6db), but the testing value appears to no longer be updated. The same thing that happened to deaths (#113) probably happened to the testing values, but at least the historical data may be extractable.

jeanpaulrsoucy commented 1 year ago

Should add a warning that percent positivity values calculated from case and testing data may not necessarily line up, particularly if there is a mismatch between the reporting periods (e.g., daily testing data versus weekly case data). This applies primarily to testing data derived from the PHAC dataset. However, aggregating data to the level of week would probably be approximately correct.
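
A minimal sketch of the weekly aggregation mentioned above, assuming daily test counts and weekly case counts for weeks ending on Saturday. The data frames and all values here are hypothetical and only illustrate the alignment step; they are not taken from the project's datasets.

```r
# toy data (hypothetical values, for illustration only)
tests_daily <- data.frame(
  date = seq(as.Date("2022-11-06"), as.Date("2022-11-19"), by = "day"),
  tests = c(100, 120, 110, 90, 95, 105, 80, 130, 125, 115, 100, 90, 85, 75)
)
cases_weekly <- data.frame(
  date = as.Date(c("2022-11-12", "2022-11-19")), # weeks ending Saturday
  cases = c(70, 36)
)

# move each daily date forward to the Saturday that ends its week
# as.POSIXlt()$wday: 0 = Sunday, ..., 6 = Saturday
tests_daily$week_end <- tests_daily$date + (6 - as.POSIXlt(tests_daily$date)$wday)

# sum daily tests within each week
tests_weekly <- aggregate(tests ~ week_end, data = tests_daily, FUN = sum)
names(tests_weekly) <- c("date", "tests")

# join on the shared week-ending date and compute percent positivity
pos <- merge(cases_weekly, tests_weekly, by = "date")
pos$pct_positivity <- 100 * pos$cases / pos$tests
print(pos)
```

Computing positivity only after both series share the same reporting period avoids dividing a week's cases by a single day's tests.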

jeanpaulrsoucy commented 12 months ago

The PHAC dataset is worth another look. The numbers reported there appear compatible with other data sources, possibly even reporting more tests than other sources. Will have to check if this is reliable and holds for all provinces though. At the very least, could use these numbers in the absence of another source.

jeanpaulrsoucy commented 12 months ago

Could use JavaScript to extract BC testing data as a report dataset from the dashboard: https://bccdc.shinyapps.io/respiratory_covid_sitrep/#Test_rates_and_percent_positivity

jeanpaulrsoucy commented 12 months ago

For ON, the a8b1be1a-561a-47f5-9456-c553ea5b2279 dataset might be useful up to 2023-04.

jeanpaulrsoucy commented 11 months ago

It looks like the HR-level testing time series for NL was retired at some point before the PHAC SALT time series ended. The cumulative test count is identical in two snapshots taken almost 11 months apart:

HealthRegions_Covid_2023-09-27_22-19.json

Tests 331939

HealthRegions_Covid_2022-11-04_22-12.json

Tests 331939

jeanpaulrsoucy commented 11 months ago

Analysis of current testing dataset (using mainly the old PHAC testing dataset, SALT, as well as alternative datasets for a few provinces) versus the new PHAC testing dataset (RVDSS). SALT was intended to be comprehensive, whereas RVDSS is said to represent a subset of labs.

PTs using SALT (up until the period SALT ended):

PTs using alternative datasets:

In the figures, the current dataset is represented by a black line and the RVDSS dataset by a blue line.

Comparison between the current testing data and RVDSS during the period when SALT and RVDSS overlapped:

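[Figure (Rplot): weekly tests completed by PT during the SALT/RVDSS overlap period; current dataset in black, RVDSS in blue]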

We have near-perfect overlap for AB, BC, NS, NT, NU and SK. By contrast, NB, NL, ON and QC have significantly lower numbers in the new dataset, indicating that their RVDSS counts are a subset.

Oddly, PE and YT have significantly higher testing numbers in RVDSS. YT does not use SALT (because the time series is incomplete), but SALT has slightly higher testing numbers for YT than the YT dashboard time series. Even so, the RVDSS numbers are much higher during the period of overlap.

Comparison between the current testing data and RVDSS for PTs reporting since the RVDSS dataset began:

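[Figure (Rplot02): weekly tests completed for AB, BC, MB, ON and QC; current dataset in black, RVDSS in blue]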

Again, AB and BC have near-perfect overlap, whereas ON and QC are subsets. MB looks very close and smooths out some gaps in the reports, so it could probably be used.

Summary:

Code for figures:

# load packages
library(dplyr)
library(readr)
library(lubridate)
library(ggplot2)

# load data
t1 <- read_csv("data/pt/tests_completed_pt.csv") |> filter(region != "CAN")
t2 <- read_csv("raw_data/active_ts/can/can_tests_completed_pt_ts.csv") |> filter(region != "CAN")

# aggregate first dataset to weekly, ending Saturday
# shift each date forward to the Saturday ending its week (Saturdays stay as-is), then aggregate
t1 <- t1 |>
  # wday(week_start = 1): Monday = 1, ..., Saturday = 6, Sunday = 7
  mutate(date = date + days(6 - (wday(date, week_start = 1) %% 7))) |>
  group_by(region, date) |>
  summarize(value_daily = sum(value_daily), .groups = "drop")

# plot comparisons of overlapping period
ggplot(data = NULL, aes(x = date, y = value_daily)) +
  geom_line(data = t1 |>
    # filter to final Saturday (2022-11-19) - all PTs end SALT reporting on this date or later
    # except YT and NT, which end on 2022-11-05 and 2022-11-12, respectively
    filter(date <= "2022-11-19") |>
    filter(!(region == "YT" & date > "2022-11-05")) |>
    filter(!(region == "NT" & date > "2022-11-12")) |>
    # filter to min of second dataset
    filter(date >= min(t2$date)),
    colour = "black") +
  geom_line(data = t2 |>
    # filter to max of first dataset
    filter(date <= "2022-11-19"), colour = "blue") +
  facet_wrap(~region, ncol = 4, scales = "free") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = NULL, y = "Tests completed (weekly)")

# plot comparisons for provinces with current data
ggplot(data = NULL, aes(x = date, y = value_daily)) +
  geom_line(data = t1 |>
    # filter to specific regions
    filter(region %in% c("AB", "BC", "MB", "ON", "QC")) |>
    # filter to period after original PHAC data ended
    filter(date >= "2022-11-26"),
    colour = "black") +
  geom_line(data = t2 |>
    # filter to specific regions
    filter(region %in% c("AB", "BC", "MB", "ON", "QC")) |>
    # filter to period after original PHAC data ended
    filter(date >= "2022-11-26"), colour = "blue") +
  facet_wrap(~region, ncol = 3, scales = "free") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = NULL, y = "Tests completed (weekly)")

jeanpaulrsoucy commented 11 months ago

The PE discrepancy in the SALT data can probably be explained by only counting PCR tests (whereas the RVDSS may be more expansive). E.g. from the PEI case website for 2022-10-11:

Average number of tests per day (including both RT-PCR and Abbott ID Now rapid molecular tests) over the last 7 days: 163

Whereas the SALT values are in the 30s per day.

jeanpaulrsoucy commented 11 months ago

For NB, the PHAC SALT dataset and the NB report disagree on the cumulative number of tests on 2022-11-19:

SALT: 943715
NB report: 952545

The solution here is simply to add the weekly values that the report gives alongside its cumulative values.
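
A rough illustration of why the weekly values are the safer choice, assuming the series would otherwise be extended by differencing the report's cumulative totals against the last SALT value. The two cumulative totals are the ones quoted above; the weekly values and variable names are hypothetical.

```r
# cumulative totals on 2022-11-19, as quoted above
salt_cumulative <- 943715
report_cumulative <- 952545

# gap between the two sources on the same date
gap <- report_cumulative - salt_cumulative
gap # 8830

# hypothetical weekly values from subsequent NB reports
weekly_reported <- c(2000, 1500)
report_cumulative_next <- report_cumulative + cumsum(weekly_reported)

# naive approach: difference the report's cumulative series against SALT's last value
naive_weekly <- diff(c(salt_cumulative, report_cumulative_next))
naive_weekly # first week absorbs the 8830 gap: 10830, 1500

# using the reported weekly values directly avoids the spurious jump
weekly_reported
```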

jeanpaulrsoucy commented 11 months ago

NB testing data up to the end of the "COVID Watch" reports has been added. Since the "Respiratory Watch" reports began, testing numbers have not been reported directly, except on graphs. Only rough, rounded percent positivity numbers are available, from which test counts could be approximated, but this is beyond the scope of this project.

jeanpaulrsoucy commented 11 months ago

Can confirm the new PE webpage does not include test counts, only percent positivity. Thus the testing time series can be retired once I reconcile the PHAC time series and the old webpage.

EDIT: The RVDSS test counts seem consistent with the value "Average number of tests per day (including both RT-PCR and Abbott ID Now rapid molecular tests) over the last 7 days" * 7. Perhaps this time series can be used after it begins. The only problem is reconciling webpage data with the PHAC time series (which seems to be significantly ahead of PCR counts on the webpage). Maybe I could append the daily data once the non-PCR tests begin getting reported.
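
As a quick worked example of the "* 7" conversion, using the 7-day average quoted earlier from the PEI webpage for 2022-10-11 (163 tests per day). The helper name is hypothetical.

```r
# convert a 7-day average of daily tests into an approximate weekly total
# (hypothetical helper, for illustration only)
weekly_from_daily_avg <- function(avg_per_day) {
  avg_per_day * 7
}

# value quoted from the PEI webpage for 2022-10-11
weekly_tests <- weekly_from_daily_avg(163)
weekly_tests # 1141
```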

jeanpaulrsoucy commented 11 months ago

Comparing the numbers reported on the PEI webpage under "Average number of tests per day conducted at provincial COVID-19 testing clinics over the last 7 days (not including tests conducted at points of entry or hospitals)" with the RVDSS numbers for the overlapping time period:

PEI:

date_end tests
2022-09-06 1673
2022-09-13 1687
2022-09-20 1442
2022-10-04 1141
2022-10-11 1141
2022-10-18 1722
2022-10-25 1603
2022-11-01 1225
2022-11-08 1183

RVDSS:

date_end tests
2022-09-03 1898
2022-09-10 1430
2022-09-17 1426
2022-09-24 1422
2022-10-01 801
2022-10-08 1277
2022-10-15 1401
2022-10-22 1537
2022-10-29 1300
2022-11-05 1059

They are similar. However, it is not possible to align the weekly data sources, as they end on different days. Thus, the most practical method would be to switch over to the RVDSS when it becomes available, cutting off the PHAC daily dataset as appropriate (and with the appropriate data note).
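
A small check of the misalignment, using the week-ending dates from the two tables above. This is only a sketch in base R; the variable names are made up for the example.

```r
# week-ending dates from the PEI webpage table above
pei_end <- as.Date(c(
  "2022-09-06", "2022-09-13", "2022-09-20", "2022-10-04", "2022-10-11",
  "2022-10-18", "2022-10-25", "2022-11-01", "2022-11-08"
))
# week-ending dates from the RVDSS table above (2022-09-03 to 2022-11-05, weekly)
rvdss_end <- seq(as.Date("2022-09-03"), as.Date("2022-11-05"), by = "week")

# as.POSIXlt()$wday: 0 = Sunday, ..., 6 = Saturday
pei_wday <- unique(as.POSIXlt(pei_end)$wday)
rvdss_wday <- unique(as.POSIXlt(rvdss_end)$wday)
pei_wday   # 2 (Tuesday)
rvdss_wday # 6 (Saturday)

# number of days each PEI week shares with the RVDSS week ending the previous Saturday
overlap_prev <- 7 - ((pei_wday - rvdss_wday) %% 7)
overlap_prev # 4, leaving 3 days in the following RVDSS week
```

Because each PEI week straddles two RVDSS weeks (4 days and 3 days), neither series can be re-binned onto the other without daily data.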