ccodwg / CovidTimelineCanada

A definitive dataset for COVID-19 in Canada
https://opencovid.ca/
Other

Automated collection of report data #95

Closed: jeanpaulrsoucy closed this issue 5 months ago

jeanpaulrsoucy commented 1 year ago

Report-style data extraction currently occurs with the following scripts. Those that are crossed out have been sunsetted as they will not contain any new/updated data for dates before 2023-12-31, when this dataset ends:

jeanpaulrsoucy commented 1 year ago

The current version of the CRISP report completely breaks the existing script. This is partly because the report format has changed yet again. Tabulizer struggles mightily with the formatting of some of the tables (and this formatting changes for similar tables from report to report). It may not be worth continuing development of the script, but at least it creates the template for each week of data, which was tedious to do by hand before.
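A sketch of the kind of blank weekly template such a script could produce, so values can be filled in by hand when table extraction fails. The metric names, column layout, and the `week_template`/`template_csv` helpers are hypothetical, not the actual CRISP script's output.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical metric names; the real CRISP extraction may use different ones.
METRICS = ["cases", "hospitalizations", "icu", "deaths"]

def week_template(week_end: date, region: str, metrics=METRICS) -> list[dict]:
    """Return one blank row per metric for a single reporting week."""
    week_start = week_end - timedelta(days=6)
    return [
        {"region": region, "metric": m,
         "week_start": week_start.isoformat(),
         "week_end": week_end.isoformat(),
         "value": ""}  # left blank for manual entry
        for m in metrics
    ]

def template_csv(rows: list[dict]) -> str:
    """Serialize template rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = week_template(date(2023, 4, 1), "SK")
print(template_csv(rows))
```

Generating the skeleton separately from the extraction means the tedious part still works even when the table parsing does not.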

jeanpaulrsoucy commented 1 year ago

In this week's report, the NB report script struggled to capture some of the province-level metrics (deaths, recovered, active). I might be able to pull the latter two from one of the better-formatted tables instead, reducing the manual verification required.
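One way to cut down on manual verification is to read each metric from two tables and only flag it for review when both are missing or they disagree. This is a hypothetical helper, assuming the script already yields a value (or `None`) for each metric from each table.

```python
from typing import Optional

def reconcile(primary: Optional[int], fallback: Optional[int]) -> tuple[Optional[int], bool]:
    """Pick a value for one metric from two extracted tables.

    Returns (value, needs_review). needs_review is True when neither
    table yielded a value or when the two tables disagree.
    """
    if primary is None and fallback is None:
        return None, True
    if primary is None:
        return fallback, False
    if fallback is None:
        return primary, False
    return primary, primary != fallback

# Made-up values: (primary table, better-formatted fallback table)
extracted = {"deaths": (None, None), "recovered": (None, 120), "active": (35, 35)}
for metric, (a, b) in extracted.items():
    print(metric, reconcile(a, b))
```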

jeanpaulrsoucy commented 1 year ago

For the NS report, even if we want to manually verify or enter the dates (due to past inconsistencies), we could easily implement the actual data pulls in a script.

jeanpaulrsoucy commented 1 year ago

Looks like the Python library Camelot might be promising for more reliable PDF extraction (see comparisons with other libraries).

jeanpaulrsoucy commented 1 year ago

Note that I manually corrected what seems to be an error in NS's dashboard today.

Last week: deaths since July 2022 - 293 / cumulative deaths - 778
This week: 10 new deaths / deaths since July 2022 - 303 / cumulative deaths - 778

The cumulative deaths should be 788, so I manually changed the value for this.
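The consistency check that caught this can be automated. A minimal sketch, assuming each weekly NS snapshot is stored as a dict with the hypothetical keys `new`, `since_july_2022`, and `cumulative`:

```python
def check_deaths(last_week: dict, this_week: dict) -> dict:
    """Return {field: expected_value} for every field that fails a consistency check."""
    issues = {}
    new = this_week["new"]
    # Deaths since July 2022 should grow by exactly the number of new deaths
    expected_since = last_week["since_july_2022"] + new
    if this_week["since_july_2022"] != expected_since:
        issues["since_july_2022"] = expected_since
    # Cumulative deaths should grow by the same amount
    expected_cum = last_week["cumulative"] + new
    if this_week["cumulative"] != expected_cum:
        issues["cumulative"] = expected_cum
    return issues

last_week = {"since_july_2022": 293, "cumulative": 778}
this_week = {"new": 10, "since_july_2022": 303, "cumulative": 778}
print(check_deaths(last_week, this_week))  # {'cumulative': 788}
```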

I've alerted NS public health to this probable error. We'll see if they fix it before next week's update, or if some correction for this will have to be made going forward.

UPDATE: Looks like they fixed it.

jeanpaulrsoucy commented 1 year ago

Note that after today, the next Manitoba report will be released on Thursday, April 6, 2023. It's not clear whether it will contain the same data it would have contained had the report been released on the expected schedule (Friday, March 31, 2023).

jeanpaulrsoucy commented 1 year ago

The schedule for NB manual report data is changing:

Effective May 2nd, we will be moving to a monthly reporting cycle for COVIDWATCH, before resuming regular reporting in Fall 2023. Therefore, for the remainder of this respiratory illness season, reports will be released May 2nd, May 30th, June 27th, July 25th, and August 29th.

For the next respiratory illness season, which will begin in early September, we will release bi-weekly reports on September 12th and September 26th and then move to weekly reporting afterwards.
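The announced dates can be kept as a simple lookup for when to expect the next NB report. A sketch assuming the dates refer to 2023 and that the weekly reports after September 26 fall on the same weekday (Tuesday); `next_release` is a hypothetical helper.

```python
from datetime import date, timedelta

# Release dates announced for NB COVIDWATCH (assumed to be 2023)
ANNOUNCED = [
    date(2023, 5, 2), date(2023, 5, 30), date(2023, 6, 27),
    date(2023, 7, 25), date(2023, 8, 29),       # monthly
    date(2023, 9, 12), date(2023, 9, 26),       # bi-weekly
]

def next_release(today: date) -> date:
    """Return the next expected release on or after `today`."""
    for d in ANNOUNCED:
        if d >= today:
            return d
    # Assumption: weekly reporting continues on the weekday of the last announced date
    last = ANNOUNCED[-1]
    days = (today - last).days
    weeks = -(-days // 7)  # ceiling division
    return last + timedelta(weeks=weeks)

print(next_release(date(2023, 5, 3)))   # 2023-05-30
print(next_release(date(2023, 10, 4)))  # 2023-10-10
```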

jeanpaulrsoucy commented 1 year ago

The schedule for SK report data is changing:

Note: CRISP reports will be transitioning to monthly during the summer as the prevalence of respiratory viruses in the warmer month’s declines. We will return to regular bi-weekly CRISP reporting in the fall.

jeanpaulrsoucy commented 1 year ago

The NS dashboard has a pretty clear mistake in the dates today, so I've recorded the reporting period as April 25 to May 1 and the hospitalization date as May 2.

[Screenshot: ns-dates]

jeanpaulrsoucy commented 1 year ago

As sometimes happens, the NS hospitalization "as of" date is not the reporting period end date + 1, as it normally is.

[Screenshot from 2023-05-12 00-08-50]

As is standard procedure, I will record the hospitalization date as is.
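A minimal sketch of that procedure, with hypothetical field names: record the dates exactly as published, but flag any week where the hospitalization date is not the reporting period end date + 1. The year 2023 is assumed in the examples.

```python
from datetime import date, timedelta

def record_ns_dates(period_start: date, period_end: date, hosp_as_of: date) -> dict:
    """Record NS dates as published and flag non-standard hospitalization dates."""
    standard = hosp_as_of == period_end + timedelta(days=1)
    return {
        "period_start": period_start,
        "period_end": period_end,
        "hosp_as_of": hosp_as_of,  # kept as is, even when non-standard
        "nonstandard_hosp_date": not standard,
    }

# Usual pattern: April 25 - May 1, hospitalization data as of May 2
print(record_ns_dates(date(2023, 4, 25), date(2023, 5, 1), date(2023, 5, 2))["nonstandard_hosp_date"])  # False
# May 9 - May 15, hospitalization data as of May 15
print(record_ns_dates(date(2023, 5, 9), date(2023, 5, 15), date(2023, 5, 15))["nonstandard_hosp_date"])  # True
```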

jeanpaulrsoucy commented 1 year ago

Ditto this week: reporting period May 9 - May 15, with hospitalization data also as of May 15.

jeanpaulrsoucy commented 1 year ago

Three times is a pattern, I guess:

[Screenshot: ns-3]

jeanpaulrsoucy commented 1 year ago

For the most recent BC report (2023-06-01), note that, unlike last week, the health region values do not always add up to the BC total. For example, hospital admissions:

May 21 - May 27: 40 / 9 / 6 / 27 / 20 / 103. The HR values (40 + 9 + 6 + 27 + 20) add up to 102, rather than 103.

In this week's data, ICU admissions also have an off-by-one error, but cases and deaths seem unaffected.
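A quick check for this kind of discrepancy, sketched with the hospital admissions figures above; `region_discrepancy` is a hypothetical helper that returns how far the provincial total is from the sum of the health region values.

```python
def region_discrepancy(region_values: list[int], province_total: int) -> int:
    """Return province_total minus the sum of health region values (0 if they agree)."""
    return province_total - sum(region_values)

# BC hospital admissions, May 21 - May 27: five health regions, then the BC total
print(region_discrepancy([40, 9, 6, 27, 20], 103))  # 1
```

Running this for every metric in a report makes it easy to see which values (here, hospital and ICU admissions) are affected and which (cases, deaths) are not.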

jeanpaulrsoucy commented 11 months ago

No MB report this week:

NOTE: Due to technical difficulties, COVID-19 and seasonal influenza data for the week of July 2 to July 8, 2023 is unavailable.

jeanpaulrsoucy commented 11 months ago

No MB report this week, either:

NOTE: Due to technical difficulties, COVID-19 and seasonal influenza data for week 27 (July 2 - July 8, 2023) and week 28 (July 9 - July 15, 2023) is unavailable.

Depending on the data provided in the next report, I may have to use the PHAC dataset to fill in some of the gaps, assigning the counts to an "Unknown" health region for at least one week.
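A sketch of the "Unknown" health region approach, assuming a provincial total for the week is available (e.g. from PHAC) while the health region values are partially or entirely missing. `fill_unknown` is hypothetical, and the region names and counts in the example are made up.

```python
from typing import Optional

def fill_unknown(hr_values: dict[str, Optional[int]], province_total: int) -> dict[str, Optional[int]]:
    """Assign whatever the known health region values don't account for to 'Unknown'."""
    known = sum(v for v in hr_values.values() if v is not None)
    remainder = province_total - known
    if remainder < 0:
        raise ValueError("health region values exceed the provincial total")
    filled = dict(hr_values)  # missing regions stay None
    filled["Unknown"] = remainder
    return filled

# Week with no health region data at all: the whole provincial total goes to Unknown
print(fill_unknown({"HR1": None, "HR2": None}, 12))
```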

jeanpaulrsoucy commented 11 months ago

The MB report has finally been updated after three weeks, but with the added complication of transitioning to a new "season" for cumulative numbers, as well as not providing their standard tables.

NOTE: We have transitioned into a new season (July 2023-June 2024) for reporting of COVID-19 and influenza. Due to small numbers at the start of this current season, a number of figures and graphs are unavailable.

Will have to take some time to reconcile the numbers. So far, PHAC only has numbers up to the week ending July 1.
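Reconciling season-based cumulative numbers mostly means undoing the reset. A sketch assuming the new season's cumulative counts start from zero at the start of July 2023 and that the all-time total at the end of the previous season is known; both helpers are hypothetical and the numbers are made up.

```python
def weekly_from_season_cumulative(season_cum: list[int]) -> list[int]:
    """Weekly counts from a cumulative series that starts at zero at the season start."""
    return [season_cum[0]] + [b - a for a, b in zip(season_cum, season_cum[1:])]

def to_all_time_cumulative(season_cum: list[int], prior_total: int) -> list[int]:
    """Shift a season-cumulative series onto the all-time cumulative scale."""
    return [prior_total + c for c in season_cum]

season_cum = [3, 7, 12]
print(weekly_from_season_cumulative(season_cum))  # [3, 4, 5]
print(to_all_time_cumulative(season_cum, 100))    # [103, 107, 112]
```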

jeanpaulrsoucy commented 11 months ago

Manitoba has, so far, not posted its usual Friday data release (expected August 4).

jeanpaulrsoucy commented 10 months ago

Second Friday with no Manitoba report.

jeanpaulrsoucy commented 10 months ago

Looks like historical reports have been silently added for the missing weeks of data, which can be found via URL manipulation. However, the main page still points toward the week 29 report, so these new reports have not been added to the archive.

https://www.gov.mb.ca/health/publichealth/surveillance/influenza/2022_2023/week_30/index.html
https://www.gov.mb.ca/health/publichealth/surveillance/influenza/2022_2023/week_31/index.html
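The URL pattern above makes it easy to probe for unlisted reports. A sketch assuming the `2022_2023/week_N` pattern holds for other weeks (whether single-digit weeks are zero-padded is unverified); `report_url` and `url_exists` are hypothetical helpers.

```python
import urllib.request

BASE = ("https://www.gov.mb.ca/health/publichealth/surveillance/influenza/"
        "{season}/week_{week}/index.html")

def report_url(week: int, season: str = "2022_2023") -> str:
    """Build the report URL for a given surveillance week."""
    return BASE.format(season=season, week=week)

def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request for the URL succeeds."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except OSError:  # covers HTTPError, URLError, and timeouts
        return False
```

For example, `[w for w in range(30, 35) if url_exists(report_url(w))]` would list which of weeks 30 to 34 currently have a report. Some servers reject HEAD requests, in which case a plain GET would be needed instead.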

jeanpaulrsoucy commented 9 months ago

Will need to update the columns in data_sources.csv for the new nb_weekly_report_3 dataset once the regular update schedule becomes clear.

jeanpaulrsoucy commented 8 months ago

Note: BC seems to be missing data for several weeks in the situation report dataset (2023-06-03, 2023-08-05). This seems to happen when five new weeks of data have accumulated between reports, but the new report only gives the most recent four weeks. It helps that they now update the situation report twice a month instead of once a month. I might be able to fill in the blank weeks using their "cumulative totals" date slider.
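Gaps like these can be found by looking for missing week-ending dates. A sketch assuming the dataset has one row per week, all ending on the same weekday; `missing_weeks` is a hypothetical helper and the example dates are illustrative.

```python
from datetime import date, timedelta

def missing_weeks(week_ends: list[date]) -> list[date]:
    """Return week-ending dates absent from an otherwise weekly series."""
    ends = sorted(set(week_ends))
    missing = []
    for a, b in zip(ends, ends[1:]):
        d = a + timedelta(days=7)
        while d < b:  # every expected week strictly between two observed weeks
            missing.append(d)
            d += timedelta(days=7)
    return missing

print(missing_weeks([date(2023, 5, 20), date(2023, 5, 27), date(2023, 6, 10)]))
# [datetime.date(2023, 6, 3)]
```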

jeanpaulrsoucy commented 8 months ago

For BC: Should add "currently in hospital" data from the situation report to the report.

jeanpaulrsoucy commented 7 months ago

The underlying code of the BC dashboard seems to have been updated. It may now be necessary to "click" elements twice to get the updates to stick, but I will test further next week.