cmu-delphi / covidcast-indicators

Back end for producing indicators and loading them into the COVIDcast API.
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
MIT License

Canadian FluWatch Data #1944

Open dajmcdon opened 9 months ago

dajmcdon commented 9 months ago

I would like to add a few signals, all from the same source. The data is in tables published weekly by the Public Health Agency of Canada. That link is for the most recent report.

I'm currently working with a student to collect all the previous versions, but it would be great to serve all of this data along with new information as it is added. My goal would be to have this ready to go by the beginning of next year's season, which is usually near the end of August.

Data details

Additional context

Some other questions @nmdefries suggested I address:

Does the data have revisions? If there are revisions, how often, how far back, on which signals, etc

Yes, but only for some of the signals. Revisions go back as far as the beginning of the current season.

What are the limitations of the data? e.g. lack of geo coverage, any censoring, based on a biased sample/not representative

Some of all of this. The most disaggregated data is for a specific (potentially biased) collection of labs. There's also internal processing with private data that I'm not aware of. Some geographies have greater coverage than others.

Any processing that the source does. e.g. normalization, smoothing, censoring

Not really.

Whether you foresee us needing to derive any signals or we can report as-is

I think "as is" mainly. It may be helpful to convert some percentages to raw counts, but this isn't entirely necessary (as the denominator is also present in the source).
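For reference, a minimal sketch of that conversion, assuming a signal is published as a percentage alongside its denominator. The function and parameter names (`percent_to_count`, `percent_positive`, `tests`) are hypothetical and not taken from the source tables.

```python
def percent_to_count(percent_positive: float, tests: int) -> int:
    """Recover an approximate count from a percentage and its denominator."""
    if tests < 0:
        raise ValueError("tests must be non-negative")
    if not 0.0 <= percent_positive <= 100.0:
        raise ValueError("percent_positive must be between 0 and 100")
    # Published percentages are usually rounded, so the recovered count
    # is only approximate.
    return round(percent_positive / 100.0 * tests)
```

Because the published percentage is rounded, the recovered count can be off by a small amount, which is one reason to keep the percentage as the primary signal.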

What geo data we need and where to get it

Probably just a few crosswalks with Lab -> Province -> Region. I can help with this.
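A rough sketch of how such crosswalks could be applied, assuming counts are reported per lab and need rolling up. All lab, province, and region names below are placeholders, and `aggregate` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical crosswalks; the real lab, province, and region names would
# come from the source tables.
LAB_TO_PROVINCE = {"lab_a": "prov_1", "lab_b": "prov_1", "lab_c": "prov_2"}
PROVINCE_TO_REGION = {"prov_1": "region_x", "prov_2": "region_x"}


def aggregate(counts, crosswalk):
    """Sum counts from a finer geography up to a coarser one."""
    totals = defaultdict(int)
    for geo, count in counts.items():
        totals[crosswalk[geo]] += count
    return dict(totals)


# Made-up counts, for illustration only.
lab_counts = {"lab_a": 10, "lab_b": 5, "lab_c": 7}
province_counts = aggregate(lab_counts, LAB_TO_PROVINCE)
region_counts = aggregate(province_counts, PROVINCE_TO_REGION)
```

Only counts can be summed this way. A percentage at the coarser level would need to be recomputed from the aggregated numerator and denominator.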

On my end:

The data is public, but not otherwise served. I should check any data use information to see if there are potential issues. I should also check if they're willing to give us a more useful format/source than scraping the website when it gets released (and being subject to unknown decisions to change the format).
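If scraping does turn out to be the only option, here is a minimal sketch of pulling rows out of an HTML table using only the standard library. The sample markup and column names are made up for illustration; the real report layout may differ.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> in a document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample markup; not copied from an actual report.
SAMPLE = """
<table>
  <tr><th>Week</th><th>Tests</th><th>Percent positive</th></tr>
  <tr><td>35</td><td>200</td><td>12.5</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE)
header, *records = parser.rows
```

Comparing `header` against an expected list of column names before ingesting would make a silent format change fail loudly.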

dsweber2 commented 9 months ago

I was talking with Ron about this, and the partial revision behavior came up. Do you think we should start archiving now, before the flu season ends, for those signals which don't include revision behavior? Or is your student effectively covering that aspect already?

dajmcdon commented 9 months ago

I think it's actually easier than that, but maybe I'm not being very clear. I think there are 2 cases:

  1. Some signals report weekly. And each weekly issue also contains revisions to all past time values for the season. But the previous reports all remain online and accessible. So we can scrape it all at once (now or later), then during the season, just grab the most recent issue which will contain multiple time values for each location.
  2. Some signals report weekly. But they don't include any previous time values. So for any issue, there is always 1 unique time value. Past reports remain online, but I'm guessing they are never updated. So there is 1 issue per time value and no way to track revisions at all.
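To make the two cases concrete, here is a small sketch of an archive keyed by location and time value, with one stored value per issue. Weeks are plain integers for brevity, all values are made up, and `add_report` and `as_of` are hypothetical helpers rather than existing pipeline functions.

```python
from typing import Dict, Iterable, Optional, Tuple

# (geo, time_value) -> {issue: value}
Archive = Dict[Tuple[str, int], Dict[int, float]]


def add_report(archive: Archive, issue: int,
               rows: Iterable[Tuple[str, int, float]]) -> None:
    """Store every (geo, time_value, value) row from one weekly report."""
    for geo, time_value, value in rows:
        archive.setdefault((geo, time_value), {})[issue] = value


def as_of(archive: Archive, geo: str, time_value: int,
          issue: int) -> Optional[float]:
    """Return the value as known at `issue`, or None if not yet reported."""
    versions = archive.get((geo, time_value), {})
    known = [i for i in versions if i <= issue]
    return versions[max(known)] if known else None


archive: Archive = {}
# Case 1: each issue restates all earlier weeks of the season.
add_report(archive, issue=1, rows=[("prov_1", 1, 10.0)])
add_report(archive, issue=2, rows=[("prov_1", 1, 12.0), ("prov_1", 2, 8.0)])
# Case 2: each issue carries only its own week.
add_report(archive, issue=1, rows=[("lab_a", 1, 3.0)])
add_report(archive, issue=2, rows=[("lab_a", 2, 4.0)])
```

In case 1 a time value accumulates one version per issue, so revisions can be tracked. In case 2 each time value only ever has a single version.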

nmdefries commented 4 months ago

Raw data is available in HTML tables at the link above. I can try to contact the agency to see if there is an easier and more timely way to access it.

@dajmcdon Did you hear back about any alternative ways to access the data?

nmdefries commented 4 months ago

Also, which fields should be extracted? Are there particular signals of interest?