Open dajmcdon opened 9 months ago
I was talking with Ron about this, and the partial revision behavior came up. Do you think we should start archiving now, before the flu season ends, for those signals which don't include revision behavior? Or is your student effectively covering that aspect already?
I think it's actually easier than that, but maybe I'm not being very clear. I think there are 2 cases
Raw data is available in html tables at the link above. I can try to contact the agency to see if there is an easier and more timely way to access it.
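For reference, here is a minimal sketch of pulling rows out of an HTML table using only the Python standard library. The column names and values in the usage note are made up for illustration and are not taken from the actual report; the real pages may need extra handling (nested tables, footnotes, merged cells).

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return all table rows found in an HTML string as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows
```

For example, `parse_table("<table><tr><th>Lab</th><th>Tests</th></tr><tr><td>lab_a</td><td>10</td></tr></table>")` gives one header row and one data row, with every cell still a string.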
@dajmcdon Did you hear back about any alternative ways to access the data?
Also, which fields should be extracted? Are there particular signals of interest?
I would like to add a few signals, all from the same source. The data is in tables published weekly by the Public Health Agency of Canada. That link is for the most recent report.
I'm currently working with a student to collect all the previous versions, but it would be great to serve all of this data along with new information as it is added. My goal would be to have this ready to go by the beginning of next year's season, which is usually near the end of August.
Data details
Additional context
Some other questions @nmdefries suggested I address:
Yes, but only some of the signals. Revisions go back as far as the beginning of the current season.
Some of all of this. The most disaggregated data is for a specific (potentially biased) collection of labs. There's also internal processing with private data that I'm not aware of. Some geographies have greater coverage than others.
Not really.
I think "as is" mainly. It may be helpful to convert some percentages to raw counts, but this isn't entirely necessary (as the denominator is also present in the source).
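If we did want counts, the conversion would look roughly like the sketch below. The function and argument names are hypothetical. Since the published percentages are presumably rounded, the recovered count is only approximate.

```python
def percent_to_count(percent: float, denominator: int) -> int:
    """Approximate the raw count behind a reported percentage.

    `percent` is on a 0-100 scale; `denominator` is the reported total
    (for example, the number of tests). The result is rounded to the
    nearest integer because the source percentage is itself rounded.
    """
    return round(percent / 100 * denominator)
```

For example, `percent_to_count(12.5, 200)` returns `25`.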
Probably just a few crosswalks with Lab -> Province -> Region. I can help with this.
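A rough sketch of how the two crosswalks could be chained. The lab, province, and region names are placeholders, not the real geography, and the helper name is hypothetical. Note that this only makes sense for counts; percentages would have to be recomputed from the summed numerators and denominators.

```python
from collections import defaultdict

# Placeholder crosswalks; the real ones would list actual labs and provinces.
LAB_TO_PROVINCE = {
    "lab_a": "province_x",
    "lab_b": "province_x",
    "lab_c": "province_y",
}
PROVINCE_TO_REGION = {
    "province_x": "region_1",
    "province_y": "region_2",
}


def aggregate(counts, crosswalk):
    """Sum counts keyed by a fine geography into the coarser one."""
    totals = defaultdict(int)
    for key, count in counts.items():
        totals[crosswalk[key]] += count
    return dict(totals)


lab_counts = {"lab_a": 3, "lab_b": 4, "lab_c": 5}
province_counts = aggregate(lab_counts, LAB_TO_PROVINCE)
region_counts = aggregate(province_counts, PROVINCE_TO_REGION)
```

A lab missing from the crosswalk raises a `KeyError` here, which is probably what we want so that new labs do not get silently dropped.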
On my end:
The data is public, but not otherwise served. I should check any data use information to see if there are potential issues. I should also check if they're willing to give us a more useful format/source than scraping the website when it gets released (which leaves us subject to unannounced changes to the format).