Cases: Date the case was reported publicly (most provinces), date the case was reported internally (e.g., BC), “episode date” (i.e., proxy for date of symptom onset) (e.g., ON)
Testing: Date sample was taken, date result was reported (QC reports a variety of dates)
Locally acquired versus travel-acquired cases (especially important for Atlantic Canada)
Testing: provincial versus private testing (e.g., BC)
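One way to keep these differing date definitions from being silently mixed is to store the definition alongside each value. The following is a minimal sketch, not a proposed standard: the `DateType` labels, the `Observation` fields, and the `comparable` helper are all hypothetical names chosen for illustration, and the numbers are made up.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class DateType(Enum):
    """Hypothetical labels for what a record's date actually measures."""
    PUBLIC_REPORT = "public_report"      # cases: date reported publicly
    INTERNAL_REPORT = "internal_report"  # cases: date reported internally
    EPISODE = "episode"                  # cases: proxy for symptom onset
    SAMPLE_TAKEN = "sample_taken"        # testing: date sample was taken
    RESULT_REPORTED = "result_reported"  # testing: date result was reported


@dataclass(frozen=True)
class Observation:
    region: str          # e.g. "BC", "ON"
    metric: str          # e.g. "cases", "testing"
    day: date            # the calendar date attached to the value
    date_type: DateType  # which definition of "date" the source uses
    value: int


def comparable(a: Observation, b: Observation) -> bool:
    """Two observations share a time axis only if metric and date type match."""
    return a.metric == b.metric and a.date_type == b.date_type


# Illustrative values only, not real counts.
on = Observation("ON", "cases", date(2021, 1, 5), DateType.EPISODE, 10)
bc = Observation("BC", "cases", date(2021, 1, 5), DateType.INTERNAL_REPORT, 12)
print(comparable(on, bc))  # same calendar day, but different date definitions
```

Carrying the date type as a column would let downstream users decide for themselves whether to align series with different definitions, rather than having that choice made implicitly.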
Other considerations
How to combine all these data, especially if different sources have different sub-group information? (Some sources provide demographic data, some provide vaccination status, and some provide multiple categories of information but use incompatible groupings, such as different age bands.)
How to deal with cases missing health region information ("not reported") and resident out-of-province cases (which may be double counted)?
How to handle definitions that change mid-time series when no retroactive correction is available? (The Ontario testing definition may be an example of this.)
How to handle data about repatriated travellers? (We have case and testing data; recoveries and mortality can be inferred.)
Should "impossible data" (e.g., negative change in cases) be automatically corrected in the final product?
And of course the most important question of all: wide or long format (or multiple separate datasets instead of long format)?
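To make the last two questions concrete, here is a small sketch using only the Python standard library. It assumes a long format with one row per date, region, and metric, holding cumulative values; the column names, the `to_wide` and `flag_negative_changes` helpers, and the numbers are all hypothetical. It pivots the long rows into a wide layout and flags negative changes instead of correcting them, which is only one of the possible answers to the "impossible data" question.

```python
from collections import defaultdict

# Long format: one row per (date, region, metric). All values are made up.
long_rows = [
    {"date": "2021-01-01", "region": "A", "metric": "cases", "value": 100},
    {"date": "2021-01-02", "region": "A", "metric": "cases", "value": 105},
    {"date": "2021-01-03", "region": "A", "metric": "cases", "value": 103},
    {"date": "2021-01-01", "region": "A", "metric": "deaths", "value": 2},
    {"date": "2021-01-02", "region": "A", "metric": "deaths", "value": 2},
    {"date": "2021-01-03", "region": "A", "metric": "deaths", "value": 3},
]


def to_wide(rows):
    """Pivot long rows into one row per (date, region), one column per metric."""
    wide = defaultdict(dict)
    for r in rows:
        wide[(r["date"], r["region"])][r["metric"]] = r["value"]
    return [
        {"date": d, "region": reg, **metrics}
        for (d, reg), metrics in sorted(wide.items())
    ]


def flag_negative_changes(rows):
    """Return (date, region, metric, change) wherever a cumulative value drops."""
    series = defaultdict(list)
    for r in rows:
        series[(r["region"], r["metric"])].append((r["date"], r["value"]))
    flagged = []
    for (reg, metric), points in sorted(series.items()):
        points.sort()  # ISO 8601 date strings sort chronologically
        for (_, prev), (day, cur) in zip(points, points[1:]):
            if cur - prev < 0:
                flagged.append((day, reg, metric, cur - prev))
    return flagged


wide_rows = to_wide(long_rows)
negatives = flag_negative_changes(long_rows)
```

Since the wide layout can be derived from the long one in a few lines, one option is to treat long format as the canonical form and publish wide tables as a convenience. Flagging rather than overwriting negative changes keeps the source values intact and leaves the decision to correct them explicit.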
Discussions of the data standard for specific types of data are linked below:
2
3
4
5
6
9
10
11
7
Other aspects of the data standard are linked below:
8
A list of some sources of inspiration for the data standard:
Some considerations for the data standard: