Open newhook opened 4 years ago
I'm trying to come up with a way of "normalizing" the names of the regions. I've also found data here: https://www.cihi.ca/en/access-data-and-reports.
I've moved the data you see above into a subfolder called cooked (which I'm trying to do in all folders):
https://github.com/Consensas/covid/tree/master/data/ca.cases/cooked
This seems to be a canonical list of region names and codes. https://www150.statcan.gc.ca/n1/pub/82-402-x/2015001/app-ann/ap-an1-eng.htm
I just spent a bit of time with the geojson property data embedded in the files.
$ cat HRP035b11m_e_Oct2013.geojson | jq '.features[].properties'
Taking an example here:
{
  "HR_UID": "3595",
  "ENG_LABEL": "City of Toronto Health Unit",
  "FRE_LABEL": "Circonscription sanitaire de la cité de Toronto"
}
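The same properties can be pulled out in Python, as a rough sketch. It assumes the file is a standard GeoJSON FeatureCollection whose features all carry the `HR_UID` and `ENG_LABEL` properties shown above. The function name `labels_by_uid` is made up for illustration, and the sample inlines only the one feature quoted above instead of reading the real file.

```python
import json

# Inline stand-in for HRP035b11m_e_Oct2013.geojson: only the one feature
# quoted above is reproduced, and its geometry is left out (null).
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": null,
      "properties": {
        "HR_UID": "3595",
        "ENG_LABEL": "City of Toronto Health Unit",
        "FRE_LABEL": "Circonscription sanitaire de la cité de Toronto"
      }
    }
  ]
}
"""


def labels_by_uid(collection):
    """Map each feature's HR_UID to its English label."""
    return {
        feature["properties"]["HR_UID"]: feature["properties"]["ENG_LABEL"]
        for feature in collection["features"]
    }


labels = labels_by_uid(json.loads(SAMPLE))
print(labels)
```

For the real file, `json.load(open(path, encoding="utf-8"))` would replace the inline sample.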
Looking at, for example, https://github.com/Consensas/covid/blob/master/data/ca.cases/cooked/ca-ab.yaml
This file uses "Edmonton" and "Calgary", whereas the geojson properties and the canonical list use "Edmonton Zone" and "Calgary Zone".
So given:
4832 Calgary Zone
4834 Edmonton Zone
It feels like adding the ID 4832 for "Calgary Zone" and 4834 for "Edmonton Zone" to the yaml, in addition to the health region name, would disambiguate things (assuming that the source material has access to that).
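A minimal sketch of how such a lookup might work, using only the two Alberta entries listed above. The normalization rule (lowercase, drop a trailing "zone") is an assumption that happens to fit these two names and would need extending for other provinces. The names `HEALTH_REGIONS`, `normalize` and `region_id_for` are hypothetical.

```python
# The two Alberta entries quoted above; a full table would come from the
# Statistics Canada list.
HEALTH_REGIONS = {
    "4832": "Calgary Zone",
    "4834": "Edmonton Zone",
}


def normalize(name):
    """Lowercase the name and drop a trailing 'zone' (assumed rule)."""
    words = name.lower().split()
    if words and words[-1] == "zone":
        words = words[:-1]
    return " ".join(words)


# Index the canonical labels by their normalized form.
IDS_BY_NAME = {normalize(label): uid for uid, label in HEALTH_REGIONS.items()}


def region_id_for(name):
    """Return the region ID for a name, or None when nothing matches."""
    return IDS_BY_NAME.get(normalize(name))


print(region_id_for("Calgary"))
print(region_id_for("Edmonton Zone"))
print(region_id_for("Red Deer"))
```

Returning `None` for an unknown name (here "Red Deer", which is not in the two-entry table) keeps unmatched cases visible instead of guessing at an ID.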
Thanks Matthew
This is exactly what I plan to do: map all the cases to IDs. This (https://github.com/Consensas/covid/blob/master/data/ca.statcan.health-regions/zones.yaml) has all the zone names, and zone "fragments" for pattern matching. Plus every data point has a unique "@id" for cross-referencing. See JSON-LD if you're not familiar with that.
So for example this record:
- '@id': 'urn:covid:consensas:ca.cases:47'
  dataset_id: '47'
  region_id: '1'
  sources:
    - >-
      https://edmonton.ctvnews.ca/alberta-s-first-presumptive-coronavirus-case-in-calgary-zone-1.4841023
  date: '2020-03-05'
  week_reported: '2020-03-01'
  is_travel: true
  age_range: 50-59
  gender: Female
  health_region: Calgary
  acquired_country: null
would become something like
- '@id': 'urn:covid:consensas:ca.cases:47'
  dataset_id: '47'
  region_id: '1'
  sources:
    - >-
      https://edmonton.ctvnews.ca/alberta-s-first-presumptive-coronavirus-case-in-calgary-zone-1.4841023
  date: '2020-03-05'
  week_reported: '2020-03-01'
  is_travel: true
  age_range: 50-59
  gender: Female
  health_region: Calgary
  health_region_id: 'urn:covid:statcan.gc.ca:health-region:ca-ab:4832'
  acquired_country: null
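The transformation above could be sketched roughly as follows. This is not the repository's actual code: the helper names, the `IDS_BY_NAME` table and passing the province code `ca-ab` as a parameter are all assumptions. Only the URN shape is taken from the example record. The record is abbreviated to a few of its fields.

```python
# Hypothetical lookup from the short names used in the case files to
# Statistics Canada health region IDs.
IDS_BY_NAME = {"Calgary": "4832", "Edmonton": "4834"}


def health_region_urn(province, region_id):
    """Build a URN in the shape used by the example record above."""
    return f"urn:covid:statcan.gc.ca:health-region:{province}:{region_id}"


def with_health_region_id(record, province, ids_by_name):
    """Return a copy of the record with health_region_id placed right
    after health_region; the value is None when the name is unknown."""
    enriched = {}
    for key, value in record.items():
        enriched[key] = value
        if key == "health_region":
            region_id = ids_by_name.get(value)
            enriched["health_region_id"] = (
                health_region_urn(province, region_id) if region_id else None
            )
    return enriched


# Abbreviated version of the record shown above.
record = {
    "@id": "urn:covid:consensas:ca.cases:47",
    "date": "2020-03-05",
    "health_region": "Calgary",
    "acquired_country": None,
}

enriched = with_health_region_id(record, "ca-ab", IDS_BY_NAME)
print(enriched["health_region_id"])
```

The original `health_region` name is kept alongside the ID, so the record still shows what the source actually reported.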
@newhook - StatCan probably has updated region data; BC in particular has been reorganized.
The following are missing:
I'm wondering if we need a lower-resolution version? The maps may start melting down when I start doing lots of these.
Yeah, you are right. They don't make it easy to find! Here are the latest versions, I think.
https://www150.statcan.gc.ca/n1/pub/82-402-x/2018001/hrbf-flrs-eng.htm
The files can be downsized further with the tool that I linked earlier. The thing is, depending on the purpose, they need to be eyeballed to see how ridiculous they look.
This file used to contain data on cases per health region in Ontario. https://raw.githubusercontent.com/Consensas/covid/master/data/ca.cases/ca-on.yaml
It seems to have moved, or may be gone altogether. Any chance of restoring it?
There was also a data problem with that file: the names did not match the names of the health regions in the boundary files from Statistics Canada.
https://www150.statcan.gc.ca/n1/pub/82-402-x/2015001/gui-eng.htm#a5
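A quick way to surface that kind of mismatch is to diff the two sets of names. The sketch below uses the Alberta names mentioned earlier in this thread, since the Ontario file is not available to check against. The function name `unmatched_names` is hypothetical, and exact case-insensitive matching is an assumption.

```python
def unmatched_names(case_names, boundary_labels):
    """Return the case-file names that have no exact (case-insensitive)
    match among the boundary-file labels, sorted alphabetically."""
    known = {label.casefold() for label in boundary_labels}
    return sorted(
        name for name in set(case_names) if name.casefold() not in known
    )


# Names as used in ca-ab.yaml versus the labels in the boundary data.
case_names = ["Calgary", "Edmonton"]
boundary_labels = ["Calgary Zone", "Edmonton Zone"]

missing = unmatched_names(case_names, boundary_labels)
print(missing)
```

An empty result would mean every name in the case file can be joined to a boundary polygon by name alone.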