Consensas / covid

COVID-2019 data / code

access to cases by health region #2

Open newhook opened 4 years ago

newhook commented 4 years ago

This file used to contain data on cases per health region in Ontario. https://raw.githubusercontent.com/Consensas/covid/master/data/ca.cases/ca-on.yaml

It seems to have moved, or possibly been removed altogether. Any chance of restoring it?

There was also a data problem with that file: the names did not match the health region names in the boundary files from Statistics Canada.

https://www150.statcan.gc.ca/n1/pub/82-402-x/2015001/gui-eng.htm#a5

dpjanes commented 4 years ago

I'm trying to come up with a way of "normalizing" the names of the regions. I've also found data here: https://www.cihi.ca/en/access-data-and-reports.

I've moved the data you see above into a subfolder called cooked (which I'm trying to do in all folders) https://github.com/Consensas/covid/tree/master/data/ca.cases/cooked

newhook commented 4 years ago

This seems to be a canonical list of region names and codes. https://www150.statcan.gc.ca/n1/pub/82-402-x/2015001/app-ann/ap-an1-eng.htm

I just spent a bit of time with the geojson property data embedded in the files.

$ cat HRP035b11m_e_Oct2013.geojson | jq '.features[].properties'

Taking an example here:

{
  "HR_UID": "3595",
  "ENG_LABEL": "City of Toronto Health Unit",
  "FRE_LABEL": "Circonscription sanitaire de la cité de Toronto"
}
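
A minimal Python sketch of the same extraction, for anyone who would rather build an ID-to-name lookup than read the jq output by eye. It assumes the file is a standard GeoJSON FeatureCollection whose features carry the `HR_UID` and `ENG_LABEL` properties shown above. The function name `health_region_index` is made up for illustration, and the sample is inlined so the snippet runs without the boundary file.

```python
import json


def health_region_index(geojson_text):
    """Map each feature's HR_UID to its ENG_LABEL."""
    collection = json.loads(geojson_text)
    return {
        feature["properties"]["HR_UID"]: feature["properties"]["ENG_LABEL"]
        for feature in collection["features"]
    }


# Inline sample built from the Toronto properties quoted above.
# The geometry is left empty because only the properties matter here.
SAMPLE = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": None,
            "properties": {
                "HR_UID": "3595",
                "ENG_LABEL": "City of Toronto Health Unit",
                "FRE_LABEL": "Circonscription sanitaire de la cité de Toronto",
            },
        }
    ],
})

index = health_region_index(SAMPLE)
print(index["3595"])
```

For the real file, the same function could be fed the contents of `HRP035b11m_e_Oct2013.geojson` read from disk.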

Looking at, for example https://github.com/Consensas/covid/blob/master/data/ca.cases/cooked/ca-ab.yaml

This file uses "Edmonton" and "Calgary", whereas the geojson properties and the canonical list use "Edmonton Zone" and "Calgary Zone".

So given:

4832 Calgary Zone
4834 Edmonton Zone

It feels like adding the ID 4832 for "Calgary Zone" and 4834 for "Edmonton Zone" to the yaml, in addition to the health region name, would disambiguate things (assuming that the source material has access to that).

dpjanes commented 4 years ago

Thanks Matthew

So this is exactly what I plan to do: map all the cases to IDs. This file (https://github.com/Consensas/covid/blob/master/data/ca.statcan.health-regions/zones.yaml) has all the zone names, plus zone "fragments" for pattern matching. Every data point also has a unique "@id" for cross-referencing. See JSON-LD if you're not familiar with that.

So for example this record:

  - '@id': 'urn:covid:consensas:ca.cases:47'
    dataset_id: '47'
    region_id: '1'
    sources:
      - >-
        https://edmonton.ctvnews.ca/alberta-s-first-presumptive-coronavirus-case-in-calgary-zone-1.4841023
    date: '2020-03-05'
    week_reported: '2020-03-01'
    is_travel: true
    age_range: 50-59
    gender: Female
    health_region: Calgary
    acquired_country: null

would become something like

  - '@id': 'urn:covid:consensas:ca.cases:47'
    dataset_id: '47'
    region_id: '1'
    sources:
      - >-
        https://edmonton.ctvnews.ca/alberta-s-first-presumptive-coronavirus-case-in-calgary-zone-1.4841023
    date: '2020-03-05'
    week_reported: '2020-03-01'
    is_travel: true
    age_range: 50-59
    gender: Female
    health_region: Calgary
    health_region_id: 'urn:covid:statcan.gc.ca:health-region:ca-ab:4832'
    acquired_country: null
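
A rough Python sketch of how that mapping step could work, using fragment matching on the reported region name. The `ZONES` table is hypothetical: the actual layout of zones.yaml is not shown in this thread, so the keys `id`, `name`, and `fragments` are assumptions, as is the function name. Only the two Alberta IDs and the URN pattern come from the discussion above.

```python
# Hypothetical lookup table; the real zones.yaml structure may differ.
ZONES = {
    "ca-ab": [
        {"id": "4832", "name": "Calgary Zone", "fragments": ["calgary"]},
        {"id": "4834", "name": "Edmonton Zone", "fragments": ["edmonton"]},
    ],
}


def health_region_id(province, health_region):
    """Return the health region URN, or None when there is no unique match."""
    if not health_region:
        return None
    needle = health_region.strip().lower()
    matches = [
        zone
        for zone in ZONES.get(province, [])
        if any(fragment in needle for fragment in zone["fragments"])
    ]
    # Refuse to guess when zero or several zones match.
    if len(matches) != 1:
        return None
    return f"urn:covid:statcan.gc.ca:health-region:{province}:{matches[0]['id']}"


# Trimmed version of the record above, keeping only the relevant fields.
record = {"@id": "urn:covid:consensas:ca.cases:47", "health_region": "Calgary"}
record["health_region_id"] = health_region_id("ca-ab", record["health_region"])
print(record["health_region_id"])
```

Returning None on an ambiguous or missing match keeps bad guesses out of the data, so unmatched names can be reviewed by hand.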

dpjanes commented 4 years ago

@newhook - StatCan probably has updated region data: BC in particular has been reorganized.

The following are missing:

dpjanes commented 4 years ago

I'm wondering if we need a lower-resolution version? The maps may start melting down when I start doing lots of these.

newhook commented 4 years ago

Yeah, you are right. They don't make it easy to find! Here are the latest versions, I think.

https://www150.statcan.gc.ca/n1/pub/82-402-x/2018001/hrbf-flrs-eng.htm

The files can be downsized further with the tool that I linked earlier. The thing is, depending on the purpose, they need to be eyeballed to see how ridiculous they look.