ccodwg / CovidTimelineCanada

A definitive dataset for COVID-19 in Canada
https://opencovid.ca/

Add population values to geo #23

Closed by jeanpaulrsoucy 2 years ago

apiology commented 2 years ago

Hi @jeanpaulrsoucy! The API change disabled data updates at microCOVID.org, and I'm trying to (unofficially) help get the new format ingested.

It looks like microCOVID relied on population data in order to calculate prevalence by health region.

If you happen to be aware of another upstream source for population, that'd be helpful in the interim.

jeanpaulrsoucy commented 2 years ago

Hi @apiology, thanks for letting me know. What a cool project! At minimum, I can add population data for the provinces to pt.csv later today.

As for dealing with the new data format, see the note in the README giving links to case and death CSV files in the legacy format (i.e., the same format used for Covid19Canada).

apiology commented 2 years ago

@jeanpaulrsoucy: Appreciate it - we have province-level population and timeseries data for Canada already, so I think we're really looking for things at the health-region level. Where were you gathering health-region population from previously?

I did see that note, thank you. The legacy format doesn't contain the health-region-level population, so I don't think it would help us with this. I'll keep it in mind as I get further into this problem, though; perhaps there's another application.

jeanpaulrsoucy commented 2 years ago

Ah okay, in that case I can add the HR-level population data too. I didn't see the HR data being used in my quick scan of the site.

apiology commented 2 years ago

It's not currently - it had to be yanked because of this issue. It'll be added back once we can get your data ingested again.

apiology commented 2 years ago

Ah, it looks like I can grab this from the legacy repo at https://github.com/ccodwg/Covid19Canada/blob/master/other/hr_map.csv in the interim (the existing code was using https://api.opencovid.ca/other?stat=hr).

jeanpaulrsoucy commented 2 years ago

Hi @apiology, see geo/pt.csv and geo/health_regions.csv for the latest population data. In particular, use the pop column for the latest population data available for the geography. Data notes are here.
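As a rough illustration of reading the `pop` column, here is a minimal sketch. It parses an inline, hypothetical excerpt rather than the real geo/health_regions.csv: the `hruid` and `name` column names are assumptions (only `pop` is named in this thread), and the Fraser figure is a rounded placeholder.

```python
import csv
import io

# Hypothetical excerpt standing in for geo/health_regions.csv.
# Only the `pop` column name comes from this thread; `hruid` and `name`
# are assumed, and the real file may have more or differently named columns.
SAMPLE_CSV = """hruid,name,pop
592,Fraser,2000000
595,Northern,300970
"""


def load_population(csv_text):
    """Return {hruid: population} from CSV text with `hruid` and `pop` columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["hruid"]: int(row["pop"]) for row in reader}


populations = load_population(SAMPLE_CSV)
print(populations["595"])
```

Keeping the IDs as strings avoids accidental numeric coercion when joining against other sources keyed by health region ID.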

Let me know if you have any questions or issues!

apiology commented 2 years ago

Thanks so much! I've gotten a lot further in other areas of the transition in the meantime. I'll give it a shot and give you a shout if there are any problems!

apiology commented 2 years ago

@jeanpaulrsoucy: I am seeing a potential issue while doing QC on https://github.com/microCOVID/microCOVID/pull/1452

When microCOVID.org fetches a vaccination report for health region 595 (Vancouver Coastal, BC) from api.covid19tracker.ca here, I see it claim that there are 1.1 million people vaccinated (see total_vaccinated in the data).

However, your data claims a total population much less than that here - 300,970.

Which do you think is correct/incorrect?
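The kind of QC check that surfaces this mismatch can be sketched as follows. This is an illustrative sketch, not the microCOVID code: the function name is hypothetical, and the vaccinated count is rounded from the "1.1 million" quoted above.

```python
def find_implausible_regions(populations, vaccinated):
    """Return sorted region IDs whose vaccinated count exceeds the population."""
    return sorted(
        hruid
        for hruid, count in vaccinated.items()
        if hruid in populations and count > populations[hruid]
    )


# Population for 595 as quoted in this thread.
populations = {"595": 300_970}
# Approximate total_vaccinated for 595 (rounded from "1.1 million").
vaccinated = {"595": 1_100_000}

flagged = find_implausible_regions(populations, vaccinated)
print(flagged)  # ['595']
```

A region is only flagged when both sources have an entry for it, so missing data does not produce false alarms.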

jeanpaulrsoucy commented 2 years ago

Hi @apiology, thanks for catching this! The labels on some of the BC health regions got switched. This has been fixed in health_regions.csv, health_regions.geojson and all of the data files. #55

apiology commented 2 years ago

@jeanpaulrsoucy - thanks for the fast turnaround! However, the population for hruid 595 (now labeled Northern Health) is still out of line with the total_vaccinated data I'm seeing from api.covid19tracker.ca here, which is fetched via the hruid 595.

Thoughts? I see the same issue for hruid 592.

jeanpaulrsoucy commented 2 years ago

Hmm, may be an error on their end then. (Link to BC health region IDs from StatCan)

Some screenshots from the BC dashboard:

Northern (595):

[Screenshot: BC dashboard for Northern (595)]

Total doses are 537,461, which makes sense with a total population ~300k.

Fraser (592):

[Screenshot: BC dashboard for Fraser (592)]

Total doses are 4,199,845, which makes sense with a total population ~2 million.
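The arithmetic behind "makes sense" can be sketched as doses per person. The populations below are the rounded figures quoted above (~300k and ~2 million), so the results are approximate, and the helper name is hypothetical.

```python
def doses_per_person(total_doses, population):
    """Average number of doses administered per resident."""
    return total_doses / population


# Dose totals from the BC dashboard figures quoted in this thread;
# populations are the rounded approximations from the same comment.
northern = doses_per_person(537_461, 300_000)
fraser = doses_per_person(4_199_845, 2_000_000)
print(f"{northern:.2f} {fraser:.2f}")  # 1.79 2.10

# If the labels were swapped, Fraser's doses over Northern's population
# would imply roughly 14 doses per person, which is not plausible.
swapped = doses_per_person(4_199_845, 300_000)
print(f"{swapped:.1f}")  # 14.0
```

Roughly two doses per person is in a plausible range for both regions, which is why the dashboard totals are consistent with the corrected labels.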

This is also ~consistent with what is reported on the covid19tracker.ca page for BC vaccination. So it seems this is just a problem of HRUID labels in the API.

jeanpaulrsoucy commented 2 years ago

Okay, I figured it out.

The health region IDs for BC used by covid19tracker.ca are wrong: https://api.covid19tracker.ca/province/BC/regions

...but this traces back to the health region IDs for BC used by us in Covid19Canada being wrong: https://github.com/ccodwg/Covid19Canada/blob/master/other/hr_map.csv

...but this traces back to the health region IDs for BC used by ESRI (for their map file) being wrong: https://resources-covid19canada.hub.arcgis.com/datasets/covid19canada::health-region-summaries

However, our current HRUIDs for BC are now correct, according to both StatCan and BC itself.

jeanpaulrsoucy commented 2 years ago

I've sent a note to @noahlittle so hopefully we can get aligned on this shortly. :)