covidatlas / coronadatascraper

COVID-19 Coronavirus data scraped from government and curated data sources.
https://coronadatascraper.com
BSD 2-Clause "Simplified" License

Missing data #691

Closed oneviewdata closed 4 years ago

oneviewdata commented 4 years ago

Just letting you know that recovery and death data for Australia seem to be missing from 16/03/2020 to 26/03/2020.

Also, I have only checked one or two days, but the data for Victoria doesn't seem to be in line with the Victorian local website. For example, 13 March 2020: https://www2.health.vic.gov.au/about/media-centre/MediaReleases/more-covid-19-cases-confirmed-in-victoria-13-march-2020

Are you able to share your feedback on this? Thanks.

jzohrab commented 4 years ago

Hello @oneviewdata , the AU/Victoria scraper is at src/shared/scrapers/AU/VIC/index.js in this repository.

Per the code, that scraper first hits https://www.dhhs.vic.gov.au/media-hub-coronavirus-disease-covid-19 to get the list of links, and then scrapes the latest, e.g. https://www.dhhs.vic.gov.au/coronavirus-update-victoria-4-april-2020. That page only has cases, nothing else.
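For anyone unfamiliar with this two-step pattern (fetch an index page, then follow its newest link), here is a minimal JavaScript sketch of the "pick the latest" step. It is not the code in src/shared/scrapers/AU/VIC/index.js. It assumes, based only on the single example URL above, that update pages end in coronavirus-update-victoria-<day>-<month>-<year>, and the helper names parseUpdateDate and pickLatestUpdate are hypothetical.

```javascript
// Hypothetical sketch, not the project's scraper: choose the most recent
// "coronavirus-update-victoria-<day>-<month>-<year>" link from an index page.
// The URL pattern is assumed from one example link in this thread.
const MONTHS = [
  'january', 'february', 'march', 'april', 'may', 'june',
  'july', 'august', 'september', 'october', 'november', 'december',
];

// Returns a UTC Date parsed from the URL slug, or null if the URL doesn't match.
function parseUpdateDate(url) {
  const m = /coronavirus-update-victoria-(\d{1,2})-([a-z]+)-(\d{4})/i.exec(url);
  if (!m) return null;
  const month = MONTHS.indexOf(m[2].toLowerCase());
  if (month === -1) return null;
  return new Date(Date.UTC(Number(m[3]), month, Number(m[1])));
}

// Returns the link with the latest parsed date, or null if none match.
function pickLatestUpdate(links) {
  let latest = null;
  let latestDate = null;
  for (const url of links) {
    const date = parseUpdateDate(url);
    if (date && (latestDate === null || date > latestDate)) {
      latest = url;
      latestDate = date;
    }
  }
  return latest;
}

// Hypothetical input: the 3-april link is illustrative, not a confirmed page.
const links = [
  'https://www.dhhs.vic.gov.au/coronavirus-update-victoria-3-april-2020',
  'https://www.dhhs.vic.gov.au/coronavirus-update-victoria-4-april-2020',
  'https://www.dhhs.vic.gov.au/media-hub-coronavirus-disease-covid-19',
];
console.log(pickLatestUpdate(links)); // the 4-april-2020 link
```

Because only the newest page is scraped, any figure that appears solely in an older release (or in a differently named page) would not be picked up by this step.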

If you know of a better data source that we can use, please let us know! And if you do file another issue, please put some location information in the title: "Missing data for AU, VIC" is clearer for us than just "Missing data".

Thank you for the issue! jz

jzohrab commented 4 years ago

@oneviewdata , I'll close this issue in a few days if I don't hear back from you with a better source, assuming that this answers your question. Thanks again, jz

oneviewdata commented 4 years ago

Thanks for looking into this. As per your DHHS website link, it said: "The new cases include 6 men and 3 women aged between 20-70. All cases are recovering at home in isolation. Of the 9 new cases, 7 have a history of international travel. Case interviews are still being completed with some cases."

So, for the recovery figures, that tells me 9 recoveries should be recorded on 13 March 2020 (plus the previous day's recovery figure?). However, your data says 8 recoveries.

jzohrab commented 4 years ago

I’ll need to go further into the code and data to find out. We only scrape well-formed data such as tables, CSV, JSON, etc., not things like articles. I’ll see if I can find a better answer.

I didn’t see that sentence you quoted in either of my links. Can you link the page where you found it?

Thanks! Jz

oneviewdata commented 4 years ago

link: https://www2.health.vic.gov.au/about/media-centre/MediaReleases/more-covid-19-cases-confirmed-in-victoria-13-march-2020

hope that helps. thx

jzohrab commented 4 years ago

Hi there, thanks for the link. I think we're at the mercy of the data and different reporting structures/frequencies!

We're actually compiling and cross-checking data from several different sources:

I don't know where that link you sent got the number 9 from ... hard to say. Needless to say, it's tough getting things right, for all data sources, not just in our work!

If you notice a huge discrepancy, such as missing all data or lots of 0's, that's more of a concern. This off-by-one could simply be due to some minor inconsistencies in timeframes, availability, source updates, etc. Thoughts, @oneviewdata ?
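One way to make the "off-by-one vs. huge discrepancy" distinction concrete when cross-checking sources is a simple tolerance check. The sketch below is hypothetical: compareSources and the tolerance of 1 are illustrative choices, not project code. It labels a pair of values as missing, acceptable, or a discrepancy worth investigating.

```javascript
// Hypothetical sketch: compare one metric (e.g. recovered) reported by two
// sources for the same location and date.
function compareSources(a, b, tolerance = 1) {
  // One source has no value at all: the "missing all data" case.
  if (a == null || b == null) return 'missing';
  // Small gaps (timing, update frequency) are treated as acceptable.
  if (Math.abs(a - b) <= tolerance) return 'ok';
  // Anything larger, including a sudden 0, is flagged for a closer look.
  return 'discrepancy';
}

console.log(compareSources(8, 9));    // ok
console.log(compareSources(8, null)); // missing
console.log(compareSources(8, 0));    // discrepancy
```

A fixed absolute tolerance is the simplest choice; for larger counts a relative tolerance would probably be more sensible.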

oneviewdata commented 4 years ago

Hi, I understand that there are multiple data sources for this. I was making sure that wherever your code is picking up the data from, it is picking up the right numbers. It seems like you are getting the recovery data from JHU? Also, your data is missing recovery and death data for Victoria for 17/03/2020-26/03/2020. Another source you may want to explore further is BNO News. Their data seems to align more closely with local (Australian) data and seems to be more accurate than JHU.

jzohrab commented 4 years ago

Interesting, thank you very much. Do you have a link for BNO News, with public data? And if you have a better suggestion for AUS data, let us know.

oneviewdata commented 4 years ago

Link: https://bnonews.com/index.php/2020/04/the-latest-coronavirus-cases/ It has worldwide data, and the data sources it uses all seem to be local to their respective places. If you want a historical view, you can try the http://archive.md/ website to locate archived snapshots of the BNO page. I tried to do this myself, but I do not have "web scraping" capability as yet. Hope that helps. :)

praging commented 4 years ago

Also, I have only checked one or two days, but the data for Victoria doesn't seem to be in line with the Victorian local website. For example, 13 March 2020: https://www2.health.vic.gov.au/about/media-centre/MediaReleases/more-covid-19-cases-confirmed-in-victoria-13-march-2020

This link doesn't specify the number of people recovered -- someone "recovering at home" isn't yet recovered. Almost everyone who is infected was at one point "recovering at home" -- some continued on to make a full recovery and were noted as such; others got worse and went to hospital, etc.

As of now, the issue I see that needs to be dealt with here is the break in continuity of the recovery data on 03-17: even the 8 recoveries that we had as of 03-16 seem to have been dropped.
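A break like this can be detected mechanically. Here is a minimal sketch, assuming a hypothetical date-ordered array of { date, value } objects for one location where missing values are null or undefined (this is not necessarily how coronadatascraper stores its timeseries). It reports the first missing day after a known value and any day where a cumulative count goes down.

```javascript
// Hypothetical sketch: find continuity breaks in a date-ordered cumulative series.
// Assumes entries look like { date: 'YYYY-MM-DD', value: number | null }.
function findContinuityBreaks(series) {
  const breaks = [];
  let lastKnown = null; // most recent non-missing value
  let inGap = false;    // true while consecutive values are missing

  for (const { date, value } of series) {
    if (value == null) {
      // Report only the first missing day after a known value, not every day of the gap.
      if (lastKnown !== null && !inGap) {
        breaks.push({ date, reason: 'dropped', lastKnown });
      }
      inGap = true;
      continue;
    }
    // A cumulative count (e.g. recoveries) should never go down.
    if (lastKnown !== null && value < lastKnown) {
      breaks.push({ date, reason: 'decreased', lastKnown });
    }
    lastKnown = value;
    inGap = false;
  }
  return breaks;
}

// Mirrors the case in this thread: 8 recoveries on 03-16, then no value from 03-17.
const recovered = [
  { date: '2020-03-16', value: 8 },
  { date: '2020-03-17', value: null },
  { date: '2020-03-18', value: null },
];
console.log(findContinuityBreaks(recovered));
// one entry: date '2020-03-17', reason 'dropped', lastKnown 8
```

If a scraper wrote 0 instead of leaving the value empty, the same check would still catch it as a 'decreased' break.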

oneviewdata commented 4 years ago

ah, i see. thanks

camjc commented 4 years ago

BNO isn't a primary source, and the DHHS VIC site isn't fully scraped. I'd love it if you could open a PR to scrape more of DHHS VIC, but I don't think the BNO data is something I'd want to pursue. Ideally we can get into their Power BI data somehow. Thanks for the interest in my home state :)