Hello @oneviewdata , the AU/Victoria scraper is at src/shared/scrapers/AU/VIC/index.js in this repository.
Per the code, that scraper first hits https://www.dhhs.vic.gov.au/media-hub-coronavirus-disease-covid-19 to get the list of links, and then scrapes the latest one, e.g. https://www.dhhs.vic.gov.au/coronavirus-update-victoria-4-april-2020. That page only has case counts, nothing else.
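To illustrate the "get the list of links, then pick the latest" step, here is a minimal Python sketch. It is not the code in src/shared/scrapers/AU/VIC/index.js; the slug pattern is inferred only from the example URL above, and the names `update_date` and `latest_update` are hypothetical.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Assumed slug shape, based only on the example URL quoted in this thread.
SLUG_RE = re.compile(r"coronavirus-update-victoria-(\d{1,2})-([a-z]+)-(\d{4})$")


def update_date(url):
    """Return the date encoded in an update URL, or None if it does not match."""
    match = SLUG_RE.search(url.rstrip("/"))
    if not match:
        return None
    day, month_name, year = match.groups()
    month = MONTHS.get(month_name)
    if month is None:
        return None
    try:
        return date(int(year), month, int(day))
    except ValueError:
        # e.g. "31-february": the slug matched but is not a real date
        return None


def latest_update(urls):
    """Return the URL with the most recent date in its slug, or None."""
    dated = [(update_date(u), u) for u in urls]
    dated = [(d, u) for d, u in dated if d is not None]
    return max(dated)[1] if dated else None
```

Links that do not match the assumed pattern (such as the media hub page itself) are simply ignored.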
If you know of a better data source that we can use, please let us know! And if you do file another issue, please add some location data in the title: "Missing data for AU, VIC" is clearer for us than just "Missing data".
Thank you for the issue! jz
@oneviewdata , I'll close this issue in a few days if I don't hear back from you with a better source, assuming that this answers your question. Thanks again, jz
thanks for looking into this. As per your dhhs website link, it said: "The new cases include 6 men and 3 women aged between 20-70. All cases are recovering at home in isolation. Of the 9 new cases, 7 have a history of international travel. Case interviews are still being completed with some cases."
So, for recovery figures, that tells me 9 recoveries should be recorded on 13 March 2020 (plus the previous day's recovery figure?). However, your data says 8 recoveries.
I’ll need to go further into the code and data to find out. We only scrape well-formed data such as tables, CSV, JSON, etc., and not things like articles. I’ll see if I can find a better answer.
I didn’t see that sentence you quoted in either of my links. Can you link the page where you found it?
Thanks! Jz
Hi there, thanks for the link. I think we're at the mercy of the data and different reporting structures/frequencies!
We're actually compiling and cross-checking data from several different sources:
I don't know where that link you sent got the number 9 from ... hard to say. Needless to say, it's tough getting things right, for all data sources, not just in our work!
If you notice a huge discrepancy, such as all data missing or lots of 0's, that's more of a concern. This off-by-one could simply be due to minor inconsistencies in timeframes, availability, source updates, etc. Thoughts, @oneviewdata ?
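As a rough illustration of the distinction above, this Python sketch separates an off-by-one from a larger discrepancy when comparing one figure against another source. The function name, the labels, and the default tolerance of 1 are assumptions made for the example, not anything the project defines.

```python
def classify_discrepancy(ours, theirs, tolerance=1):
    """Compare our value with another source's value for the same day.

    Returns "missing" if we have no value, "match" if the values agree,
    "minor" if they differ by at most `tolerance`, and "major" otherwise.
    """
    if ours is None:
        return "missing"
    if ours == theirs:
        return "match"
    if abs(ours - theirs) <= tolerance:
        return "minor"
    return "major"
```

With the figures discussed here, `classify_discrepancy(8, 9)` would count as "minor", while a day with no value at all would be reported as "missing".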
Hi, I understand that there are multiple data sources for this. I was making sure that wherever your code is picking up the data from, it is picking up the right numbers. It seems like you are getting the recovery data from JSU? Also, your data is missing recovery and death data for Victoria for 17/03/2020-26/03/2020. Another source you may want to explore further is BNONEWS. Their data seems to align more closely with local (Australian) data and seems to be more accurate than JSU.
Interesting, thank you very much. Do you have a link for BNO News, with public data? And if you have a better suggestion for AUS data, let us know.
Link: https://bnonews.com/index.php/2020/04/the-latest-coronavirus-cases/ It has worldwide data. The data sources it is using seem to all be local to their respective places. If you want a historical view, you can use the http://archive.md/ website to try to locate past snapshots of the BNO website. I tried to do this myself, but I do not have the "web scraping" capability as yet. Hope that helps. :)
Also, I have only checked one or two days but data for Victoria doesn't seem to be in line with Victoria local website: For example, 13 March 2020, https://www2.health.vic.gov.au/about/media-centre/MediaReleases/more-covid-19-cases-confirmed-in-victoria-13-march-2020
This link doesn't specify the number of people recovered -- someone "recovering at home" isn't yet recovered. Almost everyone that is infected was at one point "recovering at home" -- some continued on to make a full recovery and be noted as such; others got worse and went to hospital, etc.
As of now, the issue I see that needs to be dealt with here is the loss of recovery data on 03-17 in continuity. It seems that even the 8 recoveries that we had as of 03-16 were dropped.
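A continuity check for this kind of loss could look like the Python sketch below. It assumes the data is available as a chronological list of (date, cumulative value) pairs with `None` for missing values; that shape and the name `continuity_breaks` are hypothetical, not the project's actual format.

```python
def continuity_breaks(series):
    """Find days where a cumulative count goes missing or decreases.

    `series` is a chronological list of (day, value) pairs, where value is
    an int or None. Returns a list of (day, last_known, value) tuples.
    """
    breaks = []
    last_known = None
    for day, value in series:
        # A cumulative count should never disappear or go down once reported.
        if last_known is not None and (value is None or value < last_known):
            breaks.append((day, last_known, value))
        if value is not None:
            last_known = value if last_known is None else max(last_known, value)
    return breaks


# Illustrative input using the figures mentioned in this thread.
recovered = [("2020-03-16", 8), ("2020-03-17", None)]
print(continuity_breaks(recovered))
```

Days before the first reported value are not flagged, since there is nothing yet to lose.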
ah, i see. thanks
BNO isn't a primary source, and the dhhs in VIC isn't fully scraped. I'd love it if you could open a PR to scrape more of DHHS VIC, but I don't think the BNO data is something I'd want to pursue. Ideally we can get into their PowerBI data somehow. Thanks for the interest in my home state :)
Just letting you know that there seems to be missing recovery and death data for Australia from 16/03/2020 to 26/03/2020.
Are you able to share your feedback on this? Thanks.