dsfsi / covid19za

Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
https://dsfsi.github.io/covid19za-dash/
MIT License

gis_nicd_scraper #840

Open lrossouw opened 3 years ago

lrossouw commented 3 years ago

Is your feature request related to a problem? Please describe.

The new way to share data is here: https://sacoronavirus.co.za/live-counter/

It is no longer shared via the NICD page, so my scraper will need to be retired and redesigned.

lrossouw commented 3 years ago

And it's broken...

vukosim commented 3 years ago

We are unfortunately dealing with the fallout of Digital Vibes. Would NICD be better, @lrossouw?

lrossouw commented 3 years ago

It looked like they stopped doing them. But I see the 9th is available again. They skipped the 8th though.

lrossouw commented 3 years ago

I also see the NICD page layout has changed. I captured all data manually until 9 June. I'm going to wait for the dust to settle before I update my scripts, but of course this week appears to be key in terms of the stats that are coming out!

lrossouw commented 3 years ago

I've created something new that should (hopefully!) be more stable. It collects from various sources, so I'm no longer posting the exact URLs as the source. I'm flagging auto-scraped results as source = "gis_nicd_scraper", as most of the data is scraped from the public dashboard there. I don't have anything for vaccines yet.

lrossouw commented 3 years ago

I scan two or three different dashboards for the figures. I can typically get the cases and tests from one dashboard the evening they are released (but they do break the dash from time to time). I usually pick up the deaths and recoveries the next day from another dashboard (and the cases too, if the first dashboard is broken). Haven't solved the vaccines yet.

But these are more stable than scraping the pages of the media releases, which change all the time.
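The fallback between dashboards described above can be sketched roughly as follows. This is only an illustration, not the actual scraper: the source names, the `first_available` helper, and the shape of each reading (a dict of field name to number, with `None` when a dashboard is broken or has not published yet) are all assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical shape: one dict per dashboard, None when a figure is unavailable.
Reading = Dict[str, Optional[int]]


def first_available(
    sources: Iterable[Tuple[str, Reading]], field: str
) -> Optional[Tuple[str, int]]:
    """Return (source_name, value) from the first source that has `field`.

    Sources are tried in the order given, so list the preferred dashboard first.
    Returns None when no source has the figure yet.
    """
    for name, reading in sources:
        value = reading.get(field)
        if value is not None:
            return name, value
    return None


if __name__ == "__main__":
    # Placeholder values only, not real figures.
    dashboards = [
        ("dashboard_a", {"cases": None, "tests": None}),  # e.g. broken tonight
        ("dashboard_b", {"cases": 100, "deaths": 5}),
    ]
    print(first_available(dashboards, "cases"))
    print(first_available(dashboards, "vaccines"))
```

Trying the sources in a fixed order keeps the preferred dashboard authoritative while still letting the next-day dashboard fill in when the first one is down.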

dmackie commented 3 years ago

@lrossouw It seems the scraper has stopped working again? Before I do a manual update, I just want to check on its status.

lrossouw commented 3 years ago

Tx did not notice with all the other COVID-19 news out. Will have a look.

lrossouw commented 3 years ago

Sometimes the Rt calculation is still running (someone else maintains that?) and it commits back and creates a conflict with my process. So my bot keeps updating my local repo but can't push until I manually resolve the conflict.

Not sure how to fix that.

Anyway it's resolved now.

lrossouw commented 3 years ago

It might be this: https://github.com/dsfsi/covid19za/commit/f8bfa831bc19b7efd4fd50e7dda7e5718bd1a137#diff-0e6e5c3c2330a562992a4157e9afb54fdea1938025dd074fec10a03e4e655aed

Can we make it pull before the push here? My bot might have made changes while this bot was running, so pulling first seems less likely to get into conflicts. @vukosim do you maintain that code?

vukosim commented 3 years ago

I will check late this evening. It runs after a change to the file.

lrossouw commented 3 years ago

My bot posts new case data and the Rt bot starts running. Then new data comes in and my bot updates again while Rt is still running. This creates a technical merge conflict, but the Rt bot uses --force, so it overwrites. Perhaps do a pull just before the push to bring the latest changes in, so you don't effectively reverse my or other people's changes. The Rt bot has also reversed other data I captured manually before, i.e. I capture vaccine data while it's running and then it effectively reverses it.
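A minimal sketch of the "pull just before the push" idea, assuming the bot can shell out to git. The function names, the `origin`/`master` defaults, and the retry count are assumptions for illustration; this is not the Rt bot's actual workflow code. It never uses --force: if the push is rejected because the remote moved again, it pulls and retries, and if the pull itself fails (a real conflict), it stops so someone can resolve it manually.

```python
import subprocess
from typing import Callable, List, Sequence


def sync_commands(remote: str = "origin", branch: str = "master") -> List[List[str]]:
    """Git commands to run, in order: rebase onto the remote, then a plain push."""
    return [
        ["git", "pull", "--rebase", remote, branch],
        ["git", "push", remote, branch],  # deliberately no --force
    ]


def pull_then_push(
    run: Callable[[Sequence[str]], int],
    remote: str = "origin",
    branch: str = "master",
    attempts: int = 3,
) -> bool:
    """Pull, then push; retry if another bot pushed in between.

    `run` executes one command and returns its exit code.
    Returns True once a push succeeds, False otherwise.
    """
    pull_cmd, push_cmd = sync_commands(remote, branch)
    for _ in range(attempts):
        if run(pull_cmd) != 0:
            return False  # real conflict: needs manual resolution
        if run(push_cmd) == 0:
            return True
        # Push rejected: the remote changed after our pull, so try again.
    return False


def run_git(cmd: Sequence[str]) -> int:
    """Default runner: execute the command in the current repository."""
    return subprocess.run(list(cmd), check=False).returncode


if __name__ == "__main__":
    print("pushed" if pull_then_push(run_git) else "needs manual attention")
```

Passing the runner in as a parameter keeps the retry logic testable without touching a real repository.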

dmackie commented 3 years ago

I did a manual update of Deaths and Recoveries today as they had not updated by midday.

https://github.com/dsfsi/covid19za/pull/854

lrossouw commented 3 years ago

Sorry just noticed now. Will sort it out.

lrossouw commented 3 years ago

Fixed. @vukosim did you manage to update the bot?

janvdl commented 3 years ago

Good morning. The provincial data for confirmed cases for 2021-07-05 is missing. Should I add it manually or would you prefer to have the scraper make another pass and add it instead?

lrossouw commented 3 years ago

Seems to be there (line 487): https://github.com/dsfsi/covid19za/blob/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv#L487

janvdl commented 3 years ago

> Seems to be there (line 487):
>
> https://github.com/dsfsi/covid19za/blob/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv#L487

Louis, apologies if I missed something, will check my import again tonight and get back to you.

janvdl commented 3 years ago

> Seems to be there (line 487):
>
> https://github.com/dsfsi/covid19za/blob/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv#L487

My mistake, sorry. Data for 05 July 2021 is indeed available in the confirmed cases file, but is missing from cumulative deaths and cumulative recoveries.
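A check like the one above (is a given date present in each timeline file?) could be scripted along these lines. It is a sketch under assumptions: the `date` column name and the date string format must match whatever the CSV files actually use, and the helper names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def has_date(csv_text: str, date: str, date_column: str = "date") -> bool:
    """True if any row's `date_column` equals `date` exactly (string match)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return any(row.get(date_column) == date for row in reader)


def files_missing_date(
    files: Dict[str, str], date: str, date_column: str = "date"
) -> List[str]:
    """Names of the files (name -> CSV text) that have no row for `date`."""
    return [
        name
        for name, text in files.items()
        if not has_date(text, date, date_column)
    ]
```

For example, loading the confirmed, deaths and recoveries files as text and calling `files_missing_date` with the date written in the files' own format would list the files that still need a manual capture.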

lrossouw commented 3 years ago

Ah, 5 July had an issue, I believe: https://www.nicd.ac.za/latest-confirmed-cases-of-covid-19-in-south-africa-05-july-2021/

They did not release provincial figures for deaths and recoveries on the NICD site, but I see now they are available here: https://sacoronavirus.co.za/2021/07/05/update-on-covid-19-05th-july-2021/

Feel free to capture.

lrossouw commented 3 years ago

My data source stopped providing deaths/recoveries in machine-readable form on 30 July or so. What sources are people using?

shaze commented 3 years ago

@lrossouw Does this problem apply to testing too (covid19za_timeline_testing.csv)? I don't know how I missed that this has not been updating since the end of July.

vukosim commented 3 years ago

Thanks @shaze yeah that needs to be updated.

vukosim commented 3 years ago

Also looping in @krokkie, as it seems we have a few more failures. So we might need to sync again between you and @lrossouw.