dsfsi / covid19za

Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
https://dsfsi.github.io/covid19za-dash/
MIT License
255 stars, 200 forks

Numbers for the 07/01/2021 are a repeat of 06/01/2021 #794

Closed TeeZA42 closed 3 years ago

TeeZA42 commented 3 years ago

Affected files: covid19za_provincial_cumulative_timeline_confirmed and covid19za_timeline_testing both show repeated numbers.

richardyoung00 commented 3 years ago

This also affects covid19za_provincial_cumulative_timeline_deaths.csv and covid19za_provincial_cumulative_timeline_recoveries.csv

for dates 08-01-2021, 07-01-2021 and 06-01-2021.

dmackie commented 3 years ago

@lrossouw Your "Scrape & update cumulative provincial data" is adding incorrect data.

I have a number of questions:

  1. Is this being added automatically?
  2. Why is this not being done via a PR?
  3. What code do you have to prevent incorrect data being added?

TeeZA42 commented 3 years ago

I notice the URL naming convention changed from: https://www.nicd.ac.za/latest-confirmed-cases-of-covid-19-in-south-africa-05-jan-2021 to: https://www.nicd.ac.za/latest-confirmed-cases-of-covid-19-in-south-africa-06-january-20210

So maybe that affected the automation.

dmackie commented 3 years ago

I have fixed the stats by deleting the rows for 2021-01-08 and correcting the numbers for 2021-01-07. Hopefully, no automatic script is going to overwrite those now.

dmackie commented 3 years ago

Re-opening the issue as I still think there is a problem, possibly with an automation script of @lrossouw's.

lrossouw commented 3 years ago

Sorry guys I was just investigating. I will fix. It's indeed automated.

lrossouw commented 3 years ago

The error slipped in because the URL used to publish the data was changed. It usually follows this format: https://www.nicd.ac.za/latest-confirmed-cases-of-covid-19-in-south-africa-05-jan-2021/

But on the 6th my script wasn't picking it up because the format had changed to: https://www.nicd.ac.za/latest-confirmed-cases-of-covid-19-in-south-africa-06-january-20210/

Note the written-out "january".

On the 6th I manually changed the URL my script checks to the one shown above so that it could capture the data, but forgot to change it back. This meant that it kept checking that URL and using it to produce data for the following days.
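One way to avoid hand-editing the URL is to derive it from the date and try both month spellings. The following is only a minimal sketch under that assumption, not the actual scraper code: `candidate_urls`, `BASE` and `MONTHS` are hypothetical names introduced for illustration.

```python
from datetime import date
from typing import List

# URL prefix taken from the links quoted in this thread.
BASE = "https://www.nicd.ac.za/latest-confirmed-cases-of-covid-19-in-south-africa-"

# Spelled out explicitly so the result does not depend on the system locale.
MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def candidate_urls(day: date) -> List[str]:
    """Return the URLs to try for `day`: abbreviated month first, then full."""
    month = MONTHS[day.month - 1]
    prefix = f"{BASE}{day.day:02d}-"
    urls = [
        f"{prefix}{month[:3]}-{day.year}/",
        f"{prefix}{month}-{day.year}/",
    ]
    # "may" is identical in both forms, so drop duplicates but keep the order.
    return list(dict.fromkeys(urls))
```

Because the URL is rebuilt from the date on every run, a stale URL from a previous day cannot be reused by accident. Note that this sketch would still not match the `20210` ending seen on the 6th; an irregular URL like that one would still need a manual, one-off override.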

I have several checks in place:

Posting exactly the same data passes all the checks. This is the first time it has committed incorrect data, and it was due to human error.
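A check for this specific case could compare the new row against the previous day's row and refuse to commit if nothing but the date has changed. This is a hedged sketch only: `is_repeat` is a hypothetical helper, and the ignored column names (`date`, `YYYYMMDD`, `source`) are assumptions about the CSV layout rather than something confirmed in this thread.

```python
def is_repeat(previous, new, ignore=("date", "YYYYMMDD", "source")):
    """Return True when every field outside `ignore` is unchanged.

    `previous` and `new` are dicts mapping column name to value, for
    example rows as produced by csv.DictReader.
    """
    keys = (set(previous) | set(new)) - set(ignore)
    # With no comparable columns there is nothing to judge, so do not flag.
    return bool(keys) and all(previous.get(k) == new.get(k) for k in keys)


# Illustrative values only, not real figures from the repository.
yesterday = {"date": "06-01-2021", "EC": "100", "total": "1000"}
today = {"date": "07-01-2021", "EC": "100", "total": "1000"}

if is_repeat(yesterday, today):
    print("New row repeats the previous day; skipping commit.")
```

Cumulative counts could in principle be unchanged from one day to the next, so a check like this is better suited to blocking an automatic commit and asking for review than to rejecting the data outright.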

I shared details of this process in #767 when I implemented it.

I've fixed the problem and this can be closed.

lrossouw commented 3 years ago

Closing given no further comments.