datasets / covid-19

Novel Coronavirus 2019 time series data on cases
https://datahub.io/core/covid-19

Delayed data - last update 4 days ago #91

Closed: snympi closed this issue 3 years ago

snympi commented 3 years ago

I've been using the data for months now and am accustomed to the 2-day delay. It seems that updates stopped with the data for 18 May. Could you please confirm or update?

anuveyatsu commented 3 years ago

hey @snympi thanks for reporting. I can confirm that we are working on fixing this so it will be updated ASAP.

snympi commented 3 years ago

@anuveyatsu thanks for confirming. Any ETA on the fix?

lordruzi commented 3 years ago

Hi everyone, any update on this?

mattlgardner commented 3 years ago

Hi all - Guessing still no change on the missing data?

raniermc commented 3 years ago

Hey, is this project over?

olwal commented 3 years ago

Same here, any ETA on bringing back the up-to-date data? (I see that it is being worked on in https://github.com/datasets/covid-19/pull/93.)

It would be good to know before people start migrating to different data sources. I really appreciated your work and how easy it was to integrate this with, for example, the population datasets. It also seems that there are a few people who might be willing to help (me included). Let us know!

Either way, thanks for your great work!

sdangt commented 3 years ago

This is really too bad. I think a lot of people were using this data, which was presented in a nice clean format. It would just be helpful to know in advance if a project is going to stop. I incorporated this into my spreadsheet analysis, but the data here is now nearly two weeks out of date, and there are no updates about it, so it seems like people just disappeared into the night. Thanks for the initial effort. No one knew for sure how long this pandemic would go on, but I think it was clear that it would probably still be troublesome at this point, and it probably will be into next year as well. So the data needs to go at least through that period if it is to be most useful.

Morisset commented 3 years ago

It is possible to generate the tables locally on your own computer by running this script: https://raw.githubusercontent.com/datasets/covid-19/master/scripts/process.py. I forked the repository, ran the process script in it, and committed the new files, so on my fork the data are up to date. I will try to run it every day. My fork is here: https://github.com/Morisset/covid-19
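
For anyone wanting to reproduce this, the workflow above amounts to a handful of commands. This is only a rough sketch: the fork URL and the path scripts/process.py come from the comment above, while the assumptions that the script's Python dependencies are already installed and that git push access to the fork is configured are mine.

```
# Sketch of the local update workflow (assumes process.py's dependencies
# are installed and git push access to the fork is set up).
git clone https://github.com/Morisset/covid-19.git
cd covid-19
python scripts/process.py                        # regenerate the aggregated data files
git commit -a -m "Update data $(date +%Y-%m-%d)" # commit the refreshed files
git push origin master                           # publish them on the fork
```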

sglavoie commented 3 years ago

Thank you @Morisset for sharing this information with others and for hosting a backup of the latest data :+1:. This is indeed a viable solution while we are working on a fix. I have updated the data manually in this repo for now in https://github.com/datasets/covid-19/commit/3497aad30f6d1da1e501a9b43fb01635ae5e2466.

Morisset commented 3 years ago

Great, @sglavoie! How do we proceed now? Do I update the data manually on my fork every day, or will you do it?

sglavoie commented 3 years ago

If you would like to give us a hand with this while we work on a more permanent solution on our side, that would be quite useful! You could send a daily pull request while we get things in order, so you are recognized as a valuable contributor in the process and we can all enjoy up-to-date data :wink:.

Morisset commented 3 years ago

OK, I'll try to do it. I created a crontab that will do the job every day at midnight PDT: run process.py, commit to git, and push to my fork on GitHub. But I do not know how to open the pull request from a shell script. Any tips on this, @sglavoie?
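
For illustration, the crontab entry could look something like the line below. This is a sketch only: the wrapper script name update.sh is hypothetical (it would hold the process.py / commit / push steps), and the schedule assumes the machine's local time zone is PDT.

```
# Hypothetical crontab line: run the daily update at midnight local time
# (assumed to be PDT) and append output to a log for troubleshooting.
0 0 * * * /home/morisset/covid-19/update.sh >> /home/morisset/covid-19/cron.log 2>&1
```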

sglavoie commented 3 years ago

@Morisset: this could be useful for the PR. If not, don't hesitate to let us know when you're updating and I will have a look and merge the changes a bit later. In any case, thanks for your helping hand! Hopefully we'll get things in order soon enough :wink:.

Morisset commented 3 years ago

OK, I'll have the following script running every night at midnight PDT:

```
#!/bin/tcsh
set TODAY=`date +"%m-%d-%y"`
cd /home/morisset/covid-19
python scripts/process.py
git commit -a -m "Update $TODAY"
git push
hub pull-request -m "Test command line" -b datasets:master -h Morisset:master
```

I just ran it; the data were not updated, but one JSON file was. The script ran correctly and a pull request was sent. Let's see tonight whether it works from crontab.

jochym commented 3 years ago

@sglavoie Let me ask: why does the procedure take so long? Can we do something about that? Can I help in some way? Maybe we should rethink the algorithm; otherwise I think this may be unsustainable in the coming months (this thing is not going away soon, sadly).

sglavoie commented 3 years ago

@Morisset: Thank you, changes merged! However, could you manually pull the updates before your script runs again so we are in sync and can merge cleanly? That would be great. Thanks for helping by the way :wink:.

@jochym: we haven't been able to work on the issue so far, but thankfully we'll start on a fix really soon, don't you worry! :smile: We will keep the data updated for now with the kind help of @Morisset, which is mostly an automated process. The script is being run off the same code we have and produces the expected output, but we will optimize the code so it doesn't take so long to get the changes in. A partial fix already exists using a different library (Pandas); I will let you know once it's done and we have QA'd the results before making those changes to the repo. :+1:

sglavoie commented 3 years ago

FIXED: Since commit c5b86f83f30c81ca2f814c3afc1ca720ed3edd50 from 3 days ago, the data has been updating successfully on GitHub. Closing this issue as nothing else needs to be done for now with the code itself.

Note: There is a problem regarding the deployment to DataHub, but this is unrelated to the data itself, which is correctly being updated here.