Open CSSEGISandData opened 4 years ago
Please, what is the impact on the time series .csv files? Do you create a US, US line in every time series? Please explain.
The time series data on 3-18 had a US, US field where the total recovery count was listed, which was rolling up the total recoveries in the US (that was great). On the upload yesterday evening, there is no US, US row showing the total recoveries.
Was this an error? Will US recoveries show today?
Thanks for all the work you have been doing around this! :)
In Recovered I see a value of 108 from the 19th. There is also a US,US line in Confirmed with a value of 1 from today. Why? And in Deaths, a value of 0.
putting a visual on this
@JiPiBi yes, in the 03-19-2020 daily report, you do see the US, US row showing total recoveries. The problem we are talking about is that there is no US, US row in the time series data that was posted on 3-19-2020.
@aheib987 the value I gave was observed in my time series
US,US,37.0902,-95.7129,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,108 |
---|
@JiPiBi can you send a link to that file? This time series file loaded last night does not have a row that lists the US in the Province/State field.
aheib987 is right, and I have not been able to use the file for the last 2 days because of this reading. It is showing the US recoveries as 0, meaning that no one has recovered. A few days ago it was at 17. We are reporting misinformation and it is troubling.
Understood. I'm sure this is a challenging data collection effort. When can we expect to see this change? Even at a country level this would be good to have.
Can you also explain to us a bit on what some of the inconsistencies are?
Thank you, Ryan
@aheib987 Quite strange: my csv file comes from this site, my PR was made at about 1:00 UTC, and now I no longer see my line on the site.
But they made modifications: I now see that they removed duplicates. Click the link at the upper left and you should see the removed lines in red, as I did (see line 478).
@JiPiBi yeah, it was correct on the 18th but then dropped off on the 19th. I have some python that calls out and deletes the old files and re-downloads the new ones and I noticed the US dropped to 0 for recovered. Hopefully in the upload today they'll place it back in. You can see in their daily report for the 19th, US is listed there like you showed above but gone in the time series file.
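For anyone who wants to catch this kind of drop automatically after a re-download, here is a minimal sketch, not the script mentioned above. It assumes the time series CSV has a `Country/Region` column and that the last column is the most recent date; the function name and the sample rows (apart from the US, US values quoted earlier in this thread) are made up for illustration.

```python
import csv
import io

def latest_country_total(csv_text, country, country_col="Country/Region"):
    """Sum the last date column over every row whose country matches.

    Returns None when no row for the country exists, so a missing row
    is not confused with a genuine total of 0.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    last_col = reader.fieldnames[-1]  # assumes date columns run left to right
    matches = [row for row in reader if row[country_col] == country]
    if not matches:
        return None
    return sum(int(float(row[last_col] or 0)) for row in matches)

# Tiny stand-in for a downloaded file; the second row is invented.
sample = (
    "Province/State,Country/Region,Lat,Long,3/18/20,3/19/20\n"
    "US,US,37.0902,-95.7129,0,108\n"
    "Example Province,Example Country,0.0,0.0,3,4\n"
)

print(latest_country_total(sample, "US"))             # 108
print(latest_country_total(sample, "Missing Place"))  # None
```

Returning `None` for a missing row is a deliberate choice here: it lets a dashboard show "no data" rather than silently reporting 0 recoveries.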
Daily File Link and Snapshot
I understand, but for reliable data my favourite site is now https://www.worldometers.info/coronavirus/. The only issue is that I don't know how to get their data.
It is the past information changing, and not merging into a common field, that is disturbing. I use the recovered count to maintain an active-cases calculation.
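To show why a vanished recovered row matters for an active-cases figure, here is a rough sketch. It assumes the common definition active = confirmed - deaths - recovered, which is my assumption and not something the commenter spelled out; all numbers are made up.

```python
def active_series(confirmed, deaths, recovered):
    """Element-wise active = confirmed - deaths - recovered.

    All three inputs are cumulative counts aligned on the same dates.
    """
    if not (len(confirmed) == len(deaths) == len(recovered)):
        raise ValueError("series must cover the same dates")
    return [c - d - r for c, d, r in zip(confirmed, deaths, recovered)]

# Made-up numbers for three consecutive days.
confirmed = [100, 150, 220]
deaths = [2, 3, 5]
recovered_ok = [10, 17, 30]
recovered_dropped = [10, 17, 0]  # the recovered row vanished and reads as 0

print(active_series(confirmed, deaths, recovered_ok))       # [88, 130, 185]
print(active_series(confirmed, deaths, recovered_dropped))  # [88, 130, 215]
```

With the recovered value read as 0 on the last day, the active count jumps by exactly the number of recoveries that went missing.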
JiPiBI, I check worldometers too and wish I could connect to that source :(...
It seems we have to pay for the data ... Even for non profit orgs
Not cool ...
Perhaps JHU or the CDC are more powerful and have some leverage to obtain their data?
Not sure... what a bummer.
I'm wondering how Worldometers can have more reliable data than JHU. Is it because they do a lot of manual intervention to ensure data accuracy?
@gohkokhan For every new value, they give their sources. You can also read how they process their data on their site.
Are there any updates on this issue?
Recovered cases are not being reported at the country level in the timeseries recovered data. Can we please get an update?
@paolinic03 If you get an answer on this site, consider it a miracle. So many try every day...
@JiPiBi it’s all good lol. I’ve lost hope. Found a workaround for now.
@paolinic03 Yes, some people are fixing the data themselves. On my side, I close my issues a few days after opening them, without them being fixed by the site. A bit strange...
Anyone aware of timeseries data from worldometers?
This dataset is becoming increasingly complicated to deal with.
FWIW the setup of the github repository is totally the wrong approach. As you change the name of a country, you shouldn't be leaving old data with the old name in the repository and making an increasingly complex script to collect the data into a single timeseries. The repository should be a single timeseries, and changing the name of a country should be retroactive and universal. That is what version control is for. Instead, by having per-day datasets with the format and conventions changing over time, you're forced to deal with this heterogeneity when combining data together, and it's causing all kinds of issues. You're inventing your own version control.
If you just had a single timeseries csv file, you could correct errors by modifying and committing the file, instead of whatever manual list of overrides you currently have to fix the data as it is collated.
Then pull requests would make sense and people could help fix errors, and there would be a single source of truth instead of the mess we have now.
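To make the single-file idea concrete, here is a small sketch of what a retroactive rename could look like if everything lived in one long-format CSV. The column names (`date`, `country`, `recovered`), the function name, and the data are all hypothetical; this is not how the repository is currently laid out.

```python
import csv
import io

def rename_country(csv_text, old, new, country_col="country"):
    """Return the CSV with every occurrence of `old` in country_col replaced by `new`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[country_col] == old:
            row[country_col] = new  # applies to past and present rows alike
        writer.writerow(row)
    return out.getvalue()

# Made-up single timeseries file.
timeseries = (
    "date,country,recovered\n"
    "2020-03-18,Old Name,5\n"
    "2020-03-19,Old Name,8\n"
)

print(rename_country(timeseries, "Old Name", "New Name"))
```

Committing the rewritten file would leave the old name in git history only, which is the point being made above: version control keeps the past, and the working copy stays consistent.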
Is there any update on this issue? I don't see any changes in the data updated today. This issue has existed for quite some time. Will it be fixed, or do we need to move to another source? It's disturbing when you put a lot of effort into developing a report and, due to inconsistency in the data, are not able to publish it.
The general issue with this dataset is that it is not designed and managed as a database: one with numerical IDs instead of strings as keys, one record per day per entity, everything in the same file, and daily values rather than cumulative ones.
Result: when there is an error on one day, you have to change all the following days, and the values never get fixed across all the daily cumulative reports...
And when you change one name, you have to change all the values, instead of keeping a table of IDs associated with entity names (state / country / continent) where you change only one element. Result: we carry on with daily reports that are never fixed and are not coherent because the entities' labels have changed.
A bit disappointing ....
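A rough sketch of the layout described in the comment above, using SQLite from the Python standard library. The table and column names are my own invention and the counts are made up. It only aims to show that a rename or a one-day correction touches a single row, while cumulative totals are derived on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE daily (
    entity_id     INTEGER NOT NULL REFERENCES entity(id),
    day           TEXT    NOT NULL,   -- ISO date, so text order is date order
    new_recovered INTEGER NOT NULL,   -- daily value, not cumulative
    PRIMARY KEY (entity_id, day)
);
""")
conn.execute("INSERT INTO entity VALUES (1, 'Old Label')")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?, ?)",
    [(1, "2020-03-17", 5), (1, "2020-03-18", 7), (1, "2020-03-19", 4)],
)

# Renaming an entity changes exactly one row.
conn.execute("UPDATE entity SET name = 'New Label' WHERE id = 1")
# Correcting one day's value also changes exactly one row.
conn.execute(
    "UPDATE daily SET new_recovered = 6 WHERE entity_id = 1 AND day = '2020-03-18'"
)

def cumulative(conn, entity_id):
    """Derive the running total from daily values, ordered by day."""
    rows = conn.execute(
        "SELECT day, new_recovered FROM daily WHERE entity_id = ? ORDER BY day",
        (entity_id,),
    )
    total, out = 0, []
    for day, n in rows:
        total += n
        out.append((day, total))
    return out

print(cumulative(conn, 1))
# [('2020-03-17', 5), ('2020-03-18', 11), ('2020-03-19', 15)]
```

Because the cumulative series is computed rather than stored, the correction on the 18th flows into the 19th without any further edits.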
@CSSEGISandData
How you calculate recovered cases should be pretty easy:
If confirmed date was > 2-4 weeks ago and you are presently not dead, you're likely recovered.
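As a very rough sketch of that heuristic: take the cumulative confirmed count from `lag_days` ago and subtract the cumulative deaths as of today. The function name, the default lag of 21 days, and the sample numbers are assumptions for illustration. The estimate is also crude, since it treats every death as belonging to a case confirmed at least `lag_days` ago.

```python
def estimate_recovered(confirmed, deaths, lag_days=21):
    """Estimate cumulative recoveries per day.

    confirmed, deaths: cumulative counts, one entry per day, same length.
    A case confirmed at least lag_days ago that is not counted as a death
    is assumed to have recovered.
    """
    if len(confirmed) != len(deaths):
        raise ValueError("series must cover the same dates")
    estimates = []
    for t in range(len(confirmed)):
        old_cases = confirmed[t - lag_days] if t >= lag_days else 0
        estimates.append(max(0, old_cases - deaths[t]))  # never go negative
    return estimates

# Made-up series, with a short lag so the effect is visible.
confirmed = [1, 2, 4, 8, 16, 32]
deaths = [0, 0, 0, 1, 1, 2]
print(estimate_recovered(confirmed, deaths, lag_days=3))
# [0, 0, 0, 0, 1, 2]
```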
The "number of U.S. patients tested" and this number of recoveries are unrealistic expectations:
I wouldn't expect to have billions of tests to administer to everyone in the world to periodically make sure each person is or is not infected. Because that's what you're telling me is the reasoning behind keeping a tally of tests done. We have patients who want a test each week because they're convinced they have a virus. Counting number of people tested is misguided.
"Let's make sure everyone gets tested" is frivolous when the 99.999% negative today will just need another test tomorrow. That's what you're saying we should count?
Due to the inconsistency in reporting recovered cases around the US, we have decided to report recovered cases at the country level until a more reliable source for recovered cases becomes available.