CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

US Recovered Cases #1113

Open CSSEGISandData opened 4 years ago

CSSEGISandData commented 4 years ago

Due to the inconsistency in reporting recovered cases around the US, we have decided to report recovered cases at the country level until a more reliable source for recovered cases becomes available.

JiPiBi commented 4 years ago

Please, what is the impact on the time series .csv files? Will you create a US, US row in every time series? Please explain.

aheib987 commented 4 years ago

The time series data on 3-18 had a US, US field where the total recovery count was listed, which was rolling up the total recoveries in the US (that was great). On the upload yesterday evening, there is no US, US row showing the total recoveries.

Was this an error? Will US recoveries show today?

Thanks for all the work you have been doing around this! :)

JiPiBi commented 4 years ago

In the recovered time series I see a value of 108 from the 19th. There is also a US,US line in Confirmed with a value of 1 as of today, and in Deaths with a value of 0. Why?

smouksassi commented 4 years ago

bug

putting a visual on this

aheib987 commented 4 years ago

@JiPiBi yes, in the 03-19-2020 daily report, you do see the US, US column showing total recoveries. The problem we are talking about is there is no US, US row in the time series data that was posted on 3-19-2020.

JiPiBi commented 4 years ago

@aheib987 the value I gave was observed in my time series

US,US,37.0902,-95.7129,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,108  
aheib987 commented 4 years ago

@JiPiBi can you send a link to that file? This time series file loaded last night does not have a row that lists the US in the Province/State field.
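One way to check a downloaded file for such a row is sketched below. This is only a minimal sketch: it assumes the header starts with `Province/State,Country/Region,Lat,Long` followed by one column per date, and the sample rows (apart from the US, US values quoted above) are made up for illustration.

```python
import csv
import io

def find_country_rows(csv_text, country):
    """Return all rows whose Country/Region column equals `country`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["Country/Region"] == country]

# Made-up sample in the assumed layout, not a real extract of the file.
sample = """Province/State,Country/Region,Lat,Long,3/18/20,3/19/20
,Italy,43.0,12.0,10,12
US,US,37.0902,-95.7129,0,108
"""

rows = find_country_rows(sample, "US")
# A country-level total row is assumed to have Province/State empty or "US".
us_total_present = any(row["Province/State"] in ("", "US") for row in rows)
```

Running the same function on two downloads taken a day apart would show whether the row really disappeared or whether the two of us are looking at different files.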

paolinic03 commented 4 years ago

aheib987 is right, and I have not been able to use the file for the last 2 days because of this reading. It is showing the US recoveries as 0, meaning that no one has recovered. A few days ago it was at 17. We are reporting misinformation and it is troubling.

ryanwoconnor commented 4 years ago

Understood. I'm sure this is a challenging data collection effort. When can we expect to see this change? Even at a country level this would be good to have.

Can you also explain to us a bit on what some of the inconsistencies are?

Thank you, Ryan

JiPiBi commented 4 years ago

@aheib987 Quite strange: my csv file comes from this site, and my PR was made at about 1:00 UTC, but now I no longer see my line on the site.
But they made modifications: I see now that they removed duplicates. Click on the link at the top left and you should see the removed lines in red, as I did (see line 478).

aheib987 commented 4 years ago

@JiPiBi yeah, it was correct on the 18th but then dropped off on the 19th. I have some python that calls out and deletes the old files and re-downloads the new ones and I noticed the US dropped to 0 for recovered. Hopefully in the upload today they'll place it back in. You can see in their daily report for the 19th, US is listed there like you showed above but gone in the time series file.

Daily File Link and Snapshot
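For anyone who wants to catch this kind of drop automatically after each download, here is a minimal sketch. It assumes you already have one cumulative series as a list of integers; the function name and the sample values are illustrative only.

```python
def find_drops(cumulative):
    """Return (index, previous, current) for every point where a cumulative count decreases."""
    return [
        (i, cumulative[i - 1], cumulative[i])
        for i in range(1, len(cumulative))
        if cumulative[i] < cumulative[i - 1]
    ]

# Illustrative values only: a count that falls back to 0 on the last day.
drops = find_drops([0, 12, 17, 0])
```

A cumulative count should never decrease, so any non-empty result is worth a look before the data goes into a report.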

JiPiBi commented 4 years ago

I understand, but now, for reliable data, my favourite site is https://www.worldometers.info/coronavirus/. The only issue is that I don't know how to get their data...

theronrr commented 4 years ago

It is the past information changing, and not merging into a common field, that is disturbing. I use the recovered count to maintain my active-cases calculation.
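To illustrate why a missing recovered count matters here, a minimal sketch follows. It assumes the common definition active = confirmed - deaths - recovered, which the comment does not spell out; the function name is hypothetical.

```python
def active_cases(confirmed, deaths, recovered):
    """Active = confirmed - deaths - recovered; None when recovered is unknown."""
    if recovered is None:
        # Treating an unknown recovered count as 0 would overstate active cases.
        return None
    return confirmed - deaths - recovered
```

Returning `None` instead of silently using 0 keeps a gap in the source from showing up as a jump in active cases.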

paolinic03 commented 4 years ago

JiPiBI, I check worldometers too and wish I could connect to that source :(...

LunaPg commented 4 years ago

It seems we have to pay for the data ... Even for non profit orgs

Not cool ...

https://www.worldometers.info/licensing/what/

JiPiBi commented 4 years ago

JHU or CDC are perhaps more powerful and have some arguments to obtain their data?

paolinic03 commented 4 years ago

Not sure... what a bummer.

gohkokhan commented 4 years ago

I'm wondering how Worldometers can have more reliable data than JHU. Is it because they do a lot of manual intervention to ensure data accuracy?

JiPiBi commented 4 years ago

@gohkokhan For every new value, they give their sources. You can also read how they process their data on their site.

paolinic03 commented 4 years ago

Are there any updates on this issue?

ryanwoconnor commented 4 years ago

Recovered cases are not being reported at the country level in the timeseries recovered data. Can we please get an update?

JiPiBi commented 4 years ago

@paolinic03 If you get an answer on this site, consider it a miracle. So many try every day...

paolinic03 commented 4 years ago

@JiPiBi it’s all good lol. I’ve lost hope. Found a workaround for now.

JiPiBi commented 4 years ago

@paolinic03 Yes, some people are fixing the data themselves. On my side, I close my issues a few days after opening them, without them being fixed by the site. A bit strange...

chrisjbillington commented 4 years ago

Anyone aware of timeseries data from worldometers?

This dataset is becoming increasingly complicated to deal with.

FWIW the setup of the github repository is totally the wrong approach. As you change the name of a country, you shouldn't be leaving old data with the old name in the repository and making an increasingly complex script to collect the data into a single timeseries. The repository should be a single timeseries, and changing the name of a country should be retroactive and universal. That is what version control is for. Instead, by having per-day datasets with the format and conventions changing over time, you're forced to deal with this heterogeneity when combining data together, and it's causing all kinds of issues. You're inventing your own version control.

If you just had a single timeseries csv file, you could correct errors by modifying and committing the file, instead of whatever manual list of overrides you currently have to fix the data as it is collated.

Then pull requests would make sense and people could help fix errors, and there would be a single source of truth instead of the mess we have now.
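Until then, everyone consuming the per-day files ends up writing the normalization step themselves. A minimal sketch of that collation, assuming each daily report has already been parsed into a `{name: count}` dict; the alias table and country names are hypothetical:

```python
from collections import defaultdict

def collate(daily_reports, aliases):
    """Merge per-day {name: count} dicts into {canonical_name: {date: count}}."""
    series = defaultdict(dict)
    for date, report in daily_reports.items():
        for name, count in report.items():
            # Map any old label onto its current canonical label.
            canonical = aliases.get(name, name)
            series[canonical][date] = series[canonical].get(date, 0) + count
    return dict(series)

# Hypothetical data: the same country reported under two different labels.
daily_reports = {
    "2020-03-18": {"Republic of Examplia": 5},
    "2020-03-19": {"Examplia": 8},
}
aliases = {"Republic of Examplia": "Examplia"}
series = collate(daily_reports, aliases)
```

The alias table is exactly the "manual list of overrides" problem: it has to grow every time a label changes, whereas a single corrected timeseries file would not need it at all.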

anjankarpak2110 commented 4 years ago

Is there any update on this issue? I don't see any changes in the data updated today. This issue has existed for quite some time. Will this be fixed, or do we need to move to another source? It's disturbing when you put a lot of effort into developing a report and, due to inconsistency in the data, are not able to publish it.

JiPiBi commented 4 years ago

The general issue with this dataset is that it is not designed and managed as a database: numerical ids instead of strings as keys, one record per day and per entity, everything in the same file, and daily values rather than cumulative ones.

Result: when there is an error on one day, you have to change all the following days, and the values are never fixed in all the daily cumulative reports...

And when you change one name, you have to change all the values, instead of keeping a table of ids associated with the name of each entity (state / country / continent), where you change only one element. Result: we carry on with daily reports that are never fixed and are not coherent, because the entities' labels have changed.
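To make the idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, labels, and counts are all hypothetical; it only illustrates the layout, not how the maintainers store anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE daily_count (
        entity_id     INTEGER NOT NULL REFERENCES entity(id),
        day           TEXT NOT NULL,
        new_recovered INTEGER NOT NULL,  -- daily value, not cumulative
        PRIMARY KEY (entity_id, day)
    );
""")
conn.execute("INSERT INTO entity (id, name) VALUES (1, 'Old Label')")
conn.executemany(
    "INSERT INTO daily_count (entity_id, day, new_recovered) VALUES (?, ?, ?)",
    [(1, "2020-03-17", 5), (1, "2020-03-18", 12), (1, "2020-03-19", 91)],
)

# Renaming an entity touches a single row; no count record changes.
conn.execute("UPDATE entity SET name = 'New Label' WHERE id = 1")

# Correcting one day touches a single row; later days are unaffected.
conn.execute(
    "UPDATE daily_count SET new_recovered = 10 "
    "WHERE entity_id = 1 AND day = '2020-03-18'"
)

# The cumulative total is derived on read instead of being stored.
total = conn.execute(
    "SELECT e.name, SUM(d.new_recovered) FROM daily_count d "
    "JOIN entity e ON e.id = d.entity_id GROUP BY e.id"
).fetchone()
```

With this layout, both the rename and the one-day correction are single-row updates, and every cumulative figure computed afterwards picks them up automatically.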

A bit disappointing ....

jawz101 commented 4 years ago

@CSSEGISandData

How you calculate recovered cases should be pretty easy:

If confirmed date was > 2-4 weeks ago and you are presently not dead, you're likely recovered.
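As a rough sketch of that rule of thumb: take the cases confirmed at least N days ago and subtract the deaths to date. The 21-day default is my own assumption (a point inside the 2-4 week range), the function name is hypothetical, and this is a crude heuristic rather than an epidemiological method.

```python
def estimate_recovered(confirmed, deaths, lag_days=21):
    """Estimate cumulative recoveries from cumulative confirmed and death counts.

    For each day t: cases confirmed at least `lag_days` earlier, minus deaths
    up to day t, floored at 0.
    """
    estimates = []
    for t in range(len(confirmed)):
        old_cases = confirmed[t - lag_days] if t >= lag_days else 0
        estimates.append(max(old_cases - deaths[t], 0))
    return estimates

# Illustrative numbers with a short lag so the effect is visible.
example = estimate_recovered([10, 20, 40, 80], [0, 1, 2, 4], lag_days=2)
```

It ignores long hospital stays and deaths among recent cases, so it can only be a stand-in while no reliable reported figure exists.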

The "number of U.S. patients tested" and this number of recoveries are unrealistic expectations:

I wouldn't expect to have billions of tests to administer to everyone in the world to periodically make sure each person is or is not infected. Because that's what you're telling me is the reasoning behind keeping a tally of tests done. We have patients who want a test each week because they're convinced they have a virus. Counting number of people tested is misguided.

"Let's make sure everyone gets tested" is frivolous when the 99.999% negative today will just need another test tomorrow. That's what you're saying we should count?