CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Linear fit of exponential cases and deaths data and THANK YOU #922

Open valeriupredoi opened 4 years ago

valeriupredoi commented 4 years ago

Hey guys, feel free to close this as soon as you have read it.

I have started a project fitting the data you guys provide, and I have to say I am rather concerned by the gap between what is said in the media and what the actual data show (see my project). I also want to thank you greatly for providing such a wealth of open-source data! Cheers from the UK :beer:

JiPiBi commented 4 years ago

Hi, IMHO, what is also important to understand is that in every infected country there is a province and a town that are far more affected than the country as a whole, with a high death rate and a risk of being overwhelmed by the demand for ICU beds:

* In China, it was Hubei and Wuhan

* In Italy it is Lombardia and Bergamo

* In France it is Alsace and Mulhouse, but cases in Île-de-France are growing

I think it would be interesting to focus on these hotspots too, because the risk of contamination for medical staff and the shortage of ICU beds caused by the flood of sick people are major risks.

But the data are not available in this repo for Italy and France (there is another repo for Italy, because official data are available from the Protezione Civile, but nothing consistent is available for France). I don't know the situation in Spain, where the numbers have been rising very quickly over the last few days.

JiPiBi commented 4 years ago

About your curves: is it possible to have a polynomial trend line rather than a linear one? For example, in Italy I see a slight inflection in total cases.

valeriupredoi commented 4 years ago

> Hi, IMHO, what is also important to understand is that in every infected country there is a province and a town that are far more affected than the country as a whole, with a high death rate and a risk of being overwhelmed by the demand for ICU beds:
>
> * In China, it was Hubei and Wuhan
>
> * In Italy it is Lombardia and Bergamo
>
> * In France it is Alsace and Mulhouse, but cases in Île-de-France are growing
>
> I think it would be interesting to focus on these hotspots too, because the risk of contamination for medical staff and the shortage of ICU beds caused by the flood of sick people are major risks.
>
> But the data are not available in this repo for Italy and France (there is another repo for Italy, because official data are available from the Protezione Civile, but nothing consistent is available for France). I don't know the situation in Spain, where the numbers have been rising very quickly over the last few days.

Completely agree! In the case of the UK, it's England (and London) that drives the infection numbers: see the daily indicators and the NHS England sheets in the UK Government data repo. London alone has 621 cases as of today.

Unfortunately, I don't know how to get such numbers for other countries, and since this project is not actually part of my day job, I don't think I'll have the time to start digging. But by all means, feel free to contribute to my repo; opening a pull request should do it :beer:

valeriupredoi commented 4 years ago

> About your curves: is it possible to have a polynomial trend line rather than a linear one? For example, in Italy I see a slight inflection in total cases.

Good point, and I should be able to do that by fitting a polynomial P(x^n) rather than just a line. Sadly, for Italy, I fail to see any trend: it's as straight a line as it can be (see the R parameter too, and the small least-squares errors). What is good is that the rate is lower than the usual 0.25-0.30 day^-1 seen in almost all the other countries. Poor Italy.
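A minimal sketch of the comparison being discussed, assuming NumPy is available: fit log(cases) once with a straight line (pure exponential growth, N(t) = N0·exp(bt)) and once with a quadratic, whose t^2 coefficient picks up any bending. The counts are the Italian confirmed-case values @JiPiBi posts further down this thread, assumed here to be consecutive daily values; the `r_squared` helper is hypothetical and only for illustration.

```python
import numpy as np

# Italian cumulative confirmed cases as posted in this thread
# (assumed to be 13 consecutive days).
cases = np.array([9172, 10149, 12462, 15113, 17660, 21157, 24747,
                  27980, 31506, 35713, 41035, 47021, 53578], dtype=float)
t = np.arange(len(cases), dtype=float)  # days since the first point
log_cases = np.log(cases)

# Pure exponential N(t) = N0 * exp(b t) is a straight line in log space.
b, log_n0 = np.polyfit(t, log_cases, 1)

# Quadratic in log space: a negative t^2 coefficient means the daily
# growth rate is falling over the window (bending towards slower growth).
c2, c1, c0 = np.polyfit(t, log_cases, 2)

def r_squared(y, y_fit):
    """Hypothetical helper: coefficient of determination of a fit."""
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

r2_lin = r_squared(log_cases, np.polyval([b, log_n0], t))
r2_quad = r_squared(log_cases, np.polyval([c2, c1, c0], t))

print(f"exponential rate b = {b:.3f} per day")
print(f"quadratic t^2 coefficient = {c2:.5f}")
print(f"R^2 linear = {r2_lin:.4f}, quadratic = {r2_quad:.4f}")
```

The quadratic always scores at least as high an R^2 as the straight line because it has an extra parameter, so a marginally better R^2 on its own is weak evidence of a real inflection.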

alkimiadev commented 4 years ago

I think fitting a curve to confirmed case counts is a little naive, since confirmed case counts largely depend on each country's testing capacity. I would say modeling deaths would make more sense, since that number is likely to be closer to the ground truth.

I wish I could find a reliable dataset that showed how many tests were given as well as the number confirmed. I think the ratio of confirmed cases to tests would be more useful than just confirmed cases.

valeriupredoi commented 4 years ago

Yes, completely agree: the death numbers are the only hard numbers (provided the deaths are entirely due to complications from the virus). But even if the actual number of cases is underestimated by an order of magnitude or more, the growth rate is still representative: confirmed cases come from effectively random testing (people showing up at the hospital or being drive-through tested), so they are a randomly drawn sample from the true distribution of cases and follow the same evolution over time. This of course breaks down once biases are introduced, e.g. more and more people decide not to get tested, or are refused testing (which is an emerging situation in Spain).
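A toy illustration of this argument, assuming NumPy and entirely made-up numbers: if confirmed cases are a constant fraction f of the true cases, the fitted exponential rate b is unchanged, because log(f·N) = log(f) + log(N) only shifts the intercept. If the detected fraction drifts over time (people stop getting tested), the fitted rate is biased. The `growth_rate` helper is hypothetical, and a deterministic fraction stands in for random sampling.

```python
import numpy as np

def growth_rate(counts):
    """Hypothetical helper: least-squares slope of log(counts) vs day index,
    i.e. b in N(t) = N0 * exp(b t)."""
    counts = np.asarray(counts, dtype=float)
    t = np.arange(len(counts), dtype=float)
    slope, _intercept = np.polyfit(t, np.log(counts), 1)
    return slope

# Made-up "true" epidemic growing at 0.25 per day for 15 days.
t = np.arange(15)
true_cases = 1000 * np.exp(0.25 * t)

# Unbiased undercount: a constant 1 in 10 cases gets confirmed.
confirmed = 0.1 * true_cases

# Growing bias: the detected fraction slides from 10% down to 5%.
detected_fraction = np.linspace(0.10, 0.05, len(t))
biased = detected_fraction * true_cases

print(growth_rate(true_cases), growth_rate(confirmed), growth_rate(biased))
```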

valeriupredoi commented 4 years ago

And as you say, the fatalities are the most reliable statistical measure of the actual cases: we know those were real cases, and their distribution is drawn from the distribution of all infected individuals, but it might not be representative of the whole distribution (small-number statistics). The rate should be representative, though: it should approach the infection rate we estimate from the number of tested cases (the one above), as long as the population doesn't show a massive age bias (like Italy). But look at Germany, France and the Netherlands: there, these two rates are almost equal. Anyway, that's my 2 cents, cheers :beer: Also, feel free to close the issue; I am not a virologist, so I may not be the right person to listen to :grin: Cheers again for the data!

JiPiBi commented 4 years ago

@alkimiadev
IMHO, the number of tests performed is very difficult to interpret: I agree that if you do no tests, you have fewer confirmed cases, but:

What is important now is to deal with the need for ICU and hospital beds and, above all, to apply the lockdown rules to contain the epidemic. It worked in China; we are eager to see what the results will be in Italy, Spain and France.

Germany is a mystery to me: they have more cases than France and a lower death rate (= deaths/confirmed, even if this ratio is questionable: first, the number of confirmed cases is not reliable, and second, as has already been said, you have to compare today's deaths with the cases confirmed some days earlier, but how many days?). If anyone has an explanation...
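A minimal sketch of the lagged ratio described above: today's deaths divided by the cases confirmed `lag` days earlier. It uses the Italian death and confirmed-case counts @JiPiBi posts further down this thread (assumed to be consecutive days); `lagged_cfr` is a hypothetical helper, and since the right lag is exactly the open question, the sketch just prints a range of lags.

```python
# Italian cumulative counts as posted in this thread (assumed consecutive days).
deaths = [463, 631, 827, 1016, 1266, 1441, 1809, 2158,
          2503, 2978, 3405, 4032, 4825]
confirmed = [9172, 10149, 12462, 15113, 17660, 21157, 24747,
             27980, 31506, 35713, 41035, 47021, 53578]

def lagged_cfr(deaths, confirmed, lag):
    """Hypothetical helper: deaths on the last day divided by the
    confirmed cases `lag` days earlier (lag=0 is the naive ratio)."""
    if not 0 <= lag < len(confirmed):
        raise ValueError("lag must be between 0 and len(confirmed) - 1")
    return deaths[-1] / confirmed[-1 - lag]

for lag in range(8):
    print(f"lag {lag} days: deaths/confirmed = {lagged_cfr(deaths, confirmed, lag):.3f}")
```

Because confirmed cases are still rising, the ratio grows with every extra day of lag, which is why the answer to "how many days?" matters so much.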

JiPiBi commented 4 years ago

@valeriupredoi

On this curve, it seems to me that the curve for Italy's confirmed cases is improving (it is difficult to compare countries, but for a given country it must indicate something; in fact, I mainly hope that the lockdown in Italy is working...)

[image]

JiPiBi commented 4 years ago

@valeriupredoi About the age of Italians, I have no evidence of such a bias compared with other countries. But after being told that the people who died in China and Italy were old people with serious pre-existing conditions, we are now hearing things in France like "even young people with no known conditions die", which is more worrying.

JimBudde commented 4 years ago

@alkimiadev Spot on. There are too many biases introduced by how testing is performed in each country and region. There are also structural differences in how each country manages its healthcare system. My take-away: be careful about assuming you are comparing apples to apples. Simple take-home message: get the number of new cases down! That is something everyone can understand.

valeriupredoi commented 4 years ago

> @valeriupredoi
>
> On this curve, it seems to me that the curve for Italy's confirmed cases is improving (it is difficult to compare countries, but for a given country it must indicate something; in fact, I mainly hope that the lockdown in Italy is working...)
>
> [image]

Yes, true! And it's great news! But I only started running the automated daily plotting on March 1st (I am very interested in seeing the UK trend, and the UK is behind Italy), and it really is linear after that date. For some reason, the confirmed cases seem to be increasing steadily (at a lower rate than before), but still at a rate of ~0.2 per day. I wonder when we'll see the next slowdown (hopefully soon!)

alkimiadev commented 4 years ago

I have been using a method similar to the one Tomas Pueyo used in his Medium article (https://medium.com/@tomaspueyo/coronavirus-act-today-or-people-will-die-f4d3d9cd99ca) to estimate total cases. This simple method requires some assumptions, such as the death rate, the time from infection to death, and the case doubling time. For example:

* death rate: 2% (this has to be lower than deaths/confirmed cases, since there are definitely more cases than currently confirmed)
* infection to death, in days: 20
* case doubling time, in days: 5

When looking at the deaths on a given day, we could say we're looking at infections that started roughly 20 days ago, and with a case doubling time of 5 days, the real case count would have doubled 4 times in that period. That gives a formula that looks something like:

(deaths/0.02)^1.60206=estimated total cases

valeriupredoi commented 4 years ago

Interesting, except that measured cases double every 2.5-3 days (2.7 days in the UK, see here), so that would make it roughly 8 doublings in the period. Actually, using the exp(bt) formula with b ≈ 0.27 per day over 20 days gives exp(0.27 × 20) ≈ 220 times the initial cases, i.e. roughly eight doublings.
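The arithmetic in this comment, sketched in Python: with N(t) = N0·exp(bt), the growth over a window of t days is exp(bt), the doubling time is ln(2)/b, and the number of doublings is bt/ln(2). The 0.27 per day rate and the 20-day window are the values from this discussion; the helper names are hypothetical.

```python
import math

def growth_factor(b, days):
    """Multiplicative growth over `days` for N(t) = N0 * exp(b * t)."""
    return math.exp(b * days)

def doubling_time(b):
    """Days needed to double at a continuous growth rate b (per day)."""
    return math.log(2) / b

b = 0.27     # per day, the rate quoted above
window = 20  # days from infection to death, the assumption under discussion

factor = growth_factor(b, window)
doublings = math.log2(factor)  # same as window / doubling_time(b)
print(f"doubling time {doubling_time(b):.2f} days, "
      f"growth factor {factor:.0f}, {doublings:.1f} doublings")
```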

alkimiadev commented 4 years ago

@valeriupredoi The assumptions I used there are a bit on the conservative side, but overall that method is likely to be more useful than trying to model the confirmed case rates. Confirmed cases largely depend on testing capacity, and as a country ramps up its testing, we'll see its confirmed cases double faster than the actual cases do.
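A toy simulation of this point, assuming NumPy and entirely made-up numbers: the true case count doubles every 5 days, while the fraction of cases that get tested rises from 5% to 20% over the same 20 days. A log-linear fit recovers the assumed 5 days for the true series but a shorter doubling time for the confirmed series. `doubling_time_days` is a hypothetical helper.

```python
import numpy as np

def doubling_time_days(counts):
    """Hypothetical helper: ln(2) / b from a log-linear least-squares fit."""
    t = np.arange(len(counts), dtype=float)
    b, _intercept = np.polyfit(t, np.log(np.asarray(counts, dtype=float)), 1)
    return np.log(2) / b

days = np.arange(21)
true_cases = 500 * 2 ** (days / 5)                    # made up: doubles every 5 days
tested_fraction = np.linspace(0.05, 0.20, len(days))  # made up: testing ramp-up
confirmed = tested_fraction * true_cases

print(doubling_time_days(true_cases))  # recovers the assumed 5 days
print(doubling_time_days(confirmed))   # shorter, because testing is ramping up
```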

valeriupredoi commented 4 years ago

Aha, got you now! Good reasoning indeed; in effect, that would mean halving the observed rate of daily cases. Cheers, I'll use this to place lower limits on the numbers I plot :beer:

TakeItAndRun commented 4 years ago

I would not put more trust in the number of confirmed deaths than in the number of confirmed cases. It is easy to say you don't need a test if you only have mild symptoms. It is easy to say you're dead, so you don't need a test anymore. Both happen when the number of tests is limited.

alkimiadev commented 4 years ago

@TakeItAndRun "All models are wrong, but some are useful" - George E. P. Box

nick-lagrassa commented 4 years ago

> I have been using a method similar to the one Tomas Pueyo used in his Medium article (https://medium.com/@tomaspueyo/coronavirus-act-today-or-people-will-die-f4d3d9cd99ca) to estimate total cases. This simple method requires some assumptions, such as the death rate, the time from infection to death, and the case doubling time. For example:
>
> * death rate: 2% (this has to be lower than deaths/confirmed cases, since there are definitely more cases than currently confirmed)
> * infection to death, in days: 20
> * case doubling time, in days: 5
>
> When looking at the deaths on a given day, we could say we're looking at infections that started roughly 20 days ago, and with a case doubling time of 5 days, the real case count would have doubled 4 times in that period. That gives a formula that looks something like:
>
> (deaths/0.02)^1.60206=estimated total cases

Thanks for sharing this. There seems to be an increasing interest in using functions of observed deaths to estimate the number of true cases.

I just want to expand on the formula you provide, for readers who might want to use it.

The formula, as listed, is correct for the specific case where deaths = 2.

The general formula is: estimated total cases = (observed_deaths / fatality_rate) × 2^(infection_to_death_in_days / case_doubling_in_days)

So for the case where:

* observed_deaths = 10
* fatality_rate = 2%
* infection_to_death_in_days = 20
* case_doubling_in_days = 5

estimated total cases = (10/0.02) × 2^(20/5) = 500 × 2^4 = 8,000
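The general formula above as a small Python function (the name `estimate_true_cases` is hypothetical). It reproduces the 8,000 worked example and the 1,600 from the original 2-death case, and shows that the fixed exponent 1.60206 gives a different answer once deaths differ from 2.

```python
import math

def estimate_true_cases(observed_deaths, fatality_rate,
                        infection_to_death_days, case_doubling_days):
    """observed_deaths / fatality_rate approximates the cases infected
    `infection_to_death_days` ago; those have since doubled
    infection_to_death_days / case_doubling_days times."""
    cases_back_then = observed_deaths / fatality_rate
    doublings = infection_to_death_days / case_doubling_days
    return cases_back_then * 2 ** doublings

print(estimate_true_cases(10, 0.02, 20, 5))
print(estimate_true_cases(2, 0.02, 20, 5))
# The fixed-exponent shortcut (deaths/0.02)**1.60206 only agrees when deaths == 2.
print((2 / 0.02) ** 1.60206, (10 / 0.02) ** 1.60206)
```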

alkimiadev commented 4 years ago

@nick-lagrassa If there were 100 cases 20 days ago and they doubled every 5 days, there would be 1,600 cases today, meaning an exponent of roughly 1.60206 (since 100^1.60206 ≈ 1,600).

nick-lagrassa commented 4 years ago

@alkimiadev You're right. My point was just that the exponent of 1.60206 only works for that specific example (100 cases with a 2% death rate => 2 deaths).

alkimiadev commented 4 years ago

@nick-lagrassa ah you are correct! thanks for pointing that out!

JiPiBi commented 4 years ago

@nick-lagrassa

The daily values for deaths in Italy are the following, and the pace of deaths seems to be changing: 463 631 827 1016 1266 1441 1809 2158 2503 2978 3405 4032 4825. At the beginning they doubled in 2 days, and now in about 5 days. What are your forecasts for these values?

Edit: confirmed cases over the same period: 9172 10149 12462 15113 17660 21157 24747 27980 31506 35713 41035 47021 53578

Addendum: I was told that in Italy, whatever people die of, if they have COVID they are declared as having died from COVID, but that should not change the trends.
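One way to quantify the change in pace described above is a local doubling time over a sliding window, assuming growth is roughly exponential within each window: T_d = w·ln(2) / ln(N[t] / N[t-w]). This sketch applies it to the Italian death counts listed above (assumed to be consecutive days) with a 3-day window chosen arbitrarily; `doubling_times` is a hypothetical helper.

```python
import math

# Italian cumulative deaths as posted above (assumed consecutive days).
deaths = [463, 631, 827, 1016, 1266, 1441, 1809, 2158,
          2503, 2978, 3405, 4032, 4825]

def doubling_times(series, window):
    """Hypothetical helper: local doubling time (days) over a sliding
    window, assuming exponential growth within each window."""
    return [window * math.log(2) / math.log(series[i] / series[i - window])
            for i in range(window, len(series))]

for day, td in enumerate(doubling_times(deaths, window=3), start=3):
    print(f"day {day:2d}: doubling time ~ {td:.1f} days")
```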

valeriupredoi commented 4 years ago

It's clear that the deaths in Italy are no longer following an exponential and are starting to plateau: see Italy (note that the least-squares errors on the last points are of order 200-300 daily). So I am hoping this is really the case and not a data artefact due to Italians no longer reporting deaths the way they did before. Any news on this, maybe? Cheers guys :beer: