Open MarkCorke opened 4 years ago
Mortality can't technically be calculated for a disease until after it's completely over. Doing it during the crisis is a statistical estimate, made with the understanding that no nation has a count of the TRUE number of existing cases.
This isn't a conspiracy -- it's just an artifact that only measured cases are known (sort of a tautology).
I recommend pulling the CSVs and doing your own trend lines on the rolling 10-day average of just the death RATE (% change day over day of deaths only -- not using confirmed # which is practically unknown).
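For anyone who would rather script this than use a spreadsheet, here is a minimal sketch of the calculation described above: day-over-day % change of cumulative deaths, then a trailing 10-day average. The function names and the sample numbers are made up for illustration; they are not taken from the actual CSVs.

```python
def daily_pct_change(cumulative):
    """Day-over-day percent change of a cumulative series.

    Days whose previous value is zero are skipped, since the
    percent change is undefined there.
    """
    return [
        100.0 * (today - prev) / prev
        for prev, today in zip(cumulative, cumulative[1:])
        if prev > 0
    ]


def rolling_mean(values, window):
    """Trailing mean over `window` points; one output per full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical cumulative death counts, for illustration only.
deaths = [100, 110, 120, 129, 138, 146, 154, 161, 168, 174, 180, 185]

rates = daily_pct_change(deaths)           # % change per day
smoothed = rolling_mean(rates, window=10)  # rolling 10-day average
print([round(r, 2) for r in smoothed])
```

With real data you would replace `deaths` with the cumulative deaths column for the region you care about.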
When I use Excel graphs with a 2nd-order polynomial trend line, I can see the rate of new deaths is decreasing and about to hit a (possibly) stable point of approx 2.2% increase in deaths per day. But this is VERY preliminary -- more data is needed to see where it 'settles'.
Is a 2nd-order polynomial trend line acceptable for a data set like this? I'm open to advice but it looks right.
Not to be alarmist, but technically a person dying today would have been counted in the 'confirmed' cases 2-14 days ago (incubation period), so dividing by the current day's confirmed count might not be correct. Sadly, using the older count gives a much higher mortality (4% when using deaths/(10-day old confirmed)). But remember -- nobody knows the 'true' number of infected people walking around.
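A rough sketch of that lagged calculation, assuming two aligned daily lists of cumulative counts. The function name, the lag, and the sample numbers are hypothetical and chosen only to make the effect easy to see.

```python
def lagged_fatality_pct(deaths, confirmed, lag_days):
    """Deaths on each day divided by confirmed cases `lag_days` earlier, in %.

    Returns one value per day from `lag_days` onward; None where the
    lagged confirmed count is zero.
    """
    out = []
    for day in range(lag_days, len(deaths)):
        base = confirmed[day - lag_days]
        out.append(100.0 * deaths[day] / base if base else None)
    return out


# Hypothetical series that double every day, for illustration only.
confirmed = [100, 200, 400, 800, 1600]
deaths = [2, 4, 8, 16, 32]

same_day = lagged_fatality_pct(deaths, confirmed, lag_days=0)
one_day_lag = lagged_fatality_pct(deaths, confirmed, lag_days=1)
print(same_day, one_day_lag)
```

While cases are still growing, the lagged figure comes out higher than the same-day figure, which is the effect described above. Neither number accounts for unconfirmed cases.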
Only if you had a sci-fi blacklight type of light could you really know the number of exposed vs death.
Is my approach valid? I'm open to alternatives. Interestingly, we can all dramatically reduce these numbers by getting family and friends to follow the world-wide guidelines on hand washing, hand sanitizer, and .5% Sodium Hypochlorite wipes for surfaces (COVID-19 survives on surfaces for days on steel and many hours on paper).
So in a very real sense, Johns Hopkins is helping to save lives since we can analyze trends and prepare.
I was also trying to wrap my head around a better way to trend mortality rates while this is still in progress, and so far I was thinking of an approach similar to the one @bluef00ted was suggesting (rolling weekly/biweekly averages and carving out trend lines from there) - although not perfect, a bit better than rates by overall totals.
I have a simple spreadsheet I am working with, with a similar setup. It is based on 5-day and 10-day look-back periods for confirmed cases and deaths. Not perfect by any means, but as @bluef00ted mentioned above, it gives you insight into potential infection period vs deaths. I also have it broken down into China, Outside China and All.
@bluef00ted Diseases and other "reproductive" things like interest on an investment follow an exponential function. If it's a straight line on a log plot it's precisely an exponential. The infected counts are not following a perfect straight line because of lack of testing in the beginning, then testing finally coming online causing a jump. Also, countries are responding to it more aggressively once it gets worse or with more skill as they learn, but then later if it gets out of control a steeper exponential trend will occur.
So the straight line can get steeper when testing gets better, then flattens out once old cases are no longer suddenly being added to the count. We saw this in Europe and now the U.S. is about to jump from better counting, then it will flatten out. A good government response might make it flatten more. Only China and S. Korea have been able to reach this 3rd stage. Another possibility for a 3rd or 4th stage is to overwhelm hospitals like it did in Wuhan at first. Besides making cases jump, it made deaths jump. Iran has jumped straight to this last stage. The actual cases there will be more than China in a few days. The deaths are probably a lot more than they are counting. Before it hits South America this summer (their winter), the large majority of deaths might be in Iran due to their slow and pitiful response.
If everything were perfect and the same in all countries, deaths would follow the same trend as cases except with a delay, so the starting point for them is an exponential too. European countries look like they might have a sustained 25% increase per day in cases. Iran is in the 35%/day range. Countries doing a really good job are less than about 10%/day. Countries with warm weather are seeing hardly any increase. A conservative approach is to use a 20% per day assumption for cold areas.
If it really stays at 25% per day and a country (or the world) has k cases, then the number of cases "days" later is cases = k*1.25^(days). I excluded China because their cases jumping up so much before they responded throws things off. Deaths will presumably follow the same equation with a different starting k. With an accurate starting k value for a given country, the death rate per case (be it 0.5% good or 3% bad) will not change anything (it's already baked into the starting k value).
If China and Iran are excluded, and there are 2x more actual cases right now than reported, and it's 20% growth, then a conservative estimate for April 15 (42 days) is
cases = 2 * 11700 * 1.20^42 = 50 million cases (~500,000 deaths). That's what the math tells me, but I find it hard to believe.
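The arithmetic is easy to check in a few lines. This is only a sketch of the compounding formula with the numbers from this comment; the function and parameter names are mine, and the 2x undercount and 20%/day growth are the assumptions stated above, not measured values.

```python
def project_cases(current_cases, daily_growth, days, undercount_factor=1.0):
    """Project cases forward assuming constant compound daily growth.

    current_cases:     reported cases today
    daily_growth:      fractional growth per day (0.20 means 20%)
    undercount_factor: assumed ratio of actual to reported cases
    """
    return undercount_factor * current_cases * (1.0 + daily_growth) ** days


# 11,700 reported cases, assumed 2x undercount, 20%/day for 42 days.
projected = project_cases(11_700, 0.20, 42, undercount_factor=2)
print(f"{projected / 1e6:.1f} million cases")
```

Changing `daily_growth` from 0.20 to 0.10 or 0.25 shows how sensitive the result is to the growth assumption over 42 days.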
The equation in my chart shows an easy way of estimating where things are headed without having to rely on a program to give you the exponential equation. If it were a perfect exponential, the dots would be a horizontal line. Instead they are a straightish line going up on this log plot, indicating an exponential of an exponential. Hopefully that's from testing being late to come online and it will drop back down some. The chart shows only reported cases, given the current trend.
The 50 M cases (outside China and Iran) by April 15 will hopefully change dramatically over the coming days. I'm watching Western Europe (excluding Italy) to see what might happen in the U.S. Hopefully the jump to the right is a testing increase which hopefully will flatten out.
The following countries had 40% increases today, which would be well over a billion people by April 15 if that continued: UK, Switzerland, Sweden, Norway, Netherlands, Iceland. It's interesting and expected that they have the highest increases and are pretty cold places.
Thanks @zawy12. The rate of change might not stay at 25% for long, so the exponential function isn't as bad. Countries like Italy, Iran, or even Spain are putting this to the test so we'll all have to see.
There (hopefully) will be a dampening of the growth rate when the cases 'begin'. China grew by +20% for only about 13 days, for example.
Spain's case count starts growing by 200% on 2/25 and growth is still very high (average 81% growth every day for 8 days) so that disproves my theory for the time being.
Iran's growth rate also doesn't appear to be going down and is averaging a 78% increase every day for last 13 days. The effect of licking walls? Wish that was a joke.
Italy:
Korea:
US:
I'm no virologist but it seems the 'true' r0 of COVID-19 would help explain some of what we see with these high growth rates. Here in the US, we had our heads in the sand and permitted the asymptomatic 'spread' of the virus for up to 6 weeks so that might explain it.
This is a case where knowledge really can save lives. Tell everyone to not touch their face as much, wash their hands, use sanitizer if they can't wash, and use .5% Sodium Hypochlorite wipes or similar on surfaces.
Even if nobody around you is doing it. Nobody wants these graphs to be 'right' (despite how fun they are to create). :-)
Your rates are based on new cases per total. I think it's better to use my equation for say the past 7 days to get a rolling daily average increase. For example for Korea: (5186/977)^(1/7) = 1.27 (27% per day).
The newest data is (5621/1262)^(1/7) = 1.24
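Here is the same equation as a small function, in case anyone wants to run it over a whole series. It is just the geometric mean of the daily growth factor between two counts; the function name is mine and the inputs are the Korea figures quoted above.

```python
def avg_daily_growth(start_count, end_count, days):
    """Average daily growth factor between two counts `days` apart.

    A result of 1.27 means an average increase of 27% per day.
    """
    return (end_count / start_count) ** (1.0 / days)


korea_earlier = avg_daily_growth(977, 5186, 7)
korea_newest = avg_daily_growth(1262, 5621, 7)
print(round(korea_earlier, 2), round(korea_newest, 2))
```

Unlike averaging the individual daily percentages, this only depends on the first and last count in the window, so a single noisy reporting day in the middle does not affect it.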
So the South Korea trend is like this: It looks like you can see when they were not testing much and then started catching up on testing. These are 7-day trends.
China provinces except for Wuhan
Western Europe
Wow, try analyzing just non-China confirmed data and there's a very dark trend.
This is a rolling average of the last 20 days of 'new confirmed' % change each day (no China):
It seems China's overwhelming ability to restrict movement is making all the other data look better. Hopefully this trend reverses.
@bluef00ted I have been keeping a spreadsheet that has non-China data separate. Have a look here if you like. https://docs.google.com/spreadsheets/d/1EWXTl-pd4NZyRXb7VVyMieIKp0sYMSTTx0p-OKvG_7o/edit?usp=sharing
Also have a look at non-China data, and death % based on confirmed cases 5 and 10 days prior.
I think this is the best way to view it. It shows that while the increase is dramatic, the exponential trend "folds over" as countries catch up with containment. The second increase across all countries is due to Italy losing control and all the tourists returning home, with it showing up precisely a week later.
The USA is just very poorly organized and people do not have the discipline to be effective. This could not be helped. Very informative graphs btw.
Without wishing to denigrate any of the work, and with the utmost care to not offend... I think you guys are doing amazing work. However...
I cannot help but feel that the way in which the death rate is being calculated is wrong. Let me explain. As I type, the USA has 122 cumulative incidences. Recoveries = 8, deaths = 7. The mortality rate is calculated as 7/122 = 5.74%. And so on for the death rate from one nation to another.

The problem with this is that the mortality is being measured against a base of recovered + dead + existing. And we have no idea how long an incident has been in the numbers: quick versus long recoveries, and similarly for deaths. Only "recovered" and "dead" have run the full course of the disease. So surely a more meaningful calculation would be the share of "dead" among "dead" plus "recovered"? After all, those are the only incidents which have completed the full cycle. It is the manner in which the authorities measure "regular flu" as 0.1% deadly. The same guys are talking about CoVid19 being (variously) 2% to 3.4%.

From the Chinese data we would then have 47,404 / (47,404 + 2,945) = 94.15% recovery rate (to be positive) or the complement, 5.85% dead. (At present the Chinese mortality rate is calculated as 3.67%.)
Cheers Mark
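To make the comparison concrete, here is a small sketch of both calculations using the figures quoted in this comment. The function names are mine, and this is only the arithmetic as described, not an established epidemiological method.

```python
def naive_fatality_pct(deaths, confirmed):
    """Deaths as a percentage of all confirmed cases, resolved or not."""
    return 100.0 * deaths / confirmed


def closed_case_fatality_pct(deaths, recovered):
    """Deaths as a percentage of closed cases (deaths + recoveries)."""
    return 100.0 * deaths / (deaths + recovered)


# Figures quoted in the comment above.
us_naive = naive_fatality_pct(deaths=7, confirmed=122)
china_closed = closed_case_fatality_pct(deaths=2945, recovered=47404)
us_closed = closed_case_fatality_pct(deaths=7, recovered=8)

print(round(us_naive, 2), round(china_closed, 2), round(us_closed, 2))
```

Note that with the US figures the closed-case version gives 7/(7+8), roughly 47%, so this ratio swings wildly while only a handful of cases have resolved.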