Model is overestimating

ppaulojr commented 4 years ago

Right now the model is grossly overestimating cases. I've been following it for a few days, and the actual counts are well below the minimum predicted for the next day.

If we can't make accurate predictions for D+1, I don't see the point in extrapolating to five days.
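One way to make the D+1 complaint measurable is a simple backtest: for each past day, compare the observed count with the band the model predicted the day before. The sketch below assumes the model publishes a lower and an upper bound for each next-day prediction; `classify_d1`, `backtest_d1` and the numbers are hypothetical illustrations, not code or data from the covid19br site.

```python
from collections import Counter


def classify_d1(observed, lower, upper):
    """Place an observed next-day (D+1) count relative to the predicted [lower, upper] band."""
    if observed < lower:
        return "below"
    if observed > upper:
        return "above"
    return "within"


def backtest_d1(observed, lowers, uppers):
    """Count how often observed D+1 values fell below, within or above the predicted band."""
    if not (len(observed) == len(lowers) == len(uppers)):
        raise ValueError("observed, lowers and uppers must have the same length")
    return Counter(classify_d1(o, lo, hi) for o, lo, hi in zip(observed, lowers, uppers))


# Hypothetical numbers, only to show the call; not real Brazilian data.
observed = [1200, 1500, 1650]
lowers = [1300, 1450, 1700]
uppers = [1500, 1700, 1950]
counts = backtest_d1(observed, lowers, uppers)
print(counts["below"], counts["within"], counts["above"])  # 2 1 0
```

A run of "below" results is exactly the symptom described above: the observed value keeps landing under the minimum of the predicted band.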

tapirus07 commented 4 years ago

This happens because it is naive, if not inaccurate, to fit an exponential curve to what is a logistic process. Brazil has already entered the logistic phase of growth, so from now on all exponential predictions will be overestimates. I have tried hierarchical mixed models to estimate some trends. Take a look if you find it useful.
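The overestimation argument can be illustrated with a small numpy sketch using hypothetical parameters (not fitted to any real series): generate "true" cumulative cases from a logistic curve, fit an exponential (a straight line on log-cases) to a recent window, and extrapolate one day ahead. Because the log of a logistic curve is concave, the least-squares line ends up above the curve just past the window, so the exponential forecast overshoots. The function names here are illustrative only.

```python
import numpy as np


def logistic(t, a, b, c):
    """Cumulative cases under the logistic curve c / (1 + exp((b - t) / a))."""
    return c / (1.0 + np.exp((b - t) / a))


def exponential_forecast(t, y, t_next):
    """Fit log(y) = intercept + rate * t by least squares and extrapolate to t_next."""
    rate, intercept = np.polyfit(t, np.log(y), 1)
    return float(np.exp(intercept + rate * t_next))


# Hypothetical "true" epidemic following a logistic curve (not real data).
a, b, c = 6.0, 50.0, 200_000.0
window = np.arange(35, 50)  # last 15 observed days, just before the inflection point
next_day = 50
forecast = exponential_forecast(window, logistic(window, a, b, c), next_day)
actual = float(logistic(next_day, a, b, c))
print(f"exponential D+1 forecast: {forecast:,.0f}  logistic value: {actual:,.0f}")
```

Since the concavity holds for every t, the same overshoot appears whichever window is used, before or after the inflection point; it just gets larger as growth slows.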

http://movement-wildlife.shinyapps.io/APP-2/

vsudbrack commented 4 years ago

I'd say the naive thing at this point would be to assume that the data we currently have for Brazil reflect the dynamics of the disease, while the press repeatedly reports problems with testing and data analysis. We do not believe saturation effects have set in yet. The apparent overestimation comes from the thousands of tests still waiting to be processed. We've published an analysis of the first 20 days of predictions in the +Info section. Best regards!

tapirus07 commented 4 years ago

Thanks for your comments, Vitor. I appreciate that you share the data and code, as well as the way you present the results. Best!

lazaronixon commented 4 years ago

Hello, I am using your dataset for http://profetadocorona.herokuapp.com, and this model to calculate logistic growth: https://github.com/katanaml/covid19

tapirus07 commented 4 years ago

lazaronixon,

Really cool. I took a very similar approach, but two important differences are worth noting.

1. I used a different equation to model logistic growth, because I tried to give each parameter a biological meaning: Cases ~ c / (1 + exp((b - time) / a)). In this formulation some quantities are straightforward to read off: "c" is the asymptote (the final number of cases), "b" is the time of the inflection point, and "a" controls the speed of growth (it acts as a time scale, so a smaller "a" means faster growth). Note that b + 3*a is roughly the time when we reach 95% of the cases.
2. Because fitting a logistic curve separately to each country that is still in exponential growth gives very poor estimates, I fit a single mixed-effects logistic regression for all countries, with parameters "a", "b" and "c" as random effects. This means each country's parameters are treated as draws from a common population-level Gaussian distribution with an estimated mean and variance. Therefore, countries with little data so far get better estimates by borrowing this shared information.
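A quick numerical check of the parameterization Cases ~ c / (1 + exp((b - time) / a)), in plain Python with hypothetical parameter values (not fitted to any country): at time b the curve is at exactly half of c, and at b + 3a it is at 1 / (1 + e^-3), about 95.3% of c. Inverting the curve gives the exact 95% point, b + a*ln(19), which is about b + 2.94a. `time_to_fraction` is a helper introduced here for illustration.

```python
import math


def logistic(t, a, b, c):
    """Cumulative cases under Cases = c / (1 + exp((b - t) / a))."""
    return c / (1.0 + math.exp((b - t) / a))


def time_to_fraction(p, a, b):
    """Time at which the curve reaches fraction p of the asymptote c (0 < p < 1)."""
    return b + a * math.log(p / (1.0 - p))


a, b, c = 4.0, 60.0, 50_000.0  # hypothetical parameters, not fitted values

print(logistic(b, a, b, c) / c)          # 0.5: b is the inflection point
print(logistic(b + 3 * a, a, b, c) / c)  # about 0.9526
print(time_to_fraction(0.95, a, b))      # about 71.78, i.e. b + a*ln(19)
```

Because "c" cancels out of these fractions, the 50% and 95% times depend only on "b" and "a", which is what makes this parameterization easy to interpret.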
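The "borrowing shared information" idea can be sketched in a much simpler setting than the one described above. This is NOT tapirus07's model, which fits a nonlinear mixed-effects logistic regression with random effects on "a", "b" and "c" jointly. The sketch below takes separate per-country estimates of a single parameter, assumes their standard errors are known and positive, and applies a two-stage normal-normal (empirical Bayes) shrinkage toward a shared Gaussian mean. The function name and all numbers are hypothetical.

```python
import numpy as np


def partial_pool(estimates, std_errors):
    """Shrink per-country estimates of one parameter toward a shared Gaussian mean.

    Simplified normal-normal (empirical Bayes) model: each country's true value is
    drawn from N(mu, tau^2), and its separate estimate has known standard error se.
    """
    est = np.asarray(estimates, dtype=float)
    se2 = np.asarray(std_errors, dtype=float) ** 2
    # Method-of-moments estimate of the between-country variance, floored at zero.
    tau2 = max(est.var(ddof=1) - se2.mean(), 0.0)
    # Precision-weighted population mean.
    weights = 1.0 / (tau2 + se2)
    mu = float(np.sum(weights * est) / np.sum(weights))
    # Shrinkage factor: noisy estimates (large se) are pulled harder toward mu.
    shrink = se2 / (tau2 + se2)
    pooled = shrink * mu + (1.0 - shrink) * est
    return mu, tau2, pooled


# Hypothetical inflection-time estimates "b" (days); the last country has little data.
b_hat = [40.0, 45.0, 50.0, 20.0]
b_se = [2.0, 2.0, 2.0, 20.0]
mu, tau2, b_pooled = partial_pool(b_hat, b_se)
print(round(mu, 1), np.round(b_pooled, 1))
```

With these numbers, the poorly measured country's estimate of 20 days is pulled most of the way toward the shared mean, while the three well-measured countries barely move. A real mixed model estimates the population mean, the variance and the country curves together instead of in two stages.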

Anyway, I really appreciate your approach and dashboard.

Please let me know if you have further ideas or input. Gustavo.

covid19br / site_antigo

Model is overestimating #7