Closed vnicolici closed 3 years ago
Dear @vnicolici, thanks for bringing this to my attention. I hesitate to comment further without looking into the numbers for Romania more closely. I hope, as you do, that the mismatch between official and excess deaths is smaller this time around and that the estimates are correct, but I appreciate your concern. I think this might be resolved as more data comes in -- right now, the model might be responding in part to an apparent drop in cases in recent days (that may very well be illusory). Generally, the most recent days are among the hardest to predict. For instance, the most recent testing data is from September 21st, quite some time ago, and that is an important predictor -- in its absence, the model will tend to be more conservative, I would think (and Romania's predicted death rate is already very high as it is).
The delays in reporting are certainly a real problem - this is part of what makes this estimation problem so difficult generally (as this happens in other countries too). In principle, the models should adapt accordingly - optimizing their objective even with such noisy data - but they won't be right in every instance. In this case, I suspect that dropping a few months of data would not make any difference, as the model would not place much weight on these in any event (it is trained on all countries).
All that said, I will see if there is anything particularly off about Romania's data/prediction.
Hi again @vnicolici -- it appears this issue was due to some incompleteness in the most recent data for Romania -- estimates are now that this wave has surpassed that of November 2020 (which was before vaccinations began in the country). Of course, there is still quite a bit of uncertainty, especially for the most recent days. I will close this issue, if you don't mind, as the estimates now appear reasonable. Feel free to reopen it, or open a new issue, at any time if you can think of a way to improve the estimates or still think they are unlikely to be correct.
Just letting you know that I am once again looking into this -- my suspicion is that there is some issue in the incoming data used to calculate distance-weighted averages.
Some ACM data for September came out for Romania and is now available on https://github.com/akarlinsky/world_mortality. Looks like this time the peak COVID deaths won't be so heavily undercounted (hopefully) as they were in the previous two waves, but it's gonna be about a month before we have October data.
While I understand that this is just an estimation, the recent numbers for Romania, for the current wave, seem far from reality.
Historically, over the entire pandemic, the excess deaths in Romania have been on average about 2-3 times higher than the officially reported COVID deaths at the peak of the waves.
Here in Romania we are right now at the peak of the worst wave yet, according to official numbers, but your estimations for excess deaths in this wave don't multiply the official COVID deaths by a similar factor of 2 to 3. Even the worst case estimate is just slightly over the official numbers.
While, as a Romanian, I hope the numbers are indeed that low, I find that very unlikely.
For example, the official numbers for the last 7 days have been 2.156 deaths per 100,000 per day in Romania, on average. Your estimation for daily deaths, shown on The Economist page when you hover over Romania, is now between 0.93 and 2.4. That doesn't seem to make any sense. The lower estimate in particular, which is less than half of the official number, is clearly wrong.
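To make the arithmetic explicit, here is a minimal Python sketch using only the figures quoted above (2.156 official, 0.93 to 2.4 estimated) and the historical factor of 2 to 3 mentioned earlier. The variable names are just illustrative, and this is a back-of-the-envelope check, not how the model itself works.

```python
# Figures quoted in this comment (deaths per 100,000 per day).
official_rate = 2.156          # 7-day average of officially reported COVID deaths
est_low, est_high = 0.93, 2.4  # estimate range shown for Romania

# Historical ratio of excess deaths to official deaths at wave peaks (rough, from above).
factor_low, factor_high = 2, 3

# How far the lower estimate falls below the official number.
official_over_low = round(official_rate / est_low, 2)

# How far the upper estimate sits above the official number.
high_over_official = round(est_high / official_rate, 2)

# What excess deaths would look like if the historical factor still held.
implied_low = round(official_rate * factor_low, 3)
implied_high = round(official_rate * factor_high, 3)

print(f"official / lower estimate: {official_over_low}")
print(f"upper estimate / official: {high_over_official}")
print(f"implied excess range if the 2-3x factor held: {implied_low} to {implied_high}")
```

So the lower bound is under half of the official figure, and even the upper bound is well below what a 2-3x factor would imply.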
I think what breaks your model is that in Romania many COVID deaths from late 2020 and early 2021 were reported officially very late, in June and July 2021, after it was discovered they were missed in previous daily reports by the authorities. The data you use probably doesn't differentiate between deaths that actually happened in June and July 2021, and those reported in June and July 2021 from earlier in the pandemic.
So, for the summer of 2021, the number of excess deaths in Romania was much lower than the number of reported COVID deaths, because most of the reported COVID deaths were actually from late 2020 and early 2021. I think that's why your model is "reluctant" to estimate a larger number of excess deaths for the present, and for the last few months in general.
In those two months, June and July 2021, the daily official reports did actually mention how many of the newly reported deaths were recent, and how many were deaths from past months. They even grouped the new deaths by the month of occurrence in the daily reports. But I'm guessing the data source you use just added both recent and older deaths together, by reporting day, ignoring that most of them were from the past, not from June and July, which caused this problem.
Taking that into account, you should probably ignore the statistics for June and July 2021 when computing the model for Romania, since most of those reported deaths happened a long time in the past. Not sure if that's practical or not, but it seems to be the simplest solution to this problem.
Alternatively, if you need it, I could try to aggregate the information from the official daily death reports in those 2 months in a more detailed form, with counts grouped by month of actual occurrence instead of a single number for each day, but I'm not sure if it's worth it.
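To illustrate what I mean by grouping by month of occurrence, here is a rough Python sketch. The record layout (`report_date`, `by_month`) and all the counts are hypothetical placeholders I made up for the example, not the real figures from the official reports or the format of any data source you use.

```python
from collections import defaultdict


def deaths_by_occurrence_month(daily_reports):
    """Sum reported deaths by the month they occurred in, not the day they were reported."""
    totals = defaultdict(int)
    for report in daily_reports:
        for month, count in report["by_month"].items():
            totals[month] += count
    return dict(sorted(totals.items()))


# Hypothetical daily reports: each one splits its newly reported deaths
# by month of occurrence. The numbers are invented for illustration only.
sample_reports = [
    {"report_date": "2021-06-15", "by_month": {"2021-06": 5, "2020-11": 40, "2020-12": 30}},
    {"report_date": "2021-06-16", "by_month": {"2021-06": 4, "2021-01": 25}},
]

# What a source that only tracks the reporting day would see.
by_report_day = {r["report_date"]: sum(r["by_month"].values()) for r in sample_reports}

# The same deaths, attributed to the month in which they actually happened.
by_occurrence = deaths_by_occurrence_month(sample_reports)

print(by_report_day)
print(by_occurrence)
```

With data in this shape, the June and July 2021 totals would only contain deaths that actually happened in those months, and the older deaths would be moved back to late 2020 and early 2021.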