akarlinsky / world_mortality

World Mortality Dataset: international data on all-cause mortality.
MIT License

Peru Data #4

Closed maciej-kochanowicz closed 1 year ago

maciej-kochanowicz commented 3 years ago

I wonder whether you have tackled the following issue. Your dataset gives the following totals for Peru:

This is based on daily deaths as reported by the Ministry of Health, which you give as the reference (checking the numbers just now, I noticed small discrepancies in the totals, but this is a minor issue).

At the same time, the National Statistical Office provides different totals (https://www.inei.gob.pe/estadisticas/indice-tematico/poblacion-y-vivienda/ table "Defunciones registradas por año de inscripción, según departamento"):

The Ministry of Health data would even imply improbably low death rates (per 1,000 inhabitants) for 2017-2019.

I have already noticed this issue in some other "trackers" of excess mortality due to Covid, but I was unable to analyze it further due to my limited Spanish.

Incomplete death registration is probably the reason for this discrepancy (it is mentioned in some sources dealing with Peru's vital statistics).

This issue may obviously affect calculations of excess mortality and could even produce exceptionally high results for Peru, higher than those of its neighbors (Ecuador, Bolivia). It could possibly be solved, or at least alleviated, with some additional extrapolation (for example, the discrepancy is not equal across provinces and is getting smaller every year, probably due to better coverage of death registration).

Have you noticed or analyzed this issue?

dkobak commented 3 years ago

We are aware of this issue, but are not sure what to do about it. It's unclear to us where the numbers reported by the National Statistical Office (~150k/year) are coming from.

The coverage of the SINADEF data (which we use) seems to grow: 2017 is definitely undercounted compared to 2018 and 2019. This yields a positive trend but, luckily for us, predicts a 2020 baseline that agrees really well with the early pre-pandemic 2020 data.

So the data that we use are self-consistent, but it may very well be that some fraction of deaths is missed. It's unclear what the fraction is... Based on the 150k figure, it would be roughly (150-115)/150 ~ 25%. In any case, there is nothing we can do about it.

Apart from warning the users -- so I suggest we add this as a caveat to README @akarlinsky.

akarlinsky commented 3 years ago

Continuing on what Dmitry said, we are in contact with Vital Strategies and SINADEF regarding this issue, and have also addressed it in our revised paper on medRxiv.

lsempe77 commented 3 years ago

Hello, I've been following your great work. I recently submitted a paper addressing specifically this issue in Peru. Under-registration is about 30% in the case of Peru. Hopefully I will be able to share it in the next few weeks.

Best wishes, Lucas

dkobak commented 3 years ago

30%? Wow. Can you say what this estimate is based on?

akarlinsky commented 3 years ago

Hi Lucas, thanks for these kind words. Your paper could be very useful to us - if you can send me a PDF draft (I'm at karlinsky@gmail.com) I would love to take a look and cite it when it becomes public.

lsempe77 commented 3 years ago

> 30%? Wow. Can you say what this estimate is based on?

Hi, the head of SINADEF gave a talk with that estimate. This can be verified by looking at absolute figures prior to 2017, when there was a paper system: values are higher by 20-25%. It also shows in the mortality rates, which would be around 4 and not 6-6.5 per 1,000 as they should be. There was also a rebellion in one region in 2018 and 2019 (LAMBAYEQUE: very low numbers, linked only to public insurance reimbursement to the regional government).

Additionally, the system shows natural growth from 2017 to 2019. The problem is that when the pandemic started they couldn't finish processing the data. That is why figures for 2019 and 2020 are still changing right now.

I've shared the paper, Ariel.

lsempe77 commented 3 years ago

See a presentation that summarises some of the problems. It was made to PAHO a few months ago. Models and methods have changed, but the diagnostics are similar.

[Framework for measuring Excess of Mortality in LMICs.pdf](https://github.com/akarlinsky/world_mortality/files/6305070/Framework.for.measuring.Excess.of.Mortality.in.LMICs.pdf)

dkobak commented 2 years ago

@akarlinsky As there is nothing we can do about this, should we close the issue?

st2048 commented 2 years ago

Hi,

I've been taking a look into this now that the INEI 2020 figures have been published, and I'd be interested in your thoughts.

Based on my calculations, it seems that SINADEF coverage (out of the total deaths registered by INEI) has increased from 73% in 2019 to 94% in 2020. The INEI reports break down total registered deaths into online and manual registrations, and there seems to have been a rise in the proportion of deaths registered online. There also seem to be some deaths that are registered online that do not appear on the SINADEF public data (not sure the reason why), and SINADEF coverage of online registered deaths also increased.

I've summarised the key figures in the table below:

| Year | INEI registered online | INEI registered manually | INEI total registered deaths | INEI registered online / INEI total registered deaths | Total deaths in SINADEF system | SINADEF deaths / INEI online deaths | SINADEF deaths / INEI total registered deaths |
| -- | -- | -- | -- | -- | -- | -- | -- |
| 2017 | 136,108 | 13,924 | 150,032 | 90.72% | 98,974 | 72.72% | 65.97% |
| 2018 | 140,104 | 11,586 | 151,690 | 92.36% | 112,809 | 80.52% | 74.37% |
| 2019 | 146,684 | 10,996 | 157,680 | 93.03% | 114,942 | 78.36% | 72.90% |
| 2020 | 240,078 | 837 | 240,915 | 99.65% | 226,609 | 94.39% | 94.06% |

For easy reference, these are the links to the INEI reports: 2020, 2019, 2018, 2017.

Using 2017-2019 linear trend, I get 100k excess deaths in 2020 using SINADEF figures, and 80k using INEI figures. I know you mentioned in the thread above that the 2017-19 SINADEF linear trend seemed to match well with the SINADEF Jan-Feb 2020 data - I believe there have been revisions since then and that now the Jan-Feb 2020 data seems slightly high? Could this be related to SINADEF improved coverage?
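The excess figures quoted here can be roughly reproduced from the annual totals in the table above. The following is a minimal sketch, assuming an ordinary least-squares line through the 2017-2019 yearly totals extrapolated to 2020; the function names are illustrative, and the exact fitting procedure used in the comment is not stated in the thread.

```python
# Annual death totals copied from the table above (2017-2020).
sinadef = {2017: 98_974, 2018: 112_809, 2019: 114_942, 2020: 226_609}
inei = {2017: 150_032, 2018: 151_690, 2019: 157_680, 2020: 240_915}


def linear_trend_prediction(totals, fit_years, target_year):
    """Fit y = a + b * year by least squares on fit_years, evaluate at target_year."""
    n = len(fit_years)
    mean_x = sum(fit_years) / n
    mean_y = sum(totals[y] for y in fit_years) / n
    sxy = sum((y - mean_x) * (totals[y] - mean_y) for y in fit_years)
    sxx = sum((y - mean_x) ** 2 for y in fit_years)
    slope = sxy / sxx
    return mean_y + slope * (target_year - mean_x)


def excess_deaths(totals, fit_years=(2017, 2018, 2019), target_year=2020):
    """Observed deaths in target_year minus the extrapolated linear baseline."""
    baseline = linear_trend_prediction(totals, fit_years, target_year)
    return totals[target_year] - baseline


for name, totals in (("SINADEF", sinadef), ("INEI", inei)):
    print(f"{name}: excess deaths in 2020 = {excess_deaths(totals):,.0f}")
```

With these inputs the sketch gives values close to the 100k (SINADEF) and 80k (INEI) mentioned above. A three-point fit is very sensitive to the 2017 value, which is the year most affected by undercounting.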

I'm not sure what would be the best way to go about this, and of course we don't yet have INEI data for 2021-22. Potentially one option could be to scale up the SINADEF weekly values to match the INEI yearly totals, perhaps using the 2020 ratio of SINADEF deaths to INEI deaths for 2021-22 if we assume that stayed more or less the same. If I remember correctly you did something similar for Australia to adjust deaths of natural causes to all deaths before all cause weekly mortality became available?
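The scaling idea could look something like the sketch below. It assumes one multiplicative factor per year (INEI total divided by SINADEF total) applied uniformly to every week of that year, with the 2020 factor carried forward to 2021-22. The weekly counts are made-up placeholders and the function names are hypothetical; this is not the adjustment actually implemented in the repository.

```python
# Annual totals copied from the table above.
sinadef_annual = {2017: 98_974, 2018: 112_809, 2019: 114_942, 2020: 226_609}
inei_annual = {2017: 150_032, 2018: 151_690, 2019: 157_680, 2020: 240_915}


def scaling_factors(sinadef, inei, carry_forward_to=(2021, 2022)):
    """Per-year INEI/SINADEF ratio; later years reuse the last available ratio."""
    factors = {year: inei[year] / sinadef[year] for year in sinadef if year in inei}
    last_year = max(factors)
    for year in carry_forward_to:
        factors.setdefault(year, factors[last_year])
    return factors


def scale_weekly(weekly, factors):
    """Scale (year, week, deaths) records by the factor for their year."""
    return [(year, week, deaths * factors[year]) for year, week, deaths in weekly]


# Hypothetical weekly SINADEF counts, for illustration only.
weekly_example = [(2019, 1, 2200), (2020, 1, 2500), (2021, 1, 5000)]

factors = scaling_factors(sinadef_annual, inei_annual)
for year, week, deaths in scale_weekly(weekly_example, factors):
    print(year, week, round(deaths))
```

A single annual factor produces a step change at each year boundary, so any gradual improvement in SINADEF coverage within a year is not captured.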

Would be interested in your thoughts on this.

akarlinsky commented 2 years ago

Thank you, this is very interesting. I do find it a bit strange that the "total INEI registered" represents almost 100% complete death registration (when compared to expected number of deaths as per WPP and similar, see here https://www.medrxiv.org/content/10.1101/2021.08.12.21261978v1).

I have been thinking about this issue myself and I do think that 2020 probably represents an upward shift in registration completeness, and I explored the consequences of accounting for it here: https://twitter.com/ArielKarlinsky/status/1484818098813386754

Indeed, we can adjust the 2017-2019 counts to total expected/INEI (as they are very similar) and treat 2020 onward as 100% complete (or 95% complete). Excess will still be massive, but the P-score, for example, will be much smaller (still the largest in the world, I think, but no longer twice as high as the next one). I'm still a bit unsure whether this is enough evidence of an upward shift in registration in 2020, but leaving it as is might be a worse mistake.

Any thoughts? @lsempe77 @dkobak

st2048 commented 2 years ago

That's very interesting, and thanks for sharing the paper.

Just to add that I'm not familiar with how civil registration works in Peru, but just on the basis of reading the INEI reports it does seem quite clear to me that those figures are registered deaths, and not estimates. For example, there are tables breaking down deaths both by place of registration and place of usual residence of the deceased, as well as by the way in which they were registered (online vs manual; and ordinary vs police vs judiciary), which all add up to the total figure and would not make sense if these numbers were estimates. The text also consistently talks about registrations. Or am I missing something?

And if we assume the WPP and other estimates of total deaths are correct, then could it be that death registration has been almost complete in Peru, at least for the past few years, and that the SINADEF numbers are the ones that are incomplete?

In terms of the change from 2019 to 2020, what the INEI data seems to show is an increase of SINADEF coverage relative to total registered deaths - then of course whether total registered deaths increased or decreased in completeness relative to actual deaths is a separate question, but I guess there is not much we can do about that in the absence of other information?

dkobak commented 2 years ago

Thanks for continuing this discussion. When did the 2020 INEI report become available and when should we expect the 2021 INEI report?

The current Peru data in WMD look like this:

[Figure: Peru]

You are right that our baseline for Jan-Mar 2020 looks too low now. Can somebody plot what these data would look like if we multiply each year by the INEI/SINADEF ratio, and use the 2020 ratio for 2021 and 2022?

akarlinsky commented 2 years ago

[Figure]

dkobak commented 2 years ago

That actually looks VERY good/plausible, and better than what we have now.

st2048 commented 2 years ago

> Thanks for continuing this discussion. When did the 2020 INEI report become available and when should we expect the 2021 INEI report?

The 2020 report is dated January 2022, so I guess the 2021 report should be available by early 2023.

Thank you for producing the graphs - the fit seems ok, though I believe the Jan-Mar 2020 data may end up slightly below the baseline once it is calculated? In a way I guess this should not be surprising - after all, the ratios we are using for the adjustments are based on the numbers for the whole year, but of course SINADEF coverage is likely to have improved gradually, as opposed to a sudden jump from 73% in the last week of 2019 to 94% in the first week of 2020. That being said, in the absence of weekly INEI data I cannot think of any better solution.

Just an additional caveat - I was looking at the final section of the reports and saw that the number of districts not reporting increased significantly in 2020. Section 5.6 of the 2020 report says that 345 districts did not report data on births, deaths and marriages. The comparable figures are 33 in 2019, 47 in 2018, and 46 in 2017. A quick Google search suggests there are 1875 districts in Peru, so the 2020 report seems to have missed data from around 18% of districts, compared to 2-3% in previous years.
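The percentages quoted here follow directly from the counts. A small sketch of the arithmetic, assuming the figure of 1875 districts from the comment (which comes from a quick search and may not be exact):

```python
# Total number of districts as quoted in the comment; treat as approximate.
TOTAL_DISTRICTS = 1875

# Districts that did not report vital events, per the INEI reports cited above.
non_reporting = {2017: 46, 2018: 47, 2019: 33, 2020: 345}


def share_not_reporting(count, total=TOTAL_DISTRICTS):
    """Percentage of districts that did not report."""
    return 100 * count / total


shares = {year: share_not_reporting(n) for year, n in non_reporting.items()}
for year, pct in sorted(shares.items()):
    print(f"{year}: {pct:.1f}% of districts did not report")
```

Note that a share of districts is not a share of deaths: non-reporting districts may be small, so the effect on total registered deaths could be well below 18%.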

Now my question is whether this affects both online and manual registrations, or manual only. As manual registrations comprise only a small proportion of the total, this would likely not have a significant effect, but if this affects online registrations as well then the effect could be very significant. Unfortunately I couldn't find any explicit confirmation of this thus far, but there are some hints that this may affect manual registrations only. Below the table of districts not reporting there is a note that says 'this refers to districts that did not report to RENIEC the vital events registered in their jurisdiction' (page 69 in 2020 report). Now section 5.3 (page 60 of 2020 report) says the following:

Information sources:

  • Statistical chart of vital events and modifying acts of civil status (CEHVAMEC), sent monthly to RENIEC by the Offices of the Civil Registry (OREC) that function in the municipalities and native communities of the country (manual registration)
  • Information of vital events registered online. This refers to registrations conducted in registry offices (OR), auxiliary registry offices (ORA) and in certain affiliate offices of registration of civil status (OREC) that still function in the municipalities and conduct online registrations.

So would this mean that the RENIEC reports are only relevant for manual registrations? Any thoughts on this?

dkobak commented 2 years ago

@st2048 From what you wrote above, it seems plausible that the "districts not reporting" only concern manual registrations. The table you posted earlier also looks as if the manual registrations column in 2020 is suddenly missing ~10k deaths. If so, the true number of deaths in 2020 may be ~250k rather than 240k.

However, as this is only an educated guess, I would suggest that we use the INEI data as reported.

akarlinsky commented 2 years ago

Updated on https://github.com/akarlinsky/world_mortality/commit/7dbbd0a238d476a0f954b80477876cf92742ba6a.

Keeping the issue open for now in case there are any additional comments; will close in about 1 week if none come up.

@st2048 if you'd like to be credited for this, we would love to do so.

lsempe77 commented 2 years ago

Apologies for the delay in answering. A few things from the 2020 data. The cutoff date was June 14th, 2021 (165 days into 2021). 7297 registered deaths correspond to 2018 and 2019 (about 3.1% of the total number of registered deaths). This is the first time that INEI has reported that information. That would suggest that SINADEF completeness is about 97%. I find it difficult to believe such a great performance in a very stressed health system.

lsempe77 commented 2 years ago

> @st2048 From what you wrote above, it seems plausible that "not reporting districts" only concern manual registrations. The table you posted earlier also looks as if the manual registrations column in 2020 is suddenly missing ~10k deaths. If so, the true number of deaths in 2020 may be ~250k rather than 240k.
>
> However, as this is only an educated guess, I would suggest that we use the INEI data as reported.

I think that guess is not necessarily correct. Peru has 90,000 villages with fewer than 500 inhabitants each, which together account for almost 3 million people (~8% of the population). People there do not necessarily register deaths, either manually or electronically.

st2048 commented 1 year ago

It seems that INEI has released some 2021 data - see here under 'Defunciones registradas por año de inscripción, según departamento', showing 269,349 deaths in 2021. This doesn't seem far off from the 264,087 currently estimated by WMD.

Data from previous years in that table matched exactly with the INEI reports, though 2021 data is marked as preliminary so might change. I'd expect the 2021 INEI report to be published relatively soon.

dkobak commented 1 year ago

@st2048 Has the 2021 INEI report already been published?

st2048 commented 1 year ago

> @st2048 Has the 2021 INEI report already been published?

Doesn't look like it's been published yet, or at least I haven't been able to find it so far.

dkobak commented 1 year ago

Thanks. I think this issue can be closed for now, as there is nothing for us to do here (currently).