akarlinsky / world_mortality

World Mortality Dataset: international data on all-cause mortality.
MIT License
282 stars, 56 forks

Puerto Rico data #11

Closed st2048 closed 2 years ago

st2048 commented 3 years ago

Hello,

First of all, thank you very much for all the work you have been doing. I have been following your work for the past few months and I really appreciate your contribution to the understanding of the pandemic.

I noticed that the source you use for the US from 2017 onwards also contains data for Puerto Rico. Since you have been including data from several dependencies, I thought you might be interested in including this as well.

My understanding is that for the US you use the 'United States' rows in that source. I double-checked, and the 'United States' values equal the sum of the values for all 50 states and the District of Columbia, excluding dependencies. So Puerto Rico is not already included in the US data.
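For anyone who wants to repeat this check, here is a minimal Python sketch. It is not the WMD processing code. It assumes the rows for a single week have already been read into `(jurisdiction, deaths)` pairs, and the names `NATIONAL`, `TERRITORIES` and `check_national_total` are hypothetical. The numbers in the usage example are made up. Because the CDC files list New York City separately from "New York" (as discussed further down in this thread), NYC is kept in the domestic sum while territories are summed separately.

```python
# Hypothetical labels; the real CDC files may spell jurisdictions differently.
NATIONAL = "United States"
TERRITORIES = {"Puerto Rico"}


def check_national_total(rows):
    """Split one week's rows into national, domestic and territory totals.

    `rows` is an iterable of (jurisdiction, deaths) pairs for a single week.
    Returns (national, domestic_sum, territory_sum).
    """
    national = 0
    domestic_sum = 0
    territory_sum = 0
    for jurisdiction, deaths in rows:
        if jurisdiction == NATIONAL:
            national += deaths
        elif jurisdiction in TERRITORIES:
            territory_sum += deaths
        else:
            # 50 states, DC, and New York City (listed apart from "New York")
            domestic_sum += deaths
    return national, domestic_sum, territory_sum


# Made-up numbers for illustration only.
week = [
    ("United States", 300),
    ("Alabama", 120),
    ("New York", 100),
    ("New York City", 80),
    ("Puerto Rico", 40),
]
national, domestic, territories = check_national_total(week)
print(national == domestic)                # True: national row excludes Puerto Rico
print(national == domestic + territories)  # False
```

If the first comparison holds for every week, Puerto Rico can be added to the dataset as a separate country without double-counting.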

As a note, the source used by STMF, which I understand is the one you use for the US for 2015-16, also contains data for Puerto Rico. However, when I summed the data for all 50 states and DC in that source, the result did not exactly match the 'United States' values; adding Puerto Rico did not produce a match either. So I am not 100% sure about this one.

I hope this is useful, and once again thank you for all your work.

dkobak commented 3 years ago

Thanks for raising this issue.

Why do all these CDC files contain information on Puerto Rico but not on other unincorporated US territories, e.g. Guam or the US Virgin Islands? Or did I miss them?

meepbobeep commented 3 years ago

I looked at one of the recent CDC National Vital Statistics Reports: https://www.cdc.gov/nchs/data/nvsr/nvsr69/nvsr69-13-tables-508.pdf

The U.S. Virgin Islands, American Samoa, the Northern Marianas, and Guam are listed in some of the tables, but the U.S. VI and American Samoa are missing data entirely. That's odd. To be sure, they have small populations, but if they could collect the Northern Marianas data, you'd think they could collect the others too.

dkobak commented 3 years ago

> As a note, I can see that there is also data from Puerto Rico in the source used by STMF, which my understanding is the one you use for the US for 2015-16. However, in this case when I tried adding the data from all 50 states and DC this did not exactly match the values for 'United States' within that source; I also tried adding Puerto Rico but the values do not match either. So I am not 100% sure here.

This is really weird btw. This table also includes NYC as a separate jurisdiction, so NYC is probably double counted if one adds up NY state and NYC.

I checked the unweighted values for 2015, week 1, ages 25-44. The value for United States is 2412. The sum of all other values (including 30 for Puerto Rico and 55 for NYC) is 2374. How can this be?

We may want to email CDC to clarify.

meepbobeep commented 3 years ago

The CDC data from 2020 onwards (and maybe earlier; I didn't check where you're getting the data from) splits NYC off from the rest of New York State, so NYC is not double-counted in the data sets I've used in my own work. On the CDC's excess mortality dashboard (https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm) you'll see that the NYC count alone exceeded the "New York" entry when the first wave hit. The figure notes state: "Data for New York excludes New York City."

They split NYC off from the rest of the state because the city itself was so hard hit by COVID. I think it (or New Jersey) still ranks highest for excess mortality, just from that first wave of COVID deaths.
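If one wants a single whole-state figure for New York, the two rows can simply be added, since the "New York" row excludes NYC. A minimal Python sketch, assuming one week's rows as `(jurisdiction, deaths)` pairs; the function name `merge_nyc_into_ny` and the numbers are hypothetical:

```python
from collections import defaultdict


def merge_nyc_into_ny(rows):
    """Fold 'New York City' into 'New York' so each state appears once.

    Relies on the CDC note that the 'New York' row excludes NYC,
    so adding the two rows does not double-count.
    """
    totals = defaultdict(int)
    for jurisdiction, deaths in rows:
        key = "New York" if jurisdiction == "New York City" else jurisdiction
        totals[key] += deaths
    return dict(totals)


# Made-up numbers for illustration only.
week = [("New York", 100), ("New York City", 80), ("New Jersey", 90)]
print(merge_nyc_into_ny(week))  # {'New York': 180, 'New Jersey': 90}
```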

dkobak commented 3 years ago

OK, thanks, this clarifies the NY/NYC issue.

But why the values for "United States" are not equal to the sum across states in https://data.cdc.gov/api/views/y5bj-9g5w/rows.csv (but are equal in https://data.cdc.gov/api/views/xkkf-xrst/rows.csv) remains a mystery.
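To see whether the mismatch affects every week or only some, the comparison can be run per week over a whole CSV export. This is a hedged Python sketch using only the standard library. The column names `Jurisdiction`, `Week Ending Date` and `Number of Deaths` are assumptions and may differ between the two datasets linked above, so they are parameters. It also assumes the rows have already been filtered to one age group and one weighting type (e.g. unweighted only). The inline CSV is made up.

```python
import csv
import io
from collections import defaultdict


def find_mismatched_weeks(rows, jur_col="Jurisdiction", week_col="Week Ending Date",
                          count_col="Number of Deaths", national="United States"):
    """Return {week: (national_total, sum_of_other_rows)} for weeks where they differ.

    `rows` is any iterable of dicts, e.g. a csv.DictReader. Column names are
    assumptions; filter to a single age group and weighting type beforehand.
    """
    national_by_week = defaultdict(int)
    others_by_week = defaultdict(int)
    for row in rows:
        raw = row[count_col].replace(",", "")
        if not raw:
            continue  # suppressed or missing count
        if row[jur_col] == national:
            national_by_week[row[week_col]] += int(raw)
        else:
            others_by_week[row[week_col]] += int(raw)
    return {
        week: (total, others_by_week[week])
        for week, total in national_by_week.items()
        if total != others_by_week[week]
    }


# Made-up data: the first week matches, the second does not.
SAMPLE = """Jurisdiction,Week Ending Date,Number of Deaths
United States,01/10/2015,300
Alabama,01/10/2015,120
New York,01/10/2015,100
New York City,01/10/2015,80
United States,01/17/2015,310
Alabama,01/17/2015,125
New York,01/17/2015,100
New York City,01/17/2015,80
"""
mismatches = find_mismatched_weeks(csv.DictReader(io.StringIO(SAMPLE)))
print(mismatches)  # {'01/17/2015': (310, 305)}
```

Note that "other rows" here includes Puerto Rico; to compare against the 50 states and DC only, territory rows would need to be filtered out first.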

st2048 commented 3 years ago

> This is really weird btw. This table also includes NYC as a separate jurisdiction, so NYC is probably double counted if one adds up NY state and NYC.
>
> I checked the values for unweighted 2015 week 1 25-44 years. The value for United States is 2412. The sum of all other values (including 30 for Puerto Rico and 55 for NYC) is 2374. How can this be?
>
> We may want to email CDC to clarify.

This is weird.

I had originally looked at all ages combined. For unweighted 2015 week 1, I got 61,685 for the sum of the 50 states (including NYC) and DC, 62,322 for the sum of the 50 states (including NYC), DC, and Puerto Rico, and 61,873 for 'United States'. I quickly checked my pivot table, and the pattern is the same for every week of 2015: the 'United States' value is always higher than the sum of the 50 states (including NYC) and DC, but lower than the sum of the 50 states (including NYC), DC, and Puerto Rico. This is why I had thought that Puerto Rico was not included in 'United States'.

But you are right: for the 25-44 age group alone, the 'United States' value is higher than the sum of the 50 states (including NYC), DC, and Puerto Rico, and this holds for every week of 2015.
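The two observations can be checked with a few lines of arithmetic using the unweighted 2015 week 1 figures quoted in this thread. For all ages, Puerto Rico's count is inferred as 62,322 - 61,685 = 637. For ages 25-44, the 50 states (including NYC) plus DC come to 2374 - 30 = 2344. The helper name `locate_national_total` is hypothetical; this is only a sketch of the comparison, not an explanation of the discrepancy.

```python
def locate_national_total(states_dc, puerto_rico, national):
    """Place the 'United States' value relative to two candidate sums.

    states_dc:   sum over the 50 states (including NYC) and DC
    puerto_rico: Puerto Rico's count for the same week and age group
    national:    the 'United States' value
    """
    with_pr = states_dc + puerto_rico
    if national == states_dc:
        return "equals states + DC"
    if national == with_pr:
        return "equals states + DC + PR"
    if states_dc < national < with_pr:
        return "between"
    if national > max(states_dc, with_pr):
        return "above"
    return "below"


# All ages: 61,685 < 61,873 < 62,322
print(locate_national_total(61685, 637, 61873))  # between
# Ages 25-44: 2412 exceeds even 2344 + 30 = 2374
print(locate_national_total(2344, 30, 2412))     # above
```

So the all-ages figures are consistent with "Puerto Rico excluded plus some extra deaths", while the 25-44 figures cannot be explained by Puerto Rico alone.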

I have no idea why this might be. Perhaps the discrepancy between the sum of the 50 states and the 'United States' value has to do with deaths of non-residents being added only to the national total, but this is just a guess. If you get a response from the CDC, I would be interested to know what they say.

Regarding your earlier question, I could not find any data from other US territories and I do not know why they were not included.

dkobak commented 3 years ago

I have just written to the CDC. I will post here if they reply.

dkobak commented 2 years ago

Unfortunately, the CDC never replied. I have just re-sent them the same email.

But whatever the answer, the question remains (@akarlinsky) whether we want to include Puerto Rico separately in WMD.

dkobak commented 2 years ago

Puerto Rico has been added, so closing this issue.