corinne-riddell / BlackWhiteMortalityGap


Error in the California data, CVD, 1972 #191

Open corinne-riddell opened 7 years ago

corinne-riddell commented 7 years ago

Hi @jkaufman94 @sbh4th, and cc @kathryntmorrison:

For some reason, deaths due to CVD in California males in 1972 are much lower than in adjacent years, and as far as I can tell this must be a coding error in the original data. You can see the blip in the male life expectancy gap here:

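[image: https://cloud.githubusercontent.com/assets/12090129/22512566/6d6d10e6-e867-11e6-925d-709d661251e6.png]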

And when I go to the raw data, you can see that for white males the crude CVD mortality rate (teal line) is lower in 1972 than in surrounding years, with no similar effect for Black males (pink line):

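[image: https://cloud.githubusercontent.com/assets/12090129/22513307/d626c6b6-e869-11e6-94dc-2a18e4725533.png]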
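A minimal sketch of how a dip like this could be flagged automatically, assuming the crude rates for one sex/race/cause series are available as a mapping from year to rate. The function name, the 15% threshold, and the rates in the usage example are hypothetical and chosen for illustration only; they are not the project's code or the California data.

```python
from typing import Dict, List


def flag_neighbour_outliers(rates: Dict[int, float], threshold: float = 0.15) -> List[int]:
    """Return years whose rate differs from the mean of the two adjacent years
    by more than `threshold`, expressed as a proportion of that mean.

    Assumes all rates are strictly positive (a zero neighbour mean would divide by zero).
    """
    flagged = []
    for year in sorted(rates):
        prev_rate = rates.get(year - 1)
        next_rate = rates.get(year + 1)
        if prev_rate is None or next_rate is None:
            continue  # first/last years (or gaps) lack two neighbours; skip them
        neighbour_mean = (prev_rate + next_rate) / 2
        if abs(rates[year] - neighbour_mean) / neighbour_mean > threshold:
            flagged.append(year)
    return flagged


# Illustrative rates only (not the California data)
rates = {1970: 500.0, 1971: 495.0, 1972: 380.0, 1973: 490.0, 1974: 485.0}
print(flag_neighbour_outliers(rates))  # [1972]
```

Comparing each year with the mean of both neighbours means a single anomalous year stands out, while its neighbours are only partly pulled toward it because their other neighbour is unaffected.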

At the moment, I've included these data. They seem to affect only the calculation of life expectancy (LE) for white males in 1972, the male Black-white (BW) LE gap in 1972, and the contribution of CVD to the LE gap in 1972. They don't have a material effect on the cause-of-death (COD) decomposition for the other causes, but they do affect the percentage contribution of the other causes (as expected, since the LE gap increases in 1972 because of this apparent decrease in CVD-related mortality in white men).
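Why a lower CVD rate in one year shows up as a blip in white male LE, and therefore in the BW gap, can be seen in a simplified life table. The sketch below assumes a constant death rate within each age interval and an open-ended last interval; the age groups, rates, and the 20% reduction are hypothetical, and this is not the project's life table or decomposition code.

```python
import math
from typing import List


def life_expectancy(widths: List[float], rates: List[float]) -> float:
    """Life expectancy at birth from age-specific all-cause death rates.

    Assumes a constant hazard within each interval. The last interval is
    treated as open-ended, so its width is ignored. All rates must be > 0.
    """
    survivors = 1.0     # l_x as a proportion of the birth cohort (l_0 = 1)
    person_years = 0.0  # running total of L_x
    last = len(rates) - 1
    for i, (n, m) in enumerate(zip(widths, rates)):
        if i == last:
            person_years += survivors / m        # open interval: L = l / m
            break
        p = math.exp(-n * m)                     # probability of surviving the interval
        person_years += survivors * (1 - p) / m  # person-years lived in the interval
        survivors *= p
    return person_years  # equals e0 because l_0 = 1


# Hypothetical age groups 0-39, 40-69, 70+ and cause-specific rates
widths = [40, 30, 0]
other = [0.002, 0.010, 0.050]
cvd = [0.0005, 0.008, 0.060]

baseline = [o + c for o, c in zip(other, cvd)]
cvd_dip = [o + 0.8 * c for o, c in zip(other, cvd)]  # CVD 20% lower in every age group
print(life_expectancy(widths, baseline), life_expectancy(widths, cvd_dip))
```

Lowering the CVD component in every age group raises e0 for that group alone, so if the comparison group's rates are unchanged, the gap between the two groups widens in that year.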

Do you agree that I should remove these data (i.e., CVD-related deaths in white males in 1972) from the analysis and not report white male LE, the BW gap, or any of the decompositions for males in 1972?

sbh4th commented 7 years ago

The only explanation I can think of is sampling error, which affected only this year, but I have no idea why it would have affected CA men in particular, and white men at that. NCHS mortality reports routinely include this note when they discuss random variation in mortality rates:

“The mortality data presented in this report, with the exception of data for 1972, are not subject to sampling error. In 1972, mortality data were based on a 50% sample of deaths because of resource constraints” https://www.cdc.gov/nchs/data/nvsr/nvsr65/nvsr65_04.pdf

For this reason I think it’s okay to remove this year if we want, or we can leave it in and have at least a vaguely plausible explanation for why things may look odd in this year. I’m okay either way, but it will need to be explained regardless.
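To put the 50% sample in perspective, here is a rough sketch of the sampling error it would introduce, assuming each death was included independently with probability 0.5 and the sampled count was then weighted by 2. The actual NCHS sampling design may have differed, so this is an approximation under that assumption, and the function name and 10,000-death stratum are hypothetical.

```python
import math


def weighted_count_se(true_deaths: int, fraction: float = 0.5) -> float:
    """Standard error of the estimated death count (sampled deaths / fraction).

    Assumes independent (Bernoulli) sampling of each death, so the number of
    sampled deaths S ~ Binomial(N, fraction) and
    Var(S / fraction) = N * (1 - fraction) / fraction.
    """
    return math.sqrt(true_deaths * (1 - fraction) / fraction)


# Hypothetical stratum with 10,000 deaths: SE of 100 deaths, relative SE of 1%
se = weighted_count_se(10_000)
print(se, se / 10_000)  # 100.0 0.01
```

Under these assumptions, the relative SE for a 50% sample is 1/sqrt(N), so large strata would be expected to move only slightly from sampling alone, while small strata (e.g., younger age groups) are noisier.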

Thanks for letting us know!

Sam


jkaufman94 commented 7 years ago

Previous analogous example attached. - JK


jkaufman94 commented 7 years ago

Why would random sampling error make the counts systematically lower in EVERY age stratum? I think it is a clerical error, like the one I just sent. For that reason, for the point estimate I would rather interpolate between the adjacent years. The problem is that this will understate the variances for any CI calculation. We would have to multiply impute this data point (using the adjacent years as predictors) to get the right CIs. - JK

Jay S. Kaufman, Ph.D Professor and Canada Research Chair in Health Disparities Department of Epidemiology, Biostatistics, and Occupational Health McGill University 1020 Pine Ave West Montreal, Quebec H3A1A2 CANADA

email: jay.kaufman@mcgill.ca website: www.jayskaufman.com Phone: 514-398-7341 FAX: 514-398-4503
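Jay's three points can be sketched under simplifying assumptions. (1) If the sampling errors in k age strata were independent and equally likely to fall above or below the true value, the chance that all k come out low is 0.5**k. (2) A midpoint interpolation between 1971 and 1973 gives a point estimate. (3) Multiple imputation draws the missing value several times and pools the per-draw results with Rubin's rules, so the CI reflects imputation uncertainty. The normal imputation model, its `imputation_sd`, and the `analyse` callback are hypothetical placeholders; a real imputation model would be fit using the adjacent years as predictors, as Jay suggests.

```python
import random
import statistics
from typing import Callable, List, Tuple


def prob_all_strata_below(k: int) -> float:
    """P(all k strata fall below truth) if each stratum's error is independent
    and equally likely to be positive or negative."""
    return 0.5 ** k


def interpolate_midpoint(prev_value: float, next_value: float) -> float:
    """Linear interpolation for a single missing year between two observed years."""
    return (prev_value + next_value) / 2


def rubin_combine(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Pool m completed-data results with Rubin's rules; returns (estimate, total variance)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)  # pooled point estimate
    w = statistics.fmean(variances)      # within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance (needs m >= 2)
    return q_bar, w + (1 + 1 / m) * b


def impute_and_combine(
    prev_value: float,
    next_value: float,
    imputation_sd: float,                             # hypothetical spread of the imputation model
    analyse: Callable[[float], Tuple[float, float]],  # hypothetical: returns (estimate, variance)
    m: int = 20,
    seed: int = 1,
) -> Tuple[float, float]:
    """Impute the missing value m times from a normal centred on the interpolated
    value, run the analysis on each draw, and pool with Rubin's rules."""
    rng = random.Random(seed)
    centre = interpolate_midpoint(prev_value, next_value)
    estimates, variances = [], []
    for _ in range(m):
        est, var = analyse(rng.gauss(centre, imputation_sd))
        estimates.append(est)
        variances.append(var)
    return rubin_combine(estimates, variances)


print(prob_all_strata_below(10))  # 0.0009765625
# Illustrative values only: the analysis returns the imputed rate with a fixed variance of 4.0
print(impute_and_combine(495.0, 490.0, imputation_sd=10.0, analyse=lambda r: (r, 4.0)))
```

Setting `imputation_sd` to 0 collapses this to single interpolation: every draw is identical, the between-imputation variance is zero, and the total variance is just the within-imputation part, which is the understated-variance problem Jay describes.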


sbh4th commented 7 years ago

I didn't see any attachment in the example you sent, Jay, but I agree it is very weird and likely some kind of error. I suppose we could try to write to someone at NCHS to see whether this is a known issue, but we are unlikely to get a quick response. Multiple imputation for this single point seems like overkill to me. I'd prefer to just include it and note that we think it is likely an error. I'd hate for this kind of anomaly to distract readers from the overall story we end up telling, which will probably have nothing to do with California in 1972.

corinne-riddell commented 7 years ago

JK's attachment, for completeness: [image]

Hmmm, with this type of clerical error in mind, it could be that the number of deaths for some ICD code within the CVD category was copied from another ICD code in California, or from the same ICD code in another state. I could do some detective work on this if necessary, but will table it for now.
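If the detective work is picked up later, one possible starting point is to look for California 1972 death counts that exactly match another state/ICD cell in the same year, which covers both cases above (another code in California, or the same code in another state). The record layout, function name, minimum-count cutoff, and example records are hypothetical; exact matches among small counts can occur by chance, so any hits are leads rather than proof of a copying error.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (state, year, icd_code, deaths)
Record = Tuple[str, int, str, int]


def find_duplicate_counts(
    records: List[Record], state: str, year: int, min_deaths: int = 20
) -> Dict[str, List[Tuple[str, str]]]:
    """For each ICD code in (state, year) with at least `min_deaths` deaths, list the
    other (state, icd_code) cells in the same year that report exactly the same count."""
    cells_by_count: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for st, yr, code, deaths in records:
        if yr == year:
            cells_by_count[deaths].append((st, code))

    matches: Dict[str, List[Tuple[str, str]]] = {}
    for st, yr, code, deaths in records:
        if st == state and yr == year and deaths >= min_deaths:
            others = [cell for cell in cells_by_count[deaths] if cell != (st, code)]
            if others:
                matches[code] = others
    return matches


# Illustrative records only (not real NCHS counts)
records = [
    ("CA", 1972, "410", 1234),
    ("CA", 1972, "414", 1234),
    ("CA", 1972, "430", 88),
    ("NV", 1972, "430", 88),
    ("NV", 1972, "410", 15),
    ("CA", 1971, "410", 1300),
]
print(find_duplicate_counts(records, "CA", 1972))
```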