ProjectMOSAIC / NHANES

R package containing versions of NHANES data

IsPregnant variable #1

Closed: nicholasjhorton closed this issue 9 years ago

nicholasjhorton commented 9 years ago

Danny has been using an IsPregnant variable in his version of NHANES, but it doesn't seem to be included in the new version on CRAN. Any guidance on whether current pregnancy status can be added?

rpruim commented 9 years ago

If I'm understanding the data correctly, this is pretty easy to do. A few questions:

  1. Are there any other variables we want? Since we can match on ID, it is easy to add more variables as long as they are available at the NHANES site. Spot checks on other variables suggest that the matching is correct.
  2. How should we code this? NHANES has 3 categories: 1 = pregnant, 2 = not pregnant, 3 = unknown status. In addition, there are many NAs, even among women. Should we make the 3s NAs as well, or leave them as "Unknown"? See below.
library(mosaic)   # provides tally()
library(NHANES)   # provides the NHANES data frame
tally( PregnantNow ~ Gender, data = NHANES )
##             Gender
## PregnantNow  female male
##     Yes         72    0
##     No        1573    0
##     Unknown     51    0
##     <NA>      3324 4980
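For question 2, a small base-R sketch can make the two coding options concrete. It assumes the raw NHANES codes arrive as a numeric vector; the name `raw_code` and its values are made up for illustration and are not the package's actual build code. Option A keeps code 3 as its own "Unknown" level, separate from NA; Option B lists only codes 1 and 2 as levels, so code 3 silently becomes NA.

```r
# Hypothetical raw codes, following the NHANES coding described above:
# 1 = pregnant, 2 = not pregnant, 3 = unknown status, NA = missing
raw_code <- c(1, 2, 3, NA, 2, 1)

# Option A: keep code 3 as an explicit "Unknown" level, distinct from NA
PregnantNow <- factor(raw_code, levels = c(1, 2, 3),
                      labels = c("Yes", "No", "Unknown"))

# Option B: list only codes 1 and 2 as levels, so code 3 becomes NA
PregnantNow_na <- factor(raw_code, levels = c(1, 2),
                         labels = c("Yes", "No"))

# Tabulate each version, showing the NA count alongside the levels
table(PregnantNow, useNA = "ifany")
table(PregnantNow_na, useNA = "ifany")
```

The tally above has the shape Option A produces: a separate `Unknown` row in addition to `<NA>`.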

I've written code for this that makes it easy to add any other variables NHANES provides. Most of the time will go into recoding the variables and documenting them.
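Matching on ID amounts to a left join. Here is a minimal base-R sketch using toy data frames; the IDs and the `NewVar` column are hypothetical stand-ins, not the actual NHANES files or the code mentioned in this comment. It keeps every row of the packaged data and fills in NA where a participant has no match. The duplicate-ID check is one simple way to confirm the matching is one-to-one.

```r
# Toy stand-ins: 'base' plays the packaged data (one row per participant,
# keyed by ID); 'extra' holds a hypothetical new variable from the NHANES site.
base  <- data.frame(ID = c(1001, 1002, 1003),
                    Gender = c("male", "male", "female"))
extra <- data.frame(ID = c(1003, 1001), NewVar = c(3.1, 2.7))

# Each ID should appear at most once in the new file, or rows would be duplicated
stopifnot(!anyDuplicated(extra$ID))

# Left join on ID: all rows of 'base' are kept; unmatched IDs get NA for NewVar
combined <- merge(base, extra, by = "ID", all.x = TRUE, sort = FALSE)

# merge() does not guarantee row order with sort = FALSE, so restore base's order
combined <- combined[match(base$ID, combined$ID), ]
rownames(combined) <- NULL
```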

If you want a particular variable, find it at the NHANES site and let me know the details.

rpruim commented 9 years ago

The version on GitHub now has PregnantNow in both NHANES and NHANESraw.

nicholasjhorton commented 9 years ago

This is enormously helpful.

I'd prefer to keep the Unknowns separate from the NAs.

Nick

On Jun 18, 2015, at 10:25 PM, Randall Pruim notifications@github.com wrote:


Nicholas Horton
Professor of Statistics
Department of Mathematics and Statistics, Amherst College
Box 2239, 31 Quadrangle Dr
Amherst, MA 01002-5000
https://www.amherst.edu/people/facstaff/nhorton