PregnantNow should be "No" for males

nicholasjhorton commented 9 years ago

Males should be coded "No", not missing:

> tally(PregnantNow ~ Gender, data=NHANES)
           Gender
PregnantNow female male
    Yes         72    0
    No        1573    0
    Unknown     51    0
    <NA>      3324 4980

nicholasjhorton commented 9 years ago

In addition, it would be helpful to map the 51 "Unknown" females to missing (NA).
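A minimal sketch of the two recodings proposed so far (males to "No", "Unknown" to NA), in base R. It runs on a small hypothetical toy data frame that only imitates the coding shown in the tally above: `Gender` with levels female/male, and `PregnantNow` with levels Yes/No/Unknown and NA for every male. The values are illustrative and are not NHANES data. The recoding is done on a copy, so the NHANES-style coding is left alone.

```r
# Hypothetical toy data imitating the NHANES coding in the tally above:
# PregnantNow is NA for every male, and some females are "Unknown".
d <- data.frame(
  Gender = factor(c("female", "female", "female", "female", "male", "male")),
  PregnantNow = factor(c("Yes", "No", "Unknown", NA, NA, NA),
                       levels = c("Yes", "No", "Unknown"))
)

# Work on a copy so the original coding is untouched.
recoded <- d

# Proposal 1: males are not pregnant, so code them "No" instead of NA.
recoded$PregnantNow[recoded$Gender == "male"] <- "No"

# Proposal 2: treat "Unknown" as missing. which() drops NA comparisons,
# so the subscript contains only valid row positions.
recoded$PregnantNow[which(recoded$PregnantNow == "Unknown")] <- NA

# Remove the now-empty "Unknown" level from the factor.
recoded$PregnantNow <- droplevels(recoded$PregnantNow)

table(recoded$PregnantNow, recoded$Gender, useNA = "ifany")
```

If the column names and levels in the real data match the tally above, the same three assignments should carry over to the NHANES data frame. That is an assumption, not something tested here.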

nicholasjhorton commented 9 years ago

While we're on the topic, is there a reason why this is declared as a factor? I suspect most people would find it simpler as a character variable. Is this standard practice for mosaicData and related packages?

rpruim commented 9 years ago

Earlier Nick said:

I'd prefer to leave the Unknowns separate from the NAs.

Have you changed your mind?

NHANES records the value as missing for males. One advantage of this is that you get a Yes/No/(Unknown) tally for females without first having to subset. That's a small matter. More generally, I think a good principle is to stick with the way NHANES does things (with sensible renaming of arbitrary numeric codes) unless there is a compelling reason to be different. That way, the NHANES documentation will match our data. So recoding the Unknowns and assigning males to No would each need a sufficient reason to depart from that principle.

I'm pretty sure that most (all?) categorical data in mosaicData is coded with factors. What would be the advantage here of using character instead?

In NHANES we have:

> table(sapply(NHANES, class))

 factor integer numeric 
     31      34      11

nicholasjhorton commented 9 years ago

This raises a number of pedagogical issues that might be fodder for a JSE paper or a JSE Datasets and Stories submission, since students will need more than simple operations to calculate quantities of interest (such as the proportion pregnant in the entire sample). But your amendments to the NHANES example for Smoking help. I'm happy to close this issue.
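To illustrate the pedagogical point about "the proportion pregnant in the entire sample", here is a minimal base R sketch on a hypothetical toy data frame that uses the NHANES-style coding (NA for males, some "Unknown" females). The naive `mean(..., na.rm = TRUE)` silently restricts the denominator to rows with a recorded answer. The whole-sample version has to treat NA as "not pregnant", which is an assumption; note that it also applies that assumption to any females whose value is NA.

```r
# Hypothetical toy data in NHANES-style coding (NA for every male).
d <- data.frame(
  Gender = factor(c("female", "female", "female", "female", "male", "male")),
  PregnantNow = factor(c("Yes", "No", "Unknown", NA, NA, NA),
                       levels = c("Yes", "No", "Unknown"))
)

# TRUE/FALSE where an answer is recorded, NA wherever PregnantNow is NA.
is_yes <- d$PregnantNow == "Yes"

# Naive: na.rm = TRUE drops the NA rows, so the denominator is only the
# rows with a recorded Yes/No/Unknown, not the whole sample.
p_naive <- mean(is_yes, na.rm = TRUE)

# Whole sample: assumes every NA row (all males, plus any female NAs)
# counts as "not pregnant", so the denominator is every row.
p_sample <- sum(is_yes, na.rm = TRUE) / nrow(d)

# Females with a definite Yes/No answer only ("Unknown" excluded).
fem <- d[d$Gender == "female" & d$PregnantNow %in% c("Yes", "No"), ]
p_female <- mean(fem$PregnantNow == "Yes")

c(naive = p_naive, sample = p_sample, female_known = p_female)
```

Applied to the counts in the first tally in this thread, the naive and whole-sample calculations give 72/1696 (about 4.2%) versus 72/10000 (0.72%). The gap comes entirely from the choice of denominator.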

rpruim commented 9 years ago

Perhaps we can talk a bit about NHANES at CVC and think about such a publication.

ProjectMOSAIC/NHANES issue #2