Closed nicholasjhorton closed 9 years ago
In addition, it would be helpful to map the 51 "Unknown" females to missing.
While we're on the topic, is there a reason why this is declared as a factor? I suspect most people would find it simpler as a character variable. Is this a standard for mosaicData and related packages?
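For concreteness, here is a minimal sketch of what the proposed recoding could look like in base R. It uses a made-up toy data frame rather than the real NHANES data, and it assumes the variable is called `PregnantNow` with levels Yes/No/Unknown; treat both the name and the values as illustrative assumptions.

```r
# Toy stand-in for the real data; the column name PregnantNow and all
# values are assumptions made up for illustration.
toy <- data.frame(
  Gender = factor(c("female", "female", "female", "male", "male")),
  PregnantNow = factor(c("Yes", "No", "Unknown", NA, NA),
                       levels = c("Yes", "No", "Unknown"))
)

recoded <- toy$PregnantNow
# %in% never returns NA, so rows that are already missing are left alone.
recoded[recoded %in% "Unknown"] <- NA
# Drop the now-unused "Unknown" level so it no longer appears in tallies.
recoded <- droplevels(recoded)

table(recoded, useNA = "ifany")
```

After this step the "Unknown" females and the males are indistinguishable in the column, which is the trade-off under discussion.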
Earlier Nick said:
I'd prefer to leave the Unknowns separate from the NAs.
Have you changed your mind?
NHANES records the value as missing for males. One advantage of this is that you get a Yes/No/(Unknown) tally for females without first having to subset. That's a small matter. More generally, I think a good principle is to stick with the way NHANES does things (with sensible renaming of arbitrary numeric codes) unless there is a compelling reason to be different. That way, the NHANES documentation will match our data. So recoding the unknowns and assigning males to No would both need sufficient reason to violate that principle.
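The tally point can be illustrated with a small sketch. The data frame below is a made-up toy, not the NHANES data, and the column name `PregnantNow` is an assumption; the behavior of `table()` with and without `useNA` is standard base R.

```r
# Toy stand-in for the real data; names and values are illustrative only.
toy <- data.frame(
  Gender = factor(c("female", "female", "female", "female", "male", "male")),
  PregnantNow = factor(c("Yes", "No", "No", "Unknown", NA, NA),
                       levels = c("Yes", "No", "Unknown"))
)

# table() drops NA by default, so with males coded as missing this is
# already a females-only tally; no subsetting is needed.
female_tally <- table(toy$PregnantNow)

# Asking for NA explicitly shows the males as a separate NA cell.
full_tally <- table(toy$PregnantNow, useNA = "ifany")

female_tally
full_tally
```

If males were recoded to "No" instead, the default tally would mix males and non-pregnant females, and a females-only count would require subsetting on gender first.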
I'm pretty sure that most (all?) categorical data in mosaicData is coded with factors. What would be the advantage here of using character instead?
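One practical difference between the two representations, sketched on a made-up vector (not the NHANES data): a factor remembers its declared levels and their order, while a character vector only knows the values that happen to be present.

```r
# Illustrative values only; "Unknown" is declared as a level but never occurs.
preg <- factor(c("Yes", "No", "No"), levels = c("Yes", "No", "Unknown"))

# The factor tally keeps the declared level order and reports the empty level.
as_factor <- table(preg)

# The character tally lists only observed values, sorted alphabetically.
as_char <- table(as.character(preg))

as_factor
as_char
```

Whether that makes factors or character vectors "simpler" depends on the use case; the sketch only shows where the behavior diverges.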
In NHANES we have

> table(sapply(NHANES, class))
 factor integer numeric
     31      34      11
This raises a number of pedagogical issues that might be fodder for a JSE paper or a JSE datasets and stories submission, since students will need more than simple operations to calculate quantities of interest (such as the proportion pregnant in the entire sample). But your amendments to the NHANES example for Smoking help. I'm happy to close this issue.
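To make the "more than simple operations" point concrete, here is a small sketch on a made-up vector (not the NHANES data) showing how the choice of denominator changes the answer when males are coded as missing.

```r
# Illustrative values only: four females followed by two males coded as NA.
preg <- factor(c("Yes", "No", "No", "Unknown", NA, NA),
               levels = c("Yes", "No", "Unknown"))

# The naive proportion is NA because the comparison yields NA for missing rows.
p_naive <- mean(preg == "Yes")

# Dropping NAs gives the proportion among non-missing rows (here, females).
p_nonmissing <- mean(preg == "Yes", na.rm = TRUE)

# %in% treats NA as "not Yes", giving the proportion in the entire sample.
p_all <- mean(preg %in% "Yes")

c(naive = p_naive, nonmissing = p_nonmissing, all = p_all)
```

In this toy case the three results are NA, 0.25, and 1/6, so students have to decide which population they are describing before they compute anything.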
Perhaps we can talk a bit about NHANES at CVC and think about such a publication.
not missing: