homerhanumat closed this issue 8 years ago
The source file for this data set appears to be http://www.stat.berkeley.edu/~statlabs/data/babies23.data
and has the same issue.
> read.file("http://www.stat.berkeley.edu/~statlabs/data/babies23.data") -> G
Reading data with read.table()
> dim(G)
[1] 1236 23
> dim(Gestation)
[1] 1236 23
> tally( ~sex, data = G)
1
1236
@nicholasjhorton, do you know anything more about this data set?
While we are here, we should probably recode the data. I see lots of integer codes. As is, we really aren't providing anything that couldn't be had by loading the data from the URL. Only one variable has been converted to a factor, and the conversion is less than ideal:
tally( ~drace, data = Gestation)
## 0 1 10 2 3 4 5 6 7 8 NANA <NA>
## 489 42 10 45 77 50 157 41 250 39 5 31
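The level order in that output (0, 1, 10, 2, ...) is what you get when a factor is built from codes treated as character strings, since the levels then sort as text. Here is a minimal base R sketch of the effect and one possible fix. The codes are made up for illustration and are not the actual Gestation values, and no labels are attached because the codebook meanings are not given in this thread.

```r
# Toy integer codes standing in for a column like drace
# (made-up values, not the actual Gestation data).
codes <- c(0L, 1L, 10L, 2L, 7L, 7L, NA, 5L)

# Converting via character sorts the levels as strings, so "10" lands
# between "1" and "2".
as_chr <- factor(as.character(codes))
levels(as_chr)

# Supplying the levels explicitly keeps the numeric order.
as_num <- factor(codes, levels = sort(unique(codes)))
levels(as_num)

# Missing values stay NA rather than turning into a level of their own.
table(as_num, useNA = "ifany")
```

If descriptive labels are wanted, `factor()` also takes a `labels` argument of the same length as `levels`, but the labels would have to come from the data set's documentation.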
StatLabs page 198 notes that this data set consists of 1,236 male single births where the baby lived at least 28 days. So we might want to note that these are male births (and drop the sex variable). I've done this: see https://github.com/ProjectMOSAIC/mosaicData/commit/cc0c40c1f729052c3bebe6b6a7474c45bdcda861
Perhaps keeping the sex variable is a good thing just to make it abundantly clear that this is a data set including only males. (Updating documentation is also good.)
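Whichever way that goes, a quick check for single-valued columns may help keep the documentation honest. This is a rough base R sketch on a toy data frame. Only the column name `sex` comes from this thread; the other column and all the values are made up.

```r
# Toy data frame with made-up values; only the name `sex` is from the thread.
toy <- data.frame(
  sex = c(1L, 1L, 1L, 1L),
  wt  = c(120L, 113L, 128L, 108L)
)

# Count the distinct non-missing values in each column.
n_distinct <- vapply(
  toy,
  function(x) length(unique(x[!is.na(x)])),
  integer(1)
)

# Columns with exactly one distinct value are constant within the data set.
constant_cols <- names(n_distinct)[n_distinct == 1L]
constant_cols
```

Running the same check on the real data frame would list any variable whose documentation should say that only one value occurs.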
Hi,
In the man page for the Gestation data frame, the variable sex is listed as having two values, but only one (1 for males) appears in the data itself. Is this intended?