ProjectMOSAIC / mosaic

Project MOSAIC R package
http://mosaic-web.org/

ejecting data from the mosaic package and putting it into mosaicData #369

Closed rpruim closed 10 years ago

rpruim commented 10 years ago

For the sake of keeping CRAN happy.

rpruim commented 10 years ago

I'm in the process of making sure I have

if (require(mosaicData))

in all the required places.

Also, I'm putting NHANES in a package all by itself.

rpruim commented 10 years ago

using mosaicData:: in the tests so that they work...
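Editor's sketch of the guard pattern discussed above, not the code actually committed to mosaic. It assumes the goal is to skip an example cleanly when the data package is missing. It uses `requireNamespace()` instead of the `require()` call quoted in the thread, because `requireNamespace()` only loads the package without attaching it, which pairs naturally with `mosaicData::` prefixes. The helper name `has_pkg` is hypothetical.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise.
# requireNamespace() loads but does not attach, so nothing is added to the
# search path.
has_pkg <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (has_pkg("mosaicData")) {
  # List the data sets shipped with the package without attaching it.
  items <- utils::data(package = "mosaicData")$results[, "Item"]
  print(head(items))
} else {
  message("mosaicData is not installed; skipping this example.")
}
```

With this guard in place, individual data sets would be referenced as `mosaicData::<name>` inside tests, so the tests do not depend on the package being attached.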

dtkaplan commented 10 years ago

While we're at it, let's change NHANES a bit to delete some of the inessential variables.

Here's an Rda file and the corresponding Rd. I was calling the file NHANEDCF, but it should be renamed simply NHANES.

-Danny

rpruim commented 10 years ago

I don't see an Rda file. But it would suffice to provide a select() command that drops or keeps the columns desired.

For documentation, it would be better to have an R file with the appropriate roxygen markup. See datasets.R
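Editor's sketch of what roxygen markup for a data set can look like, using standard roxygen2 tags. The title, description, and format text are placeholders, not the project's actual documentation; the real wording would come from the existing Rd file.

```r
#' NHANES (placeholder title)
#'
#' Placeholder description of the data set.
#'
#' @docType data
#' @name NHANES
#' @usage data(NHANES)
#' @format A data frame (list the variables here).
#' @keywords datasets
NULL
```

The trailing `NULL` gives roxygen2 an object to attach the block to, since the data itself lives in a separate file rather than in R code.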

rpruim commented 10 years ago

Which of these are you proposing we drop?

names(NHANES)
 [1] "seqn"        "sex"         "age"         "pregnant"    "ethnicity"   "death"      
 [7] "followup"    "smoker"      "diabetic"    "height"      "weight"      "waist"      
[13] "wci"         "bmi"         "ptfat"       "tfat"        "lfat"        "llean"      
[19] "lbmi"        "fbmi"        "bbmi"        "pfat"        "bmd"         "fmhm_other" 
[25] "hdl"         "chol"        "bps"         "bpd"         "income"      "zheight"    
[31] "zweight"     "zwaist"      "zwci"        "zbmi"        "zptfat"      "ztfat"      
[37] "zlfat"       "zllean"      "zlbmi"       "zfbmi"       "zbbmi"       "zpfat"      
[43] "zbmd"        "zfmhm_other" "zhdl"        "zchol"       "zbps"        "zbpd"       
[49] "zincome"     "pop_weight"  "psu"         "stratum"     "zwh"  

I'm making a final spin through vignettes, checking and double checking. I want to submit this VERY soon. So I won't wait for NHANES edits if we can't do them quickly.

nicholasjhorton commented 10 years ago

I appreciate your prioritizing this.

Nick

rpruim commented 10 years ago

I'd like Nick to weigh in on column dropping, since I think he uses this data and we don't want to break things he is relying upon.

nicholasjhorton commented 10 years ago

I have no strong feelings (but I don't use the scored variables).

All the best,

Nick

rpruim commented 10 years ago

OK. I'll remove the scaled variables. That will make it roughly half the size and easier to work with.

Makes me wonder about adding a zscore function:

# standardize x: subtract the mean, then divide by the standard deviation
zscore <- function( x, na.rm=getOption("na.rm", FALSE) ) {
  ( x - mean(x, na.rm=na.rm)) / sd(x, na.rm=na.rm)
}

That's easy enough, I think I'll do it.

rpruim commented 10 years ago

zscores have been removed from NHANES.
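Editor's sketch of how such a column drop could be written in base R. The thread does not show the command that was actually used. The rule here is an assumption: a column is treated as a z-score if its name is "z" followed by the name of another column. The toy data frame and its values are made up for illustration; only the column names are taken from the `names(NHANES)` listing above.

```r
# Toy stand-in for NHANES; the values are invented for illustration only.
toy <- data.frame(
  age     = c(34, 51, 27),
  height  = c(170, 165, 180),
  zheight = c(-0.2, -0.9, 1.1),
  weight  = c(70, 82, 77),
  zweight = c(-1.1, 0.9, 0.1),
  zwh     = c(0.2, 0.1, 0.3)
)

# Assumed rule: a column is "scaled" if its name is "z" + another column's name.
is_scaled <- names(toy) %in% paste0("z", names(toy))

# Keep everything else; drop = FALSE keeps the result a data frame.
trimmed <- toy[, !is_scaled, drop = FALSE]
names(trimmed)
```

Under this rule `zwh` is kept, because there is no `wh` column for it to be a z-score of. With dplyr, a coarser prefix-based version would be `dplyr::select(toy, -dplyr::starts_with("z"))`, which would drop `zwh` as well.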

nicholasjhorton commented 10 years ago

Isn't that what "scale()" does?

Nick
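Editor's sketch comparing the two approaches, as a way to check the question above. The `zscore()` definition follows the one proposed in the thread, written with base R's `getOption()`. The sample vector is arbitrary.

```r
# zscore() as proposed in the thread, with getOption() supplying the default.
zscore <- function(x, na.rm = getOption("na.rm", FALSE)) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)

z_vec <- zscore(x)  # plain numeric vector
z_mat <- scale(x)   # one-column matrix with centering/scaling attributes

is.matrix(z_mat)                     # scale() returns a matrix
all.equal(z_vec, as.numeric(z_mat))  # the standardized values agree

# Missing values: scale() always ignores NAs when computing the center and
# spread, while zscore() does so only when na.rm = TRUE.
x_na <- c(x, NA)
zscore(x_na, na.rm = FALSE)  # every element is NA
scale(x_na)[1:8]             # non-missing entries are still standardized
```

So the numbers match on complete data. The practical differences are the return type (vector versus matrix) and the handling of missing values.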
