ehrlinger / ggRandomForests

Graphical analysis of random forests with the randomForestSRC, randomForest and ggplot2 packages.

st.labs #25

Open subasish opened 9 years ago

subasish commented 9 years ago

The package and the vignette code work perfectly, except for the code involving st.labs:

plot(gg_md, lbls=st.labs)
Error in plot.gg_minimal_depth(gg_md, lbls = st.labs) : object 'st.labs' not found

22csnyder commented 8 years ago

I noticed this as well

22csnyder commented 8 years ago

Oh, I just found these lines in one of the vignettes. They're in the source file RandomForestSRC-Survival.Rnw, but not in the PDF:

dta.labs <- data.frame(cbind(names = colnames(pbc), label = labels, type = cls))

# Put the "years" variable on top.
dta.labs <- rbind(dta.labs[nrow(dta.labs),], dta.labs[-nrow(dta.labs),])
st.labs <- as.character(dta.labs$label)

Honestly, I haven't gotten it to work, but I'm a complete beginner: two days new at R.
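For anyone puzzling over the rbind() line: it moves the last row of dta.labs (the years variable, which the data prep appends as the last column of pbc) to the top, keeping the other rows in their original order. A toy base-R illustration, using a small hypothetical data frame in place of dta.labs:

```r
# Toy stand-in for dta.labs (hypothetical rows; row names are variable names)
toy <- data.frame(label = c("Event", "Age (years)", "Time (years)"),
                  row.names = c("status", "age", "years"))

# Move the last row to the top, keeping the remaining rows in order.
# drop = FALSE keeps a one-column data frame from collapsing to a vector.
rotated <- rbind(toy[nrow(toy), , drop = FALSE],
                 toy[-nrow(toy), , drop = FALSE])

print(rownames(rotated))  # years now comes first, then status and age
```

The real dta.labs has three columns, so the original line does not need drop = FALSE; it only matters in this one-column toy.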

ehrlinger commented 8 years ago

Right. I define st.labs in the vignette code, but not anywhere the reader can see it.

The new rmarkdown release has a "hide code" option that looks nice in the HTML docs, but not with pdflatex... so that's not the way to go about it. I may just remove the "nice" labels that st.labs provides from the document altogether, because they are confusing.

Still thinking about the correct approach... in the meantime, here is the st.labs definition you'll need to reproduce the plot:

## Not displayed ##
library(dplyr)                          # provides %>% and select()
data(pbc, package = "randomForestSRC")  # the PBC data used in the vignette

## Set modes correctly. For binary variables: transform to logical
## Check for range of 0, 1
## There is probably a better way to do this.
for (ind in 1:dim(pbc)[2]) {
  if (!is.factor(pbc[, ind])) {
    if (length(unique(pbc[which(!is.na(pbc[, ind])), ind])) <= 2) {
      if (sum(range(pbc[, ind], na.rm = TRUE) == c(0, 1)) == 2) {
        pbc[, ind] <- as.logical(pbc[, ind])
      }
    }
  } else {
    if (length(unique(pbc[which(!is.na(pbc[, ind])), ind])) <= 2) {
      if (sum(sort(unique(pbc[, ind])) == c(0, 1)) == 2) {
        pbc[, ind] <- as.logical(pbc[, ind])
      }
      if (sum(sort(unique(pbc[, ind])) == c(FALSE, TRUE)) == 2) {
        pbc[, ind] <- as.logical(pbc[, ind])
      }
    }
  }
  # Any remaining non-logical column with at most 5 distinct values becomes a factor
  if (!is.logical(pbc[, ind]) &&
      length(unique(pbc[which(!is.na(pbc[, ind])), ind])) <= 5) {
    pbc[, ind] <- factor(pbc[, ind])
  }
}
# Convert age and follow-up time from days to years (about 365.24 days per year)
pbc$age <- pbc$age / 365.24
pbc$years <- pbc$days / 365.24
pbc <- pbc %>% select(-days)
pbc$treatment <- as.numeric(pbc$treatment)
pbc$treatment[which(pbc$treatment == 1)] <- "DPCA"
pbc$treatment[which(pbc$treatment == 2)] <- "placebo"
pbc$treatment <- factor(pbc$treatment)

cls <- sapply(pbc, class) 
labels <- c("Event (F = censor, T = death)", 
            "Treatment (DPCA, Placebo)", 
            "Age (years)", 
            "Female = T", 
            "Presence of Ascites", 
            "Presence of Hepatomegaly", 
            "Presence of Spiders", 
            "Edema (0, 0.5, 1)", 
            "Serum Bilirubin (mg/dl)", 
            "Serum Cholesterol (mg/dl)", 
            "Albumin (gm/dl)", 
            "Urine Copper (ug/day)", 
            "Alkaline Phosphatase (U/liter)", 
            "SGOT (U/ml)", 
            "Triglycerides (mg/dl)", 
            "Platelets per cubic ml/1000", 
            "Prothrombin time (sec)", 
            "Histologic Stage", 
            "Time (years)")

dta.labs <- data.frame(cbind(names = colnames(pbc), label = labels, type = cls))
# Put the "years" variable on top.
dta.labs <- rbind(dta.labs[nrow(dta.labs),], dta.labs[-nrow(dta.labs),])

st.labs <- as.character(dta.labs$label)
names(st.labs) <- rownames(dta.labs)

This section of the vignette does a lot of data prep work. I might need to do this outside of the vignette and store the updated data set for distribution with the ggRandomForests package.
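The loop's own comment notes that there is probably a better way to detect the binary variables. If the data prep does move out of the vignette, one possible simplification is a small helper like the sketch below. The name tidy_binary() and its max_levels argument are hypothetical (not part of ggRandomForests), and it does not replicate the original loop's special handling of columns that are already factors:

```r
# Hypothetical helper, not part of ggRandomForests.
# Numeric columns whose non-missing values are exactly {0, 1} become logical;
# any other non-logical column with at most `max_levels` distinct values
# becomes a factor. All other columns are left unchanged.
tidy_binary <- function(df, max_levels = 5) {
  for (nm in names(df)) {
    x <- df[[nm]]
    # Distinct non-missing values, sorted
    vals <- sort(unique(x[!is.na(x)]))
    if (is.numeric(x) && length(vals) == 2 && all(vals == c(0, 1))) {
      # 0/1 indicator -> logical (NA stays NA)
      df[[nm]] <- as.logical(x)
    } else if (!is.logical(x) && length(vals) <= max_levels) {
      # Few distinct values -> treat as categorical
      df[[nm]] <- factor(x)
    }
  }
  df
}

# Small made-up example
df <- data.frame(
  status = c(0, 1, NA, 1, 0, 1),          # binary indicator
  stage  = c(1, 2, 3, 4, 2, 1),           # 4 distinct values
  bili   = c(0.5, 1.2, 3.4, 0.8, 2.2, 5.1) # continuous
)
out <- tidy_binary(df)
str(out)
```

Under these assumptions, `pbc <- tidy_binary(pbc)` would take the place of the for loop, ahead of the age, years and treatment conversions.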