ddsjoberg / gtsummary

Presentation-Ready Data Summary and Analytic Result Tables
http://www.danieldsjoberg.com/gtsummary

search for data set in parent environment to get labels #141

Closed ddsjoberg closed 5 years ago

ddsjoberg commented 5 years ago

In coxph and survreg models, the variable labels are stripped with model.frame(). We do have access to the model call, however. From the call, we can extract the data name and evaluate the name in the parent environment (which in most instances will have the data set available).

library(survival)
library(gtsummary)  # provides the trial data set and re-exports %>%
fit = coxph(Surv(ttdeath, death) ~ grade + age, trial)
fit$call %>% as.list() %>% purrr::pluck("data") %>% {eval(., parent.frame())}
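
The same idea can be sketched in base R, without survival or purrr. This is only an illustration under stated assumptions: the toy data frame `dat`, its "label" attribute, and the helper `get_call_data()` are hypothetical names, and `lm()` stands in for `coxph()`.

```r
# Hypothetical toy data; the "label" attribute mimics a variable label
dat <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 7.8, 10.1))
attr(dat$x, "label") <- "Dose (mg)"

fit <- lm(y ~ x, data = dat)

# Hypothetical helper: the call stores the *name* of the data argument,
# so evaluate that name, starting the lookup in the caller's environment
get_call_data <- function(fit, env = parent.frame()) {
  eval(fit$call$data, envir = env)
}

found <- get_call_data(fit)

# Collect the "label" attribute of every column (NULL where none is set)
var_labels <- lapply(found, attr, which = "label")
var_labels$x
```

Because the lookup starts in the caller's environment, this only finds the data when the name stored in the call is visible from where the helper is called.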

mljaniczek commented 5 years ago

ddsjoberg commented 5 years ago

Hey @margarethannum and @michaelcurry1123 !

I made a small update to the code: it searches the parent frame for the data set and tries to grab the variable labels from there. (I don't understand environments that well.) This matters for coxph and survreg models, where labels are stripped from factor variables when we call model.frame() on the model object.

The code enhancement was simple enough. I just added this:

  # more var labels ------------------------------------------------------------
  # labels can be passed via 'label' or extracted from model.frame();
  # the code below seeks to grab the labels directly from the dataset in the
  # parent frame
  labels_parent_frame = tryCatch({
    fit %>%
      purrr::chuck("call") %>%
      as.list() %>%
      purrr::chuck("data") %>%
      {eval(., parent.frame())} %>%
      purrr::imap(~attr(.x, "label"))
  }, warning = function(w) {
    NULL
  }, error = function(e) {
    NULL
  })

This works great if both the model and tbl_regression() are called in the global environment, but it fails if they are wrapped in functions.

  1. This is great if you're calling these functions from the global env
  2. But the output is now somewhat unpredictable depending on where it's called from.

Because of the unpredictable output, do you think this change should not be made?

library(survival)
library(gtsummary)

# fitting cox model in global environment
fit = coxph(Surv(ttdeath, death) ~ trt + age, trial)

# fitting cox model, wrapped in another function
wrapped_fun1 <- function(my_data) {
  coxph(Surv(ttdeath, death) ~ trt + age, my_data)
}
fit2 = wrapped_fun1(trial)

# fitting cox model, wrapped in another function
# with tbl_regression in function as well
wrapped_fun2 <- function(my_data) {
  coxph(Surv(ttdeath, death) ~ trt + age, my_data) %>%
    tbl_regression()
}

# finds label
tbl_regression(fit)
# does not find label
tbl_regression(fit2)
# does not find label
wrapped_fun2(trial)
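
A base-R sketch of why the wrapped cases miss the label, under the assumption that the lookup evaluates the data name from the call: inside the wrapper, the call records the argument name `my_data`, which does not exist outside the function. The names `dat` and `make_fit()` are hypothetical, and `lm()` stands in for `coxph()`.

```r
# Hypothetical toy data with a variable label
dat <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 7.8, 10.1))
attr(dat$x, "label") <- "Dose (mg)"

# Model fit inside a wrapper, as in wrapped_fun1() above
make_fit <- function(my_data) lm(y ~ x, data = my_data)
fit2 <- make_fit(dat)

# The call stores the wrapper's argument name, not `dat`
data_name <- deparse(fit2$call$data)

# Evaluating that name outside the wrapper fails, so the lookup
# falls back to NULL, just like the tryCatch() in the snippet above
found <- tryCatch(
  eval(fit2$call$data, envir = environment()),
  error = function(e) NULL
)
is.null(found)
```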

Code is in this branch https://github.com/ddsjoberg/gtsummary/tree/141-find_labels

ddsjoberg commented 5 years ago

How the heck does a model.frame() even work? It's so confusing! :)

michaelcurry1123 commented 5 years ago

Are you referring to the variable label or to the labels of the factor levels? This works fine for me:

library(dplyr)
library(survival)
library(gtsummary)

lung2 <- mutate(lung, sex = factor(sex, levels = c(1, 2), labels = c("Male", "Female")))

mod1 <- coxph(Surv(time, status) ~ age + sex, data = lung2)

fun1 <- function(x) {
  coxph(Surv(time, status) ~ age + sex, data = x) %>%
    gtsummary::tbl_regression()
}

fun1(lung2)

michaelcurry1123 commented 5 years ago

It is possible that I am misunderstanding the issue.

ddsjoberg commented 5 years ago

Trying to grab the factor variable labels (not level labels). Does that code work for grabbing variable labels?

michaelcurry1123 commented 5 years ago

It looks like labels set with Hmisc carry over for coxph. Happy to chat about it!

library(dplyr)
library(survival)
library(gtsummary)
library(Hmisc)
library(labelled)

lung2 <- mutate(lung, sex = factor(sex, levels = c(1, 2), labels = c("Male", "Female")))
var_label(lung2$sex) <- "Gender"
Hmisc::label(lung2$age) <- "AGE"

xx <- model.frame(coxph(Surv(time, status) ~ age + sex, data = lung2))

varlab <- var_label(xx)

michaelcurry1123 commented 5 years ago

Oh, never mind.

ddsjoberg commented 5 years ago

OK! Per @michaelcurry1123's suggestion, this code snippet works in all cases. Variable labels for Cox models will always be found going forward! YAY!

labels_parent_frame = tryCatch({
  stats::model.frame.default(fit) %>%
    purrr::imap(~attr(.x, "label"))
}, warning = function(w) {
  NULL
}, error = function(e) {
  NULL
})
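
One plausible reason this route also works for wrapped models, as far as I understand it: the model's terms carry the environment in which the formula was created, and the data name from the call can still be resolved there. The base-R sketch below illustrates only that mechanism, not the gtsummary implementation; `dat` and `make_fit()` are hypothetical names and `lm()` stands in for `coxph()`.

```r
# Hypothetical toy data with a variable label
dat <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 7.8, 10.1))
attr(dat$x, "label") <- "Dose (mg)"

# Model fit inside a wrapper, so the call refers to `my_data`
make_fit <- function(my_data) lm(y ~ x, data = my_data)
fit2 <- make_fit(dat)

# The formula was created inside make_fit(), so the terms object
# remembers that function's environment, where `my_data` still exists
fit_env <- environment(stats::terms(fit2))

# Evaluating the data name there succeeds, and the label is intact
found <- eval(fit2$call$data, envir = fit_env)
attr(found$x, "label")
```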

michaelcurry1123 commented 5 years ago

I checked it, and it works.

ddsjoberg commented 5 years ago

Awesome, thank you! Can you "approve" the review on the PR page? https://github.com/ddsjoberg/gtsummary/pull/157