NikNakk / forestmodel

Using variable labels instead of variable names when available #24

Open larmarange opened 4 years ago

larmarange commented 4 years ago

Variable labels, stored as a label attribute and easily accessible with labelled::var_label(), are becoming quite common. Many packages that produce graphs or tables (like gtsummary) are now adopting the following rule: if defined, use variable labels instead of variable names.

Such an addition to forestmodel would make it easy to customize the variable names displayed on forest plots.
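
As a minimal sketch of the rule described above, assuming only base R: a variable label is a "label" attribute on a column, and the display name falls back to the column name when no label is set. The helper display_name() is hypothetical and is not part of forestmodel or labelled; the column names and label text are illustrative.

```r
# A variable label is just a "label" attribute on a column (base R only).
df <- data.frame(
  age = c(31, 45, 52),
  residency = factor(c("urban", "rural", "urban"))
)
attr(df$age, "label") <- "Age at last anniversary (in years)"

# display_name() is a hypothetical helper, not a forestmodel function:
# return the label if one is defined, otherwise fall back to the column name.
display_name <- function(data, variable) {
  label <- attr(data[[variable]], "label", exact = TRUE)
  if (is.null(label)) variable else label
}

display_name(df, "age")        # labelled column: the label is used
display_name(df, "residency")  # unlabelled column: falls back to the name
```

With the labelled package installed, labelled::var_label(df$age) reads the same attribute.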

ShixiangWang commented 4 years ago

This package is not in active development. If you are interested in this feature, please implement it, then either keep a fork or create a pull request to https://github.com/ShixiangWang/forestmodel

larmarange commented 4 years ago

@ShixiangWang is it an official fork?

@NikNakk could you clarify if you still plan to maintain and develop forestmodel?

ShixiangWang commented 4 years ago

@larmarange No, I'm not saying that. The author is nice, but from what I can see he may not be active on GitHub.

NikNakk commented 4 years ago

Hi @larmarange, @ShixiangWang,

I've not been very active in maintaining this package for a while because of being busy with other things, but I'm still aiming to get to the outstanding queries that have been raised including yours. There's also now a more pressing reason to attend to the package because it's erroring on CRAN so will be delisted if I don't fix that. I'll at least fix the current issue that would lead to delisting in the next few days, but if I can I'll try to fix any other outstanding issues and improvements.

larmarange commented 4 years ago

Thanks @NikNakk for your feedback.

Regarding the proposed improvement, it should not be very difficult to implement once you have identified where variable names are used.

I haven't had time to look at your code in detail, so I don't yet know how it is organized. But as you are familiar with your package, you should have an idea of where to look.

Best regards

NikNakk commented 4 years ago

@larmarange I've made a new branch that has a simple implementation of this at https://github.com/NikNakk/forestmodel/tree/labels. You can test it using remotes::install_github("NikNakk/forestmodel@labels")

larmarange commented 4 years ago

Thanks a lot

NikNakk commented 4 years ago

@larmarange please let me know when you've had a chance to test this out.

larmarange commented 4 years ago

@NikNakk I have done some quick tests. It works well with simple models. Thanks.

When I add interaction terms, labels are not applied to them, but that was already the case before (it seems that forestmodel does not treat interaction terms in any particular way).

library(questionr)
library(forestmodel)
library(labelled)

data(fertility)
women <- unlabelled(women)
mod <- glm(employed ~ age + residency * instruction, data = women, family = binomial())
forest_model(mod, exponentiate = TRUE)

(image: forest plot produced by the code above)

Here is a quick example with gtsummary to show how that package handles interaction terms.

library(gtsummary)
tbl_regression(mod)

| Characteristic | log(OR) | 95% CI | p-value |
|---|---|---|---|
| Age at last anniversary (in years) | 0.06 | 0.05, 0.07 | <0.001 |
| Urban / rural residency | | | |
| urban | | | |
| rural | 0.28 | 0.00, 0.55 | 0.052 |
| Level of instruction | | | |
| none | | | |
| primary | 0.35 | -0.02, 0.74 | 0.067 |
| secondary | -0.83 | -1.2, -0.50 | <0.001 |
| higher | -0.71 | -1.3, -0.10 | 0.022 |
| Urban / rural residency * Level of instruction | | | |
| rural * primary | -0.16 | -0.67, 0.35 | 0.5 |
| rural * secondary | 0.19 | -0.41, 0.80 | 0.5 |
| rural * higher | -1.5 | -4.5, 0.56 | 0.2 |

larmarange commented 4 years ago

But I know that managing interaction terms could be tricky and beyond the current issue.

Otherwise, it's perfect. Thanks a lot
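
One way the interaction header shown in the gtsummary output above ("Urban / rural residency * Level of instruction") could be composed is sketched here, assuming only base R. The helper term_label() is hypothetical and is neither forestmodel's nor gtsummary's implementation. It works on variable-level terms such as "residency:instruction", not on coefficient names that also embed factor levels, which is part of what makes the real problem tricky.

```r
# term_label() is a hypothetical helper: build a display label for a model
# term from per-variable labels, falling back to the raw variable name.
term_label <- function(term, labels) {
  # An interaction term joins its variables with ":".
  parts <- strsplit(term, ":", fixed = TRUE)[[1]]
  shown <- ifelse(parts %in% names(labels), labels[parts], parts)
  paste(unname(shown), collapse = " * ")
}

# Labels taken from the table above; "age" is deliberately left unlabelled.
labels <- c(
  residency = "Urban / rural residency",
  instruction = "Level of instruction"
)

term_label("residency:instruction", labels)
term_label("age", labels)
```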

NikNakk commented 4 years ago

I’ll have a look at interaction terms when I get a chance. gtsummary looks like a good starting point. For now I’ve merged the labels branch into master and need to get the latest version on CRAN because otherwise it will be delisted.

larmarange commented 4 years ago

Thanks

NikNakk commented 4 years ago

FYI, this version is now on CRAN.

proshano commented 3 years ago

Variable labels still not showing up

corneliushennch commented 3 years ago

Variable labels still not showing up

Same here. It works fine with gtsummary::tbl_regression but not with forest_model from the forestmodel package, which I just installed from GitHub.

NikNakk commented 3 years ago

Sorry for the delayed response, @proshano and @corneliushennch. Could you please give me some example code that doesn't work as expected? I'm still planning to work on interaction terms since they're not currently properly supported with or without labels.

larmarange commented 3 years ago

In case it is useful to you, gtsummary::tbl_regression() now relies on the broom.helpers package: https://larmarange.github.io/broom.helpers/

corneliushennch commented 3 years ago

EDIT:

The problem occurs with factor and character variables when using coxph(). Only the label of the numeric variable gets printed, as you can see in the reprex. All variable types work fine with other models (I just checked glm). So changing factors back to character, which would already be tedious since factors are pretty standard in this kind of data analysis, doesn't solve it, contrary to what I first thought. I'd very much appreciate it if you could implement proper label support for coxph objects too, as there is so far no convenient function that can display hazard ratios in clear forest plots with labels. I formerly used survminer::ggforest, but switched to forestmodel in order to be able to use labels...

library(survival)
library(dplyr)
library(forestmodel)

surv_data <- tibble(
  time = abs(rnorm(300, 50, 30)),
  event = sample(c(0,1), 300, prob = c(0.8, 0.2), replace = TRUE),
  gender = sample(c(0,1), 300, prob = c(0.6, 0.4), replace = TRUE),
  rx = sample(c("no","yes"), 300, prob = c(0.5, 0.5), replace = TRUE),
  gene = sample(c(0,1), 300, prob = c(0.9, 0.1), replace = TRUE)
)

surv_data <- surv_data %>% 
  mutate(gender = factor(gender, levels = c(0,1), labels = c("female", "male")))

labelled::var_label(surv_data) <- list(
  gender = "Gender (f/m)", #this variable is a factor -> doesn't work!
  rx = "Irradiation", # character -> label doesn't work!
  gene = "Gene of Interest" # numeric -> label works...
)

labelled::var_label(surv_data) # checking that labels are assigned
#> $time
#> NULL
#> 
#> $event
#> NULL
#> 
#> $gender
#> [1] "Gender (f/m)"
#> 
#> $rx
#> [1] "Irradiation"
#> 
#> $gene
#> [1] "Gene of Interest"
lapply(surv_data, class) # showing variable classes
#> $time
#> [1] "numeric"
#> 
#> $event
#> [1] "numeric"
#> 
#> $gender
#> [1] "factor"
#> 
#> $rx
#> [1] "character"
#> 
#> $gene
#> [1] "numeric"

# printing the coxph model -> only label of numeric variable works
print(forest_model(coxph(formula = Surv(time, event) ~ gender + rx + 
                           gene, data = surv_data)))

# ok it seems to be a specific problem of the coxph object -> labels get printed correctly 
# with glm...
mod <- glm(gender ~ gene + rx, data = surv_data, family = binomial())
forest_model(mod, exponentiate = TRUE)

Created on 2021-04-23 by the reprex package (v0.3.0)
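
A possible way to narrow down where labels go missing is to compare the labels on the original data with those that survive in the fitted model's model frame. The sketch below uses only base R and a glm fit on made-up data so that it runs without extra packages. The helper get_labels() is hypothetical, and applying the same comparison to a coxph fit via model.frame() is an assumption that has not been verified here. It does not claim to identify the cause of the behaviour reported above.

```r
# get_labels() is a hypothetical helper (base R only): one label per column,
# NA where no "label" attribute is set.
get_labels <- function(data) {
  vapply(data, function(x) {
    label <- attr(x, "label", exact = TRUE)
    if (is.null(label)) NA_character_ else as.character(label)
  }, character(1))
}

# Made-up data, loosely mirroring the variables in the reprex above.
set.seed(1)
d <- data.frame(
  y = rbinom(50, 1, 0.5),
  gender = factor(sample(c("female", "male"), 50, replace = TRUE)),
  gene = sample(c(0, 1), 50, replace = TRUE)
)
attr(d$gender, "label") <- "Gender (f/m)"
attr(d$gene, "label") <- "Gene of Interest"

fit <- glm(y ~ gender + gene, data = d, family = binomial())

in_data <- get_labels(d)
in_frame <- get_labels(model.frame(fit))

# A label present in the data but NA in the model frame was lost during fitting.
common <- intersect(names(in_data), names(in_frame))
data.frame(
  variable = common,
  in_data = unname(in_data[common]),
  in_frame = unname(in_frame[common])
)
```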

fabones1 commented 3 years ago

I would also love to have the coxph factor label bug fixed as it would save a lot of time in my work.

mhmdrahouma commented 1 month ago

Thanks a lot for all your efforts. Please let us know if the label bug got fixed for coxph as it would save us a lot of time. I tried it but it didn't work. It works perfectly with glm only. Appreciate your help and advice.