biometrician / abe

An R package for Augmented Backward Elimination
GNU General Public License v3.0

#19: add `plot(abe_object, type.plot = "stability")`

Closed: biometrician closed this issue 1 year ago

biometrician commented 1 year ago

Concerning the R code you sent me with the analysis of the breast cancer data set, where the stability paths are plotted: it would be really nice if this could be put into a plot function.

The y-axis should show the inclusion proportion instead of the percentage, so that it is on the same scale as alpha.

When alpha is varied, adding a diagonal would be nice, to distinguish between random and non-random selection.

biometrician commented 1 year ago

I would suggest asking Gregor to implement this.

Should it become one type.plot in the plot.abe function?

rokblagus commented 1 year ago

Yes, he can easily do this. I would prefer this to be a new type.plot in plot.abe. Be careful with "Wallisch2021", since there the results need to be based on resampling. It should work automatically if the plot is based on the summary function.

biometrician commented 1 year ago

Gregor, here is my code for the stability paths. The appearance of the plots has not been optimized yet.

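library(abe)      # abe.resampling()
library(tidyr)    # gather()
library(ggplot2)  # qplot(), theme_bw(), geom_abline()

# assumes global_model has already been fitted to the bodyfat data

# for one tau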

set.seed(4624512)

alphas <- c(0.05, 0.1, 0.157, 0.2, 0.25, 0.5)

stability_boot_abe_path <-
  abe.resampling(global_model,
                 data = bodyfat,
                 include = c("abdomen", "height"),
                 criterion = "alpha", alpha = alphas,
                 tau = 0.05, exp.beta = TRUE,
                 type.resampling = "bootstrap",
                 num.resamples = 1000)

var_rel_freqABE <- data.frame(summary(stability_boot_abe_path)$var.rel.frequencies)
var_rel_freqABE[,-1] * 100

# stability path for VIF

data_longABE <- gather(var_rel_freqABE[, -1], variable, rel.freq, factor_key = TRUE)
data_longABE$x <- rep(alphas, ncol(var_rel_freqABE)-1)
#data_long$rel.freq <- data_long$rel.freq * 100

qplot(x, rel.freq, data = data_longABE, geom = c("path"), ylab = "Inclusion frequency",
      xlab = "alpha", colour = variable) +
      theme_bw() + 
      theme(legend.text = element_text(size = 14),
            legend.title = element_text(size = 14, face = "bold"),
            axis.text.x  = element_text(size=12),
            axis.title.x = element_text(face="bold", size=12),
            axis.text.y  = element_text(size=12),
            axis.title.y = element_text(face="bold", size=12)) +
  ylim(0, 1) +
  geom_abline(intercept = 0, slope = 1)

# for several taus

set.seed(462451)

taus <- c(0.025, 0.05, 0.1, 0.15, 0.25, 0.5)

stability_boot_abe_path_tau <-
  abe.resampling(global_model,
                 data = bodyfat,
                 include = c("abdomen", "height"),
                 criterion = "alpha", alpha = alphas,
                 tau = taus, exp.beta = TRUE,
                 type.resampling = "bootstrap",
                 num.resamples = 1000)

var_rel_freqABE_tau <- data.frame(summary(stability_boot_abe_path_tau)$var.rel.frequencies)
var_rel_freqABE_tau[,-1] * 100

# stability path for VIF

# for all alphas
data_longABE_tau <- gather(var_rel_freqABE_tau[, -1], variable, rel.freq, factor_key = TRUE)
data_longABE_tau$x <- rep(taus, ncol(var_rel_freqABE_tau)-1)
#data_long$rel.freq <- data_long$rel.freq * 100

qplot(x, rel.freq, data = data_longABE_tau, geom = c("path"), ylab = "Inclusion frequency",
      xlab = "tau", colour = variable) +
      theme_bw() + 
      theme(legend.text = element_text(size = 14),
            legend.title = element_text(size = 14, face = "bold"),
            axis.text.x  = element_text(size=12),
            axis.title.x = element_text(face="bold", size=12),
            axis.text.y  = element_text(size=12),
            axis.title.y = element_text(face="bold", size=12)) +
  ylim(0, 1) 
#  geom_abline(intercept = 0, slope = 1)

# what do you think? alpha legend is missing.
# could show only for specific variables. select = ....

# for alpha = 0.157
data_longABE_tau2 <- gather(var_rel_freqABE_tau[13:18, -1], variable, rel.freq, factor_key = TRUE)
data_longABE_tau2$x <- rep(taus, ncol(var_rel_freqABE_tau[13:18,])-1)

qplot(x, rel.freq, data = data_longABE_tau2, geom = c("path"), ylab = "Inclusion frequency",
      xlab = "tau", colour = variable) +
      theme_bw() + 
      theme(legend.text = element_text(size = 14),
            legend.title = element_text(size = 14, face = "bold"),
            axis.text.x  = element_text(size=12),
            axis.title.x = element_text(face="bold", size=12),
            axis.text.y  = element_text(size=12),
            axis.title.y = element_text(face="bold", size=12)) +
  ylim(0, 1)

gregorsteiner commented 1 year ago

I implemented the stability plots. There is now the option type.plot = "stability". I added a parameter type.stability, which can be set to "alpha" (default) or "tau". This controls whether the inclusion frequency is plotted as a function of alpha or tau. The handling of tau = Inf is not yet ideal; I will continue working on this next week.

rokblagus commented 1 year ago

Gregor, I would just like to thank you for all the work you are doing regarding this project. I really appreciate all your help!

Best,

Rok

From: Gregor Steiner. Sent: Thursday, December 15, 2022 11:09 AM. To: biometrician/abe. Cc: rokblagus. Subject: Re: [biometrician/abe] #19: plot function for stability paths (Issue #19)


gregorsteiner commented 1 year ago

Thank you! And no problem, I'm glad I can help :)

biometrician commented 1 year ago

Hi, great job from me as well. Thanks.

  1. Can you please change the x-axis label to a Greek alpha or tau? And the y-axis should be labelled "Inclusion frequencies".

  2. Please reverse the x-axis for tau, so that it runs from largest to smallest. Then the plots for alpha and tau have the same interpretation.

  3. Just an observation: my call included a large number of different taus. The facet for Tau = 100 appears before the facet for Tau = 2. All other taus are ordered correctly.

  4. A stability plot for the combination criterion = "AIC" and various taus should be possible. Currently, I get a warning. The same is true for "BIC" and various taus.
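
Points 1 to 3 can be illustrated with a minimal ggplot2 sketch. This is not the code used in plot.abe; the data frame `toy`, its columns and the variable names "age" and "weight" are made up for illustration. It assumes the facet labels are built as character strings, which would explain why "Tau = 100" sorts before "Tau = 2".

```r
library(ggplot2)

taus <- c(0.05, 0.5, 2, 100)

# Character labels sort alphabetically, so "Tau = 100" comes before "Tau = 2".
labels_chr <- paste("Tau =", taus)
sort(labels_chr)

# Fixing the factor levels in numeric order restores the intended facet order.
tau_levels <- paste("Tau =", sort(taus))

# Toy inclusion frequencies (random numbers, purely illustrative).
set.seed(1)
toy <- expand.grid(alpha = c(0.05, 0.157, 0.5),
                   tau = taus,
                   variable = c("age", "weight"))
toy$incl_freq <- runif(nrow(toy))
toy$tau_facet <- factor(paste("Tau =", toy$tau), levels = tau_levels)

# Plot by alpha: Greek axis label, facets ordered numerically by tau.
p_alpha <- ggplot(toy, aes(x = alpha, y = incl_freq, colour = variable)) +
  geom_line() +
  facet_wrap(~ tau_facet) +
  labs(x = expression(alpha), y = "Inclusion frequencies") +
  ylim(0, 1)

# Plot by tau for a single alpha: x-axis reversed, from largest to smallest tau.
p_tau <- ggplot(subset(toy, alpha == 0.157),
                aes(x = tau, y = incl_freq, colour = variable)) +
  geom_line() +
  scale_x_reverse() +
  labs(x = expression(tau), y = "Inclusion frequencies") +
  ylim(0, 1)
```

Printing `p_alpha` or `p_tau` draws the respective plot.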

rokblagus commented 1 year ago

Currently the function does not work if I have many taus but a single alpha, unless I specify type.stability = "tau". Is this really necessary? Can't the function figure out automatically that I would like a plot by tau? I would raise an error if both alpha and tau are scalars. If only one of them is a vector, I would plot a line according to that parameter. Only if both are vectors would I default to alpha, with the option to switch to tau. However, why wouldn't I see what is happening for both in one plot?
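
The rule described above could be sketched roughly as follows. The helper `choose_stability_type()` is hypothetical and not part of abe; it only assumes that the alpha and tau values used in the resampling are available as plain vectors.

```r
# Hypothetical helper (not part of abe): decide which parameter the
# stability plot should vary over, given the alpha and tau values used.
choose_stability_type <- function(alpha, tau, type.stability = NULL) {
  alpha_varies <- length(unique(alpha)) > 1
  tau_varies   <- length(unique(tau)) > 1

  # Both scalar: there is no path to plot.
  if (!alpha_varies && !tau_varies) {
    stop("A stability plot needs more than one value of alpha or tau.")
  }
  # Exactly one varies: plot along that parameter.
  if (alpha_varies && !tau_varies) return("alpha")
  if (!alpha_varies && tau_varies) return("tau")

  # Both vary: honour an explicit request, otherwise default to alpha.
  if (is.null(type.stability)) return("alpha")
  match.arg(type.stability, c("alpha", "tau"))
}

choose_stability_type(alpha = 0.157, tau = c(0.05, 0.1, 0.5))          # "tau"
choose_stability_type(alpha = c(0.05, 0.157), tau = Inf)               # "alpha"
choose_stability_type(alpha = c(0.05, 0.157), tau = c(0.05, 0.1))      # "alpha"
choose_stability_type(alpha = c(0.05, 0.157), tau = c(0.05, 0.1),
                      type.stability = "tau")                          # "tau"
```

With this rule, type.stability only matters when both parameters are vectors. A call without alpha values (for example `alpha = NULL`) would also fall into the "tau" branch as long as several taus are given.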

gregorsteiner commented 1 year ago

Yes, it would definitely be better if the function could automatically figure out whether to plot by tau or by alpha. And if one is a scalar and the other is a vector, it would be straightforward. However, the case where both alpha and tau are vectors is a bit tricky. That's why I added the additional parameter.

I think in the discussion last week we came to the conclusion that having multiple lines for different tau/alpha values in the same plot is not ideal. This looks pretty messy if the number of variables and/or tau/alpha values is large.

But I'm happy to change this. @biometrician what do you think?

biometrician commented 1 year ago

> However, why wouldn't I see what is happening for both in one plot?

I played around with this idea. For a small number of variables, it might work. But as a default option, it is quite likely that one ends up with a plot with indistinguishable lines all over the place. So one would need to think carefully about a generalizable version, e.g. with facets or in the form of a loop plot. In the seminar, we came to the conclusion that we would implement basic versions of all plots for now. In the future, we can definitely think about extending these visualizations.

rokblagus commented 1 year ago

I agree with all that was said. My main point, however, was that I get an error if I have a single alpha and many taus unless I change the argument type.stability, which I think could easily be fixed.

gregorsteiner commented 1 year ago

Yes, you're right. I'll try to fix this tomorrow.

biometrician commented 1 year ago

Gregor, can you check this? When I run the following code, I get a warning about unknown parameters. Thanks a lot!

set.seed(462456)

alphas <- c(0.01, 0.05, 0.1, 0.157, 0.2, 0.25, 0.5)

stability_boot_be_path <-
  abe.resampling(global_model,
                 data = bodyfat,
                 include = c("abdomen", "height"),
                 criterion = "alpha", alpha = alphas,
                 tau = Inf,
                 type.resampling = "bootstrap",
                 num.resamples = 100)

plot(stability_boot_be_path, type.plot = "stability")

Warning: Ignoring unknown parameters: linewidth
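
One plausible cause, which is an assumption and not confirmed in this thread: the `linewidth` argument for line geoms was introduced in ggplot2 3.4.0, and older versions ignore it with exactly this kind of warning. A minimal sketch of a version-aware workaround follows; the helper `line_width_arg()` and the toy data are hypothetical and not part of abe.

```r
library(ggplot2)

# Hypothetical helper: return the line-width argument under the name that the
# installed ggplot2 version understands ("linewidth" from 3.4.0, else "size").
line_width_arg <- function(width = 1) {
  if (utils::packageVersion("ggplot2") >= "3.4.0") {
    list(linewidth = width)
  } else {
    list(size = width)
  }
}

# Toy data, made up for illustration.
toy <- data.frame(alpha = c(0.05, 0.157, 0.5),
                  incl_freq = c(0.3, 0.6, 0.9))

# do.call() passes the version-appropriate argument on to geom_line().
p <- ggplot(toy, aes(x = alpha, y = incl_freq)) +
  do.call(geom_line, line_width_arg(0.8)) +
  ylim(0, 1)
```

Checking `packageVersion("ggplot2")` on the machine that produced the warning would show whether this explanation applies; updating ggplot2 would be the simpler fix if it does.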