aphalo / ggpmisc

R package ggpmisc is an extension to ggplot2 and the Grammar of Graphics
https://docs.r4photobiology.info/ggpmisc
94 stars 6 forks source link

Add blog post with examples of `ggpairs()` used with 'ggpmisc' and 'ggpp' #51

Closed Bendlexane closed 4 months ago

Bendlexane commented 5 months ago

Hi!

Thank you for the outstanding work on ggpmisc!

Over the past few days, I've been attempting, albeit unsuccessfully, to implement ggpmisc's stat_ma_eq and stat_ma_line within the upper plots of ggpairs. My objective is to display equations grouped by class in the upper plots with proper spacing, and scatterplots with fitted moving average (MA) lines (or any other type) at the bottom.

While I've managed to successfully integrate the lines, I've encountered challenges in creating a custom function for the upper plots. Specifically, I haven't found a method to ensure the proper display of equation overlap or the names of the corresponding taxa. Any guidance on resolving this issue would be greatly appreciated.

Perhaps a similar approach could be applied to other functions within ggpmisc.

Below is the code I've produced:

library(GGally)
library(ggplot2)
library(ggpmisc)

fun_name <– function(data, columns, grp_labels, color, ...) {
  log_data <- function(data) {
    data <- data %>% 
      mutate(across(where(is.numeric), ~ {log10(.)
      }))
    return(data)
  }

  upperfun <- function(data, mapping, grp.label){
    ggplot(data = data, mapping = mapping, grp.label=grp.label, color=color) +
      ggpmisc::stat_ma_eq(aes(label = after_stat(paste("bold(", {{grp.label}}, "*\":\")~~", 
                                                       eq.label, sep = ""))),
                          parse = TRUE) +
      geom_blank() +
      facet_wrap(~{{grp.label}}, ncol = 1, 
                 strip.position = "top")+ #I tried to pass them as facet to avoid overlap, but works poorly
      theme_void()+
      theme(strip.background = element_blank(),
            strip.text = element_blank(),
            panel.grid = element_blank())
  }

  lowerfun <- function(data, mapping){
    ggplot(data = data, mapping = mapping) +
      geom_point() +
      stat_ellipse(type = "norm") +
      ggpmisc::stat_ma_line(show.legend = T, se = FALSE, method = "lmodel2:MA") +
      theme(panel.grid = element_blank())
  }

  data <- log_data(data)

  data %>%
    ggpairs(columns = columns,
            ggplot2::aes(colour = {{color}}, alpha = 0.4),
            upper = list(continuous = wrap(upperfun, grp.label = grp_labels, ...)),
            lower = list(continuous = wrap(lowerfun, ...)),
            progress = FALSE) +
    theme_minimal(base_size = 9) +
    theme(panel.grid = element_blank())
}

Thank you

Best wishes M.

aphalo commented 5 months ago

Hi! I do not normally use ggpairs(), so I find it difficult to understand the code in your example, some of which, seems not related to the MA equations. ggpairs() is of course very useful, and compatibility with ggpmisc important. So, many thanks for the report!

Could you please provide a reproducible example, maybe an editted version of one of the examples in the ggpairs() help page, or a simpler example that actually produces a plot with overlapping labels. Something that I can actually run and that is as simple as possible and still demonstrates the problem (rather than several nested functions, if possible).

Have you tried building the plot you are trying to insert into the ggpairs plot on its own? Parameters vstep and hstep control the displacement for succesive groups. A mapping to a factor or discrete variable with aes() is needed so that a grouping is created (but this most likely happens within ggpairs()) as the positions of the labels are computed based on groups (if needed you can use the group aesthetic that only creates an "invisible" grouping).

Another possibility is that the 'ggplot2' 3.5.0 update may have broken some code. So, please, let me also know which versions of R, 'ggplot2' , 'ggpmisc' and 'ggpp' you are currently using. One bug that I fixed in 'ggpmisc' 0.5.4 could be behind the problem you see if you are using an earlier version.

Best wishes,

Pedro.

Bendlexane commented 5 months ago

Hi!

Thank you for you fast response!

Yes, I can provide the information you asked. Sorry if I haven't done it previously...

This is my session info:

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] smatr_3.4-8        ggpmisc_0.5.5      ggpp_0.5.6         GGally_2.2.1       lubridate_1.9.3   
 [6] forcats_1.0.0      stringr_1.5.1      dplyr_1.1.4        purrr_1.0.2        readr_2.1.5       
[11] tidyr_1.3.1        tibble_3.2.1       tidyverse_2.0.0    RColorBrewer_1.1-3 ggrepel_0.9.5     
[16] ggplot2_3.5.0      mclust_6.1        

loaded via a namespace (and not attached):
  [1] polynom_1.4-1      sandwich_3.1-0     conflicted_1.2.0   rlang_1.1.3        magrittr_2.0.3    
  [6] multcomp_1.4-25    compiler_4.3.2     mgcv_1.9-1         vctrs_0.6.5        quantreg_5.97     
 [11] pkgconfig_2.0.3    shape_1.4.6.1      crayon_1.5.2       fastmap_1.1.1      backports_1.4.1   
 [16] labeling_0.4.3     utf8_1.2.4         rmarkdown_2.26     tzdb_0.4.0         nloptr_2.0.3      
 [21] MatrixModels_0.5-3 xfun_0.42          glmnet_4.1-8       jomo_2.7-6         cachem_1.0.8      
 [26] jsonlite_1.8.8     progress_1.2.3     highr_0.10         pan_1.9            prettyunits_1.2.0 
 [31] broom_1.0.5        R6_2.5.1           stringi_1.8.3      car_3.1-2          boot_1.3-30       
 [36] rpart_4.1.23       estimability_1.5   Rcpp_1.0.12        iterators_1.0.14   knitr_1.45        
 [41] zoo_1.8-12         pacman_0.5.1       Matrix_1.6-5       splines_4.3.2      nnet_7.3-19       
 [46] timechange_0.3.0   tidyselect_1.2.0   rstudioapi_0.15.0  abind_1.4-5        yaml_2.3.8        
 [51] lmodel2_1.7-3      codetools_0.2-19   lattice_0.22-5     plyr_1.8.9         withr_3.0.0       
 [56] coda_0.19-4.1      evaluate_0.23      survival_3.5-8     ggstats_0.5.1      confintr_1.0.2    
 [61] pillar_1.9.0       ggpubr_0.6.0       carData_3.0-5      mice_3.16.0        DT_0.32           
 [66] foreach_1.5.2      insight_0.19.8     plotly_4.10.4      generics_0.1.3     hms_1.1.3         
 [71] munsell_0.5.0      scales_1.3.0       minqa_1.2.6        xtable_1.8-4       glue_1.7.0        
 [76] emmeans_1.10.0     lazyeval_0.2.2     tools_4.3.2        data.table_1.15.2  lme4_1.1-35.1     
 [81] SparseM_1.81       ggsignif_0.6.4     distill_1.6        mvtnorm_1.2-4      cowplot_1.1.3     
 [86] grid_4.3.2         datawizard_0.9.1   colorspace_2.1-0   nlme_3.1-164       cli_3.6.2         
 [91] fansi_1.0.6        ggthemes_5.1.0     viridisLite_0.4.2  downlit_0.4.3      gtable_0.3.4      
 [96] rstatix_0.7.2      digest_0.6.34      TH.data_1.1-2      htmlwidgets_1.6.4  farver_2.1.1      
[101] memoise_2.0.1      htmltools_0.5.7    allomr_0.3.0       lifecycle_1.0.4    httr_1.4.7        
[106] mitml_0.4-5        MASS_7.3-60.0.1  

Let's use an unpacked version of the code I've produced for plotting and the iris dataset as MRE:

data(iris)

require(tidyverse)
require(GGally)
require(ggpmisc)

#logtrasforming data for allometry
log_data <- function(data) {
  data <- data %>% 
    mutate(across(where(is.numeric), ~ {
      min_nonzero <- min(.[. > 0], na.rm = TRUE)
      adjusted_zero_value <- ifelse(min_nonzero > 0, min_nonzero / 10, 1e-6)
      . <- ifelse(. == 0, adjusted_zero_value, .)
      log10(.)
    }))
  return(data)
}

#logtrasfroming iris
iris_log <- log_data(iris)

#upperfunction for ggpairs with ggminsc equations
upperfun <- function(data, mapping){
  ggplot(data = data, mapping = mapping) +
    geom_blank() +
    ggpmisc::stat_ma_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), 
                          formula = y ~ x, 
                          parse = TRUE)
}

#lowerfunction for ggpairs with ma lines
lowerfun <- function(data, mapping){
  ggplot(data = data, mapping = mapping) +
    geom_point() +
    stat_ellipse(type = "norm") +
    ggpmisc::stat_ma_line(show.legend =  F, se = FALSE, method = "lmodel2:SMA") +
}

# plotting using ggpairs
iris_log %>%
  ggpairs(columns = 1:4,
          ggplot2::aes(colour = Species, alpha = 0.4),
          upper = list(continuous = wrap(upperfun)),
          lower = list(continuous = wrap(lowerfun)),
          progress = FALSE) +
  theme_minimal(base_size = 9) +
  theme(panel.grid = element_blank())

As you see, the plot does not show the species names and they are on one side a little bit crowded. This problem worsen when there are 5 or more variables. I've removed facet_wrap and other things to semplify a little bit the code.

Hope this code can be useful.

Thank you in advance!

Best regards Manuel

aphalo commented 5 months ago

First problem, the overlaps. The three definitions below work, unless you want to assemble a complicated label, the top one is the simplest approach. The last one, it is best avoided, as the maintainers of 'ggplot2' have decided to change the notation and the ... approach could stop working in the future. vstep = 0.2 means for each label, advance 20% of the height of the panel. In small panels the default is too small. Another alternative would be to make the equations smaller ading size = 2 or some other value of your choice.

#upperfunction for ggpairs with ggminsc equations
upperfun <- function(data, mapping){
  ggplot(data = data, mapping = mapping) +
    geom_blank() +
    ggpmisc::stat_ma_eq(use_label(c("eq", "r2")),
                        vstep = 0.2, 
                        formula = y ~ x, 
                        parse = TRUE)
}

#upperfunction for ggpairs with ggminsc equations
upperfun <- function(data, mapping){
  ggplot(data = data, mapping = mapping) +
    geom_blank() +
    ggpmisc::stat_ma_eq(aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")), 
                        vstep = 0.2,
                        formula = y ~ x, 
                        parse = TRUE)
}

#upperfunction for ggpairs with ggminsc equations
upperfun <- function(data, mapping){
  ggplot(data = data, mapping = mapping) +
    geom_blank() +
    ggpmisc::stat_ma_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), 
                        vstep = 0.2,
                        formula = y ~ x, 
                        parse = TRUE)
}
aphalo commented 5 months ago

The label works as long as there is enough space. grp.label works like an aesthetic. You need to map a variable to it within the call to aes(), in this case within the call to ggpairs(). You have to play with the final rendered size of the plot and the size of the text used for the equations. The values in the example below worked well on the screen of my computer, but when creating the bipmap below, the text resulted too big for the size of the panels. I am using future version 0.5.7, but this should also work with 0.5.6.

require(tidyverse)
#> Loading required package: tidyverse
require(GGally)
#> Loading required package: GGally
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>   +.gg   ggplot2
require(ggpmisc)
#> Loading required package: ggpmisc
#> Loading required package: ggpp
#> Registered S3 methods overwritten by 'ggpp':
#>   method                  from   
#>   heightDetails.titleGrob ggplot2
#>   widthDetails.titleGrob  ggplot2
#> 
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#> 
#>     annotate
#> Registered S3 method overwritten by 'ggpmisc':
#>   method                  from   
#>   as.character.polynomial polynom

data(iris)

#logtrasforming data for allometry
log_data <- function(data) {
  data <- data %>% 
    mutate(across(where(is.numeric), ~ {
      min_nonzero <- min(.[. > 0], na.rm = TRUE)
      adjusted_zero_value <- ifelse(min_nonzero > 0, min_nonzero / 10, 1e-6)
      . <- ifelse(. == 0, adjusted_zero_value, .)
      log10(.)
    }))
  return(data)
}

#logtrasfroming iris
iris_log <- log_data(iris)

#upperfunction for ggpairs with ggminsc equations
upperfun <- function(data, mapping){
  ggplot(data = data, mapping = mapping) +
    geom_blank() +
    ggpmisc::stat_ma_eq(use_label(c("grp", "eq", "r2")),
                        vstep = 0.15, 
                        size = 3,
                        formula = y ~ x)
}

#lowerfunction for ggpairs with ma lines
lowerfun <- function(data, mapping){
  ggplot(data = data, mapping = mapping) +
    geom_point() +
    stat_ellipse(type = "norm") +
    ggpmisc::stat_ma_line(show.legend =  F, se = FALSE, method = "lmodel2:SMA")
}

# plotting using ggpairs
iris_log %>%
  ggpairs(columns = 1:4,
          ggplot2::aes(colour = Species, alpha = 0.4, grp.label = Species),
          upper = list(continuous = wrap(upperfun)),
          lower = list(continuous = wrap(lowerfun)),
          progress = FALSE) +
  theme_minimal(base_size = 9) +
  theme(panel.grid = element_blank())

Created on 2024-03-13 with reprex v2.1.0

aphalo commented 5 months ago
Bendlexane commented 5 months ago

Hi Pedro!

Thank you a lot for you outstanding job! Now it works beautifully!

I'm feeling silly to admit but maybe I'm missing something, which blog are you talking about?

M.

aphalo commented 5 months ago

Hi Manuel! The task is a note to myself. I have a blog/website named Using R for Photobiology, and among other things I publish in the blog examples that cannot be easily included in the packages' built-in documentation. I still have to add links to the blog in the packages' documentation, so it is not surprising that you did not know what I was writing about. Also at Using R for Photobiology one of the menues gives access to the documentation of my packages as web pages with the output from examples. (My field of research is also in plant science, mostly physiological plant ecology and photobiology. I keep a separate blog at my university, where I post occasionally.) Best wishes, Pedro.

Bendlexane commented 5 months ago

Oh, amazing! I didn't know that!

As a plant enthusiast, I will definitely check out your blog!

I think that now everything has been sorted out, so we can close the issue.

Thank you again! Best wishes M.

aphalo commented 5 months ago

Hi, I changed the title, and reopen the issue to remind me about documenting this in a blog post.

aphalo commented 4 months ago

Done. Web page