ggobi / ggally

R package that extends ggplot2
http://ggobi.github.io/ggally/

In ggcorr, why are 0s replaced with NA? #504

Open winterstat opened 4 months ago

winterstat commented 4 months ago

Hello,

I use the ggcorr function extensively and generally love it, so thank you! However, I recently noticed that when a correlation is exactly 0, its label is removed from the plot (i.e., that box is empty). In the example below, the correlation between X1 and X2 is 0 and is omitted, while the correlation between X1 and X3 is .001 and is shown as 0 because of the label_round = 1 default:

library(reprex)
library(GGally)
#> Loading required package: ggplot2
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>   +.gg   ggplot2

cors <- matrix(c(1,    0,  .001,
                 0,    1,  .2,
                 .001, .2, 1), nrow = 3, byrow = TRUE)

rownames(cors) <- colnames(cors) <- c("X1", "X2", "X3")

ggcorr(data = NULL, cor_matrix = cors, label = TRUE)

Created on 2024-06-18 with reprex v2.0.2

I found this in the ggcorr function file, which is where I think this is being done:

m_long$coefficient[m_long$coefficient == 0] <- NA

Here is a link to the location of this line: https://github.com/ggobi/ggally/blob/9d954c1731d481028f0c6609e7152aef7e526677/R/ggcorr.R#L219C1-L232C52
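
To make the effect of that line concrete, here is a minimal base-R sketch. It does not call ggcorr; the vector and its names (`coefficient`, `X1_X2`, and so on) are made up for illustration and only mirror the values in the reprex above.

```r
# Hypothetical stand-in for the long-format coefficient column,
# using the three off-diagonal values from the reprex.
coefficient <- c(X1_X2 = 0, X1_X3 = 0.001, X2_X3 = 0.2)

# The replacement quoted above: every exact zero becomes NA.
masked <- coefficient
masked[masked == 0] <- NA

# Rounding to one decimal, as the label_round = 1 default would.
labels <- round(masked, 1)
print(labels)
```

The exact zero ends up as NA (so nothing is drawn for it), while .001 survives the replacement and only becomes 0 after rounding.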

Would it be possible to add an argument to the ggcorr function that allows users to decide if they want to include exact 0s or not? Showing the zeroes is very important in communicating my results (and it doesn't make sense to tell readers "when you see an empty space, that is actually a zero").

winterstat commented 4 months ago

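After simply commenting out that one line in the ggcorr function, I now understand why it is there: the function multiplies the matrix by lower.tri(m), which sets the upper triangle and the diagonal to 0, and the zeros are then replaced by NA so that the upper triangle of the correlation plot is empty. The side effect is that genuine zero correlations in the lower triangle are blanked as well. As a local fix, I changed the relevant part of the function body as follows: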

# m <- data.frame(m * lower.tri(m))
# replace above with this:
m[upper.tri(m, diag = TRUE)] <- NA
rownames(m) <- colnames(m)

# need to make it a data frame:
m <- data.frame(m)
m$.ggally_ggcorr_row_names <- rownames(m)
# m = reshape::melt(m, id.vars = ".ggally_ggcorr_row_names")
# names(m) = c("x", "y", "coefficient")
m_long <- m %>%
  tidyr::pivot_longer(
    cols = -.ggally_ggcorr_row_names,
    names_to = "y",
    values_to = "coefficient"
  ) %>%
  dplyr::rename(x = .ggally_ggcorr_row_names) %>%
  dplyr::mutate(y = factor(y, levels = rownames(m)))

# m_long$coefficient[m_long$coefficient == 0] <- NA

This works for me, but I'm not a great R programmer, so if someone else has a more elegant solution please use that instead.
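
For anyone comparing the two masking approaches, here is a small base-R sketch that runs outside ggcorr. It reuses the matrix from the reprex; the names `old` and `new` are just illustrative, and the "old" branch is my reading of the two lines quoted above rather than a copy of the package code.

```r
cors <- matrix(c(1,    0,  .001,
                 0,    1,  .2,
                 .001, .2, 1), nrow = 3, byrow = TRUE)
rownames(cors) <- colnames(cors) <- c("X1", "X2", "X3")

# Approach 1 (as I understand the current behaviour): zero out the
# upper triangle and diagonal by multiplication, then turn zeros into NA.
old <- cors * lower.tri(cors)
old[old == 0] <- NA

# Approach 2 (the change above): assign NA to the upper triangle and
# diagonal directly, leaving lower-triangle values untouched.
new <- cors
new[upper.tri(new, diag = TRUE)] <- NA

print(old)
print(new)
```

With the first approach the X2/X1 cell is NA, because the true zero cannot be told apart from the masked cells. With the second it stays 0, so it can still be labelled.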

Edited to add:

Could I also suggest using label = format(label, nsmall = label_round) in several places in the ggcorr function to ensure that rounding is consistent (e.g., "0.00" shows up as "0.00" not "0")?
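
A quick base-R illustration of the difference, assuming label_round = 2 (the variable names `coefs`, `rounded`, `plain` and `padded` are just for this sketch and are not taken from the ggcorr source):

```r
label_round <- 2
coefs <- c(0, 0.001, 0.2, 1)

# Rounding alone drops trailing zeros once the number is turned into text.
rounded <- round(coefs, label_round)
plain <- as.character(rounded)

# format() with nsmall keeps at least label_round decimals in every label.
padded <- format(rounded, nsmall = label_round)

print(plain)
print(padded)
```

One thing to keep in mind: format() pads all elements of a vector to a common width, so if negative values are present the positive labels get a leading space. Wrapping the result in trimws() would remove that if it matters for the plot.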