geom_signif - all comparisons disappear when one comparison has missing values

MPietzke commented 1 year ago

I initially posted this at ggpubr (https://github.com/kassambara/ggpubr/issues/503); however, that is just a shameless wrapper for geom_signif(), so maybe it's better suited here!?

When using geom_signif() to make multiple comparisons, it works fine until one of the comparisons cannot be performed (e.g. due to too many missing values). In that case the comparisons that are still possible disappear as well! Please see this example:

# Packages used in the examples below
library(tidyverse)  # tibble(), filter(), ggplot2
library(ggsignif)   # geom_signif()

# A dataset with some NAs (condition C of Sample2 has only 2 values)
dataset <- tibble(
  "Sample" = rep(c("Sample1", "Sample2"), each = 15),
  "Cond"   = rep(c("A", "B", "C",
                   "A", "B", "C"), each = 5),
  "Rep"    = rep(1:5, 6),
  "Value"  = c(runif(5, 10, 12),  #A1
               runif(5, 11, 14),  #B1
               runif(5, 10, 13),  #C1
               runif(5, 10, 12),  #A2
               runif(5, 11, 14),  #B2
               c(runif(2, 10, 13), NA, NA, NA) #C2
  ))

# With min 2 datapoints we see all the comparisons we want to have!
ggplot(dataset, 
       aes(x = Cond, y = Value, 
           colour = as.factor(Cond),
           fill = as.factor(Cond) )) + 
  geom_jitter(size = 5, width = 0.2, alpha = 0.3, stroke = 1.5,
              shape = 21) + 
  stat_summary(fun.min = mean, fun.max = mean, size = 1.5,                
               geom='errorbar') + 
  facet_wrap( ~ Sample) +
  theme_bw()  + scale_y_continuous(limits = c(0, 16)) +
  geom_signif(comparisons = list(c("A", "B"),
                                 c("B", "C")),
              step_increase = 0.2,
              colour = "black") + 
  theme(legend.position = "none")

With only NAs in one of the conditions (C), the other comparison (A-B) disappears as well!

ggplot(data = filter(dataset, Rep >= 3), 
       aes(x = Cond, y = Value, 
           colour = as.factor(Cond),
           fill = as.factor(Cond) )) + 
  geom_jitter(size = 5, width = 0.2, alpha = 0.3, stroke = 1.5,
              shape = 21) + 
  stat_summary(fun.min = mean, fun.max = mean, size = 1,                
               geom='errorbar') + 
  facet_wrap( ~ Sample) +
  theme_bw()  + scale_y_continuous(limits = c(0, 16)) +
  geom_signif(comparisons = list(c("A", "B"),
                                 c("B", "C")),
              step_increase = 0.2,
              colour = "black") + 
  theme(legend.position = "none")

Here the comparison A-B is lost as well, even though it can still be calculated. One could adapt the comparisons (after seeing that one of them fails in one of the cases), but in general I want a consistent picture across multiple (usually more than just 2) samples.

It throws warnings, so at least the function already knows that something fails:

1: Removed 3 rows containing non-finite values (stat_summary).
2: Removed 3 rows containing non-finite values (stat_signif).
3: Computation failed in stat_signif(): not enough 'y' observations.
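The last message looks like it comes from the underlying test rather than from the plotting code: geom_signif() uses wilcox.test by default, and that test stops when one group has no non-missing values. A minimal sketch of this, using made-up numbers in place of the dataset above (the vector names are only for illustration):

```r
# Hypothetical stand-ins for conditions B and C of Sample2 after filtering
b  <- c(11.4, 12.9, 13.1)
c2 <- c(NA_real_, NA_real_, NA_real_)  # only missing values left

# Capture the error message instead of letting the call abort
msg <- tryCatch(
  wilcox.test(b, c2)$p.value,
  error = function(e) conditionMessage(e)
)
print(msg)
```

My assumption is that this single error makes the whole stat_signif() computation fail for that panel, which would explain why A-B goes missing too.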

Would it be possible to:

check (e.g. after the warning) which of the comparisons cannot be made,
remove the impossible one,
still show the working ones, and either just drop the failed comparison or (better) label it with something like "n.d.", thereby maintaining the original structure?
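The steps above could be sketched outside of ggsignif roughly like this. It is only an illustration in base R, not how ggsignif works internally: safe_signif_labels() is a hypothetical helper, the column names are taken from my example, and the small data frame is a reduced stand-in for the filtered dataset.

```r
# Hypothetical helper: run one Wilcoxon test per facet and comparison and
# fall back to a placeholder label when the test cannot be computed.
safe_signif_labels <- function(data, comparisons, facet = "Sample",
                               x = "Cond", y = "Value", na_label = "n.d.") {
  rows <- list()
  for (f in unique(data[[facet]])) {
    sub <- data[data[[facet]] == f, ]
    for (cmp in comparisons) {
      a <- sub[[y]][sub[[x]] == cmp[1]]
      b <- sub[[y]][sub[[x]] == cmp[2]]
      # A failing test yields NA instead of aborting all comparisons
      p <- tryCatch(
        suppressWarnings(wilcox.test(a, b)$p.value),
        error = function(e) NA_real_
      )
      rows[[length(rows) + 1]] <- data.frame(
        facet = f, start = cmp[1], end = cmp[2],
        label = if (is.na(p)) na_label else format(p, digits = 2),
        stringsAsFactors = FALSE
      )
    }
  }
  out <- do.call(rbind, rows)
  names(out)[1] <- facet  # name the facet column like the input column
  out
}

# Reduced stand-in data: condition C of Sample2 contains only NAs
set.seed(1)
dataset_small <- data.frame(
  Sample = rep(c("Sample1", "Sample2"), each = 9),
  Cond   = rep(c("A", "B", "C", "A", "B", "C"), each = 3),
  Value  = c(runif(15, 10, 14), NA, NA, NA)
)

labels <- safe_signif_labels(dataset_small,
                             list(c("A", "B"), c("B", "C")))
print(labels)
```

A table like this (plus a column with y positions) could probably be handed to geom_signif() in its manual mode (manual = TRUE with the xmin, xmax, annotations and y_position aesthetics), but it would be nicer if the function handled this itself.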

This would be awesome!

PS: Just reading the proDA paper - then adding the issue here and noticing the identical name of the author!

const-ae commented 1 year ago

Hey, thank you for the kind words and the well-written bug report. As you have probably already noticed, I am currently not on top of my GitHub issues and don't have the capacity to invest time in adding features to ggsignif. You can of course write a PR to fix the issue, and we will take a look and consider whether we can merge it.

Best, Constantin

murpholinox commented 1 year ago

Same here! ...adding a comment to be notified ...

const-ae / ggsignif

geom_signif - all comparisons disappear when one comparison has missing values #126