const-ae / ggsignif

Easily add significance brackets to your ggplots
https://const-ae.github.io/ggsignif/
GNU General Public License v3.0

Error in the value of the p-value #55

Closed by paulinefx 5 years ago

paulinefx commented 5 years ago

Hello,

I have noticed that when I ask for the t-test to calculate the p-value, I don't get the same value as from the function t.test.

With t.test(condition1, condition2) I get p-value = 0.00253.
With geom_signif(comparisons = list(c(condition1, condition2)), test = "t.test", map_signif_level = FALSE) I get p-value = 0.53.

I was wondering which parameters you are using for the t.test.

Thank you,

Pauline

const-ae commented 5 years ago

Hi Pauline, thank you for filing the issue. Could you please provide a reproducible example (preferably using reprex) so that I can investigate the problem? Best, Constantin

const-ae commented 5 years ago

I get identical p-values:

library(ggplot2)
library(ggsignif)
x <- rnorm(10)
y <- rnorm(10)

t.test(x, y)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  x and y
#> t = 1.1129, df = 17.016, p-value = 0.2812
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -0.4248211  1.3735109
#> sample estimates:
#>  mean of x  mean of y 
#>  0.2474435 -0.2269013

df <- data.frame(condition = c(rep("x", 10), rep("y", 10)),
                 value = c(x, y))

ggplot(df, aes(x=condition, y=value)) +
  geom_boxplot() +
  ggsignif::geom_signif(comparisons = list(c("x", "y")),
                        test = "t.test")

Created on 2019-03-27 by the reprex package (v0.2.1)

paulinefx commented 5 years ago

Here is my example, where I do not get the same results. [image]

paulinefx commented 5 years ago

Sorry, it didn't come through completely. Let me retry.

const-ae commented 5 years ago

Ah, sorry. I tried to add code highlighting to your comment, but now it has disappeared!

paulinefx commented 5 years ago

Sorry, I'm new to reprex, so it wasn't working properly.

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.4.4
library("ggsignif")
#> Warning: package 'ggsignif' was built under R version 3.4.4

the.data <- data.frame(
  "number.events" = c(10,25,24,27,22,2,4,34,10,11,18,6,25,9,8,31,10,44,15),
  "condition" = c(3,1,1,1,1,1,2,1,1,1,1,2,1,1,1,3,3,3,1)
)

t.test(the.data$condition==1,the.data$condition==2)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  the.data$condition == 1 and the.data$condition == 2
#> t = 4.4098, df = 31.187, p-value = 0.0001144
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  0.3112523 0.8466424
#> sample estimates:
#> mean of x mean of y 
#> 0.6842105 0.1052632
t.test(the.data$condition==1,the.data$condition==3)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  the.data$condition == 1 and the.data$condition == 3
#> t = 3.2504, df = 35.398, p-value = 0.00253
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  0.1779549 0.7694135
#> sample estimates:
#> mean of x mean of y 
#> 0.6842105 0.2105263
t.test(the.data$condition==2,the.data$condition==3)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  the.data$condition == 2 and the.data$condition == 3
#> t = -0.87519, df = 33.442, p-value = 0.3877
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -0.3498411  0.1393148
#> sample estimates:
#> mean of x mean of y 
#> 0.1052632 0.2105263

bplot <- ggplot(the.data, aes(x=factor(condition,levels =c("1", "2", "3")), y=number.events)) + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(), axis.line = element_line(colour = "black"),
        axis.text=element_text(size=15),
        axis.title=element_text(size=17,face="bold")) +
  geom_boxplot(color = c("red","blue","darkgreen")) +
  labs(x="Conditions",y="Occurrences") +
  geom_signif(comparisons = list(c("1", "2")),
              test = "t.test",
              map_signif_level=FALSE,
              y_position = 37)+
  geom_signif(comparisons = list(c("1", "3")), #ns
              test = "t.test", 
              map_signif_level=FALSE,
              y_position = 48)+
  geom_signif(comparisons = list(c("2", "3")), #ns
              test = "t.test",
              map_signif_level=FALSE,
              y_position = 46)

bplot

Created on 2019-03-27 by the reprex package (v0.2.1)

const-ae commented 5 years ago

Yes, thanks for reposting it.

I think I know what the problem is. In your t.test you write:

t.test(the.data$condition==1,
       the.data$condition==3)

and you get a very small p-value for this, but only because the two arguments are logical vectors:

print(the.data$condition==1)
#>  [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
#> [12] FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE

print(the.data$condition==3)
#>  [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [12] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE

which are converted to zeros and ones, so the test compares how often each condition occurs rather than the number.events values.
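
A minimal sketch of that coercion, using only the condition column from the reprex above. The names is_1 and is_3 are made up for illustration:

```r
# Condition labels copied from the reprex above.
condition <- c(3,1,1,1,1,1,2,1,1,1,1,2,1,1,1,3,3,3,1)

# Comparing with == yields one TRUE/FALSE per row, not a subset of the data.
is_1 <- condition == 1
is_3 <- condition == 3

# Logicals are treated as 0/1 in arithmetic, so the mean is the share of TRUEs.
mean(is_1)  # 13/19, the "mean of x" in the t.test output above
mean(is_3)  # 4/19, the "mean of y" in the t.test output above

# Both vectors have one entry per row, whatever the group sizes are.
length(is_1)
length(is_3)
```

So the "sample estimates" in that output are group proportions, which is why they bear no relation to the boxplot values.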

What you actually want to do is:

t.test(the.data[the.data$condition==1, "number.events"],
       the.data[the.data$condition==3, "number.events"])
#> 
#>  Welch Two Sample t-test
#> 
#> data:  the.data[the.data$condition == 1, "number.events"] and the.data[the.data$condition == 3, "number.events"]
#> t = -0.69142, df = 3.5927, p-value = 0.5314
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -31.50988  19.39450
#> sample estimates:
#> mean of x mean of y 
#>  17.69231  23.75000

which corresponds to the p-value my package outputs. Best, Constantin
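
For reference, a sketch of two other base R ways to write the same comparison, assuming the data frame from the reprex above. The names groups, p_split and p_formula are only illustrative, and this is not necessarily how the package computes the value internally:

```r
# Data copied from the reprex above.
the.data <- data.frame(
  number.events = c(10,25,24,27,22,2,4,34,10,11,18,6,25,9,8,31,10,44,15),
  condition     = c(3,1,1,1,1,1,2,1,1,1,1,2,1,1,1,3,3,3,1)
)

# One numeric vector of number.events per condition, named "1", "2", "3".
groups <- split(the.data$number.events, the.data$condition)

# Welch two-sample t-test on the measured values of conditions 1 and 3.
p_split <- t.test(groups[["1"]], groups[["3"]])$p.value

# The same test through the formula interface, restricted to the two conditions.
p_formula <- t.test(number.events ~ condition,
                    data = subset(the.data, condition %in% c(1, 3)))$p.value

round(p_split, 4)              # 0.5314, as in the t.test output above
all.equal(p_split, p_formula)  # TRUE
```

Either form passes the number.events values to t.test, so a logical vector cannot slip in by accident.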

paulinefx commented 5 years ago

Ah! Yes, of course! Sorry for my mistake, and thank you for the help.

Best, Pauline

const-ae commented 5 years ago

No problem, you had me worried there for a second. But I am glad that we could easily resolve the issue. Best, Constantin