IndrajeetPatil / ggstatsplot

Enhancing {ggplot2} plots with statistical analysis 📊📣
https://indrajeetpatil.github.io/ggstatsplot/
GNU General Public License v3.0

pairwise_p results differ based on dropped levels #180

Closed by IndrajeetPatil 5 years ago

IndrajeetPatil commented 5 years ago

Why are the confidence intervals and p-values different here when everything else (t, df, etc.) remains the same?!

set.seed(123)
library(tidyverse, warn.conflicts = FALSE)
options(tibble.width = Inf)

# keep only the carni and omni levels (drop herbi and insecti)
msleep2 <- dplyr::filter(
  .data = ggplot2::msleep,
  vore %in% c("carni", "omni")
)

# checking if the two dataframes are equivalent

# 26-by-11 tibble
msleep %>%
  dplyr::filter(!is.na(vore), !is.na(brainwt)) %>%
  dplyr::filter(.data = ., vore %in% c("omni", "carni"))

#> # A tibble: 26 x 11
#>    name                       genus        vore  order          
#>    <chr>                      <chr>        <chr> <chr>          
#>  1 Owl monkey                 Aotus        omni  Primates       
#>  2 Greater short-tailed shrew Blarina      omni  Soricomorpha   
#>  3 Dog                        Canis        carni Carnivora      
#>  4 Star-nosed mole            Condylura    omni  Soricomorpha   
#>  5 African giant pouched rat  Cricetomys   omni  Rodentia       
#>  6 Lesser short-tailed shrew  Cryptotis    omni  Soricomorpha   
#>  7 Long-nosed armadillo       Dasypus      carni Cingulata      
#>  8 North American Opossum     Didelphis    omni  Didelphimorphia
#>  9 European hedgehog          Erinaceus    omni  Erinaceomorpha 
#> 10 Patas monkey               Erythrocebus omni  Primates       
#>    conservation sleep_total sleep_rem sleep_cycle awake  brainwt bodywt
#>    <chr>              <dbl>     <dbl>       <dbl> <dbl>    <dbl>  <dbl>
#>  1 <NA>                17         1.8      NA       7   0.0155    0.48 
#>  2 lc                  14.9       2.3       0.133   9.1 0.00029   0.019
#>  3 domesticated        10.1       2.9       0.333  13.9 0.07     14    
#>  4 lc                  10.3       2.2      NA      13.7 0.001     0.06 
#>  5 <NA>                 8.3       2        NA      15.7 0.0066    1    
#>  6 lc                   9.1       1.4       0.15   14.9 0.000140  0.005
#>  7 lc                  17.4       3.1       0.383   6.6 0.0108    3.5  
#>  8 lc                  18         4.9       0.333   6   0.0063    1.7  
#>  9 lc                  10.1       3.5       0.283  13.9 0.0035    0.77 
#> 10 lc                  10.9       1.1      NA      13.1 0.115    10    
#> # ... with 16 more rows

# 26-by-11 tibble
msleep2 %>%
  dplyr::filter(!is.na(vore), !is.na(brainwt)) 

#> # A tibble: 26 x 11
#>    name                       genus        vore  order          
#>    <chr>                      <chr>        <chr> <chr>          
#>  1 Owl monkey                 Aotus        omni  Primates       
#>  2 Greater short-tailed shrew Blarina      omni  Soricomorpha   
#>  3 Dog                        Canis        carni Carnivora      
#>  4 Star-nosed mole            Condylura    omni  Soricomorpha   
#>  5 African giant pouched rat  Cricetomys   omni  Rodentia       
#>  6 Lesser short-tailed shrew  Cryptotis    omni  Soricomorpha   
#>  7 Long-nosed armadillo       Dasypus      carni Cingulata      
#>  8 North American Opossum     Didelphis    omni  Didelphimorphia
#>  9 European hedgehog          Erinaceus    omni  Erinaceomorpha 
#> 10 Patas monkey               Erythrocebus omni  Primates       
#>    conservation sleep_total sleep_rem sleep_cycle awake  brainwt bodywt
#>    <chr>              <dbl>     <dbl>       <dbl> <dbl>    <dbl>  <dbl>
#>  1 <NA>                17         1.8      NA       7   0.0155    0.48 
#>  2 lc                  14.9       2.3       0.133   9.1 0.00029   0.019
#>  3 domesticated        10.1       2.9       0.333  13.9 0.07     14    
#>  4 lc                  10.3       2.2      NA      13.7 0.001     0.06 
#>  5 <NA>                 8.3       2        NA      15.7 0.0066    1    
#>  6 lc                   9.1       1.4       0.15   14.9 0.000140  0.005
#>  7 lc                  17.4       3.1       0.383   6.6 0.0108    3.5  
#>  8 lc                  18         4.9       0.333   6   0.0063    1.7  
#>  9 lc                  10.1       3.5       0.283  13.9 0.0035    0.77 
#> 10 lc                  10.9       1.1      NA      13.1 0.115    10    
#> # ... with 16 more rows

# pairwise comparisons with the subsetted data (dropped levels excluded)
ggstatsplot::pairwise_p(
  data = msleep2,
  x = vore,
  y = brainwt,
  messages = FALSE,
  p.adjust.method = "none"
)

#> # A tibble: 1 x 11
#>   group1 group2 mean.difference conf.low conf.high    se t.value    df
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl> <dbl>   <dbl> <dbl>
#> 1 omni   carni           -0.066   -0.245     0.112 0.061   0.774  21.1
#>   p.value significance p.value.label
#>     <dbl> <chr>        <chr>        
#> 1   0.447 ns           p = 0.447

# with the original data
ggstatsplot::pairwise_p(
  data = ggplot2::msleep,
  x = vore,
  y = brainwt,
  messages = FALSE,
  p.adjust.method = "none"
) %>%
  dplyr::filter(.data = ., group1 == "omni", group2 == "carni")

#> # A tibble: 1 x 11
#>   group1 group2 mean.difference conf.low conf.high    se t.value    df
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl> <dbl>   <dbl> <dbl>
#> 1 omni   carni           -0.066   -0.306     0.173 0.061   0.774  21.1
#>   p.value significance p.value.label
#>     <dbl> <chr>        <chr>        
#> 1   0.865 ns           p = 0.865

Created on 2019-02-22 by the reprex package (v0.2.1.9000)

IndrajeetPatil commented 5 years ago

@ibecav Can you please have a look at this issue and see if you can figure out what's going on?

ibecav commented 5 years ago

Yes. The good news is that your code is correct and this is actually the "right" answer, or, put more clearly, the expected result. Remember that these are family-wise comparisons, so you shouldn't expect the same answer when you change the "size" of the vore "family" from 4 factor levels to 2.

Tukey HSD compares each pairwise difference against a critical value that is calibrated for the two means that are furthest apart in the whole family, and that critical value changes a lot when you add the herbi and insecti levels back in.

There's something more troubling here, though, which I'll comment on in a minute, but this particular issue is a non-issue.
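
A minimal sketch of that mechanism, assuming the Welch degrees of freedom (~21.1) reported in the output above: both Tukey HSD and Games-Howell refer the test statistic to the studentized range distribution, whose critical value grows with the number of group means.

# studentized range critical values for 2 vs 4 group means (df taken from
# the output above); exact values depend on the per-pair Welch df
qtukey(0.95, nmeans = 2, df = 21.1)  # roughly 2.94 with only carni and omni
qtukey(0.95, nmeans = 4, df = 21.1)  # roughly 3.94 with all four vore levels
# the same mean difference is compared against a larger critical value when
# all four levels are in the family, hence the wider CI and larger p-value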

ibecav commented 5 years ago

Okay, I finally sorted my way through what was troubling me. First, I hadn't realized you were using Games-Howell exclusively when var.equal = FALSE; that's fine, but it should probably be mentioned in the documentation.

As the reprex below shows, you're currently adjusting p-values but not confidence intervals when var.equal = TRUE. I don't know whether it's worth adding another package just to adjust them...

# reprex
set.seed(123)
library(tidyverse, warn.conflicts = FALSE)
library(ggstatsplot)

# If the user selects var.equal = FALSE, they always get Games-Howell.
# For var.equal = TRUE, the p-values are currently adjusted but the CIs
# aren't: they are all default Tukey HSD intervals.

# default holm
pairwise_p(data = msleep,
           x = vore,
           y = brainwt, 
           var.equal = TRUE,
           messages = FALSE) %>%
  filter(group1 == "omni", group2 == "carni")
#>   group1 group2 mean.difference conf.low conf.high p.value significance
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl>   <dbl> <chr>       
#> 1 omni   carni           0.0665    -1.05      1.18       1 ns          

# switch to Bonferroni
pairwise_p(data = msleep,
            x = vore,
            y = brainwt, 
            p.adjust.method = "bonferroni", 
            var.equal = TRUE,
            messages = FALSE)  %>%
  filter(group1 == "omni", group2 == "carni")
#>   group1 group2 mean.difference conf.low conf.high p.value significance
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl>   <dbl> <chr>       
#> 1 omni   carni           0.0665    -1.05      1.18       1 ns          

# Try none
pairwise_p(data = msleep,
           x = vore,
           y = brainwt, 
           p.adjust.method = "none", 
           var.equal = TRUE,
           messages = FALSE)  %>%
  filter(group1 == "omni", group2 == "carni")
#>   group1 group2 mean.difference conf.low conf.high p.value significance
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl>   <dbl> <chr>       
#> 1 omni   carni           0.0665    -1.05      1.18   0.875 ns          

# as opposed to adjusting both
library(DescTools)
# build the aov model
a2 <- aov(brainwt ~ vore, msleep)
# feed it to DescTools::PostHocTest
# DescTools::PostHocTest(a2, method = "hsd")
# Tukey
as_tibble(DescTools::PostHocTest(a2, method = "hsd")$vore, rownames = "group1-group2") %>%
  filter(`group1-group2` == "omni-carni")
#> # A tibble: 1 x 5
#>   `group1-group2`   diff lwr.ci upr.ci  pval
#>   <chr>            <dbl>  <dbl>  <dbl> <dbl>
#> 1 omni-carni      0.0665  -1.05   1.18 0.999
# Bonferroni
as_tibble(DescTools::PostHocTest(a2, method = "bonferroni")$vore, rownames = "group1-group2") %>%
  filter(`group1-group2` == "omni-carni")
#> # A tibble: 1 x 5
#>   `group1-group2`   diff lwr.ci upr.ci  pval
#>   <chr>            <dbl>  <dbl>  <dbl> <dbl>
#> 1 omni-carni      0.0665  -1.09   1.22     1
# Scheffe
as_tibble(DescTools::PostHocTest(a2, method = "scheffe")$vore, rownames = "group1-group2")  %>%
  filter(`group1-group2` == "omni-carni")
#> # A tibble: 1 x 5
#>   `group1-group2`   diff lwr.ci upr.ci  pval
#>   <chr>            <dbl>  <dbl>  <dbl> <dbl>
#> 1 omni-carni      0.0665  -1.15   1.28 0.999

Created on 2019-03-11 by the reprex package (v0.2.1)
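
As a rough sketch of why the adjusted intervals above are wider (assuming the equal-variance model), the Bonferroni interval uses a per-comparison level of 1 - alpha/m, so its critical value exceeds the Tukey HSD one for these data:

# critical values behind the equal-variance intervals shown above
alpha <- 0.05
k <- 4             # number of vore levels
m <- choose(k, 2)  # number of pairwise comparisons
fit <- aov(brainwt ~ vore, data = ggplot2::msleep)
c(
  tukey_hsd  = qtukey(1 - alpha, nmeans = k, df = df.residual(fit)) / sqrt(2),
  bonferroni = qt(1 - alpha / (2 * m), df = df.residual(fit))
)
# the slightly larger Bonferroni critical value is what makes the DescTools
# Bonferroni CI (-1.09, 1.22) wider than the Tukey HSD CI (-1.05, 1.18)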

IndrajeetPatil commented 5 years ago

The exact test used for pairwise multiple comparisons is displayed in the plot caption itself, and the caption adjusts based on the var.equal and p.adjust.method arguments. But do you think we should also include these details in the roxygen documentation?

[screenshot: example plot whose caption reports the pairwise comparisons test and adjustment method]

For the issue of adjusting CIs for multiple comparisons in the pairwise_p output, I'd much prefer to just drop the columns containing confidence intervals. The sole function of pairwise_p() is to generate p-values that can be used to create appropriate comparison geoms in ggbetweenstats and ggwithinstats. All the additional details it provides are incidental (and not consistent across types of tests). I don't think it's worth gaining an additional dependency just to provide these details. If adjusted CIs for post-hoc comparisons are what people want, there are multiple other R packages that users can rely on.
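
For reference, users who do want family-wise adjusted CIs in the equal-variance case don't even need an extra package; a usage sketch with base R's TukeyHSD() (not something pairwise_p() itself needs to do):

# family-wise adjusted CIs from base R, for users who want them
fit <- aov(brainwt ~ vore, data = ggplot2::msleep)
TukeyHSD(fit, conf.level = 0.95)$vore["omni-carni", ]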

ibecav commented 5 years ago

I agree on not needing to add more. I'd just mention, in one sentence, that if var.equal = FALSE you're always going to use Games-Howell.
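
A possible wording for that one-sentence note in the roxygen documentation (a sketch only; the exact phrasing and parameter description are up to the package author):

#' @param var.equal A logical. If FALSE, the Games-Howell test is always
#'   used for the pairwise comparisons.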