IndrajeetPatil / ggstatsplot

Enhancing {ggplot2} plots with statistical analysis 📊📣
https://indrajeetpatil.github.io/ggstatsplot/
GNU General Public License v3.0

Feature request: options to turn off stats #218

Closed yanxianl closed 5 years ago

yanxianl commented 5 years ago

Hi, thanks for developing ggstatsplot, which produces information-rich, publication-ready plots!

I was applying grouped_ggscatterstats() to a list of data frames, some of which contain subgroups with too few data points to run the statistical tests. This resulted in an error, and I had to combine ggplot2 and ggscatterstats to make the final plots. It would be great to be able to disable the stats when there are not enough data points to run them. With an option to turn off the stats, one would not have to leave ggstatsplot and make the plots with another package in such situations.


Reprex example:

library(ggstatsplot)
#> Warning: package 'ggstatsplot' was built under R version 3.5.3

# Make a dataframe with enough data points
set.seed(1910)
df1 <- cbind.data.frame(group = rep(c("A", "B"), times = c(30, 10)),
                       x = rnorm(40, mean = 3, sd = 1),
                       y = rnorm(40, mean = 3, sd = 1))
# Make scatter plots
grouped_ggscatterstats(data = df1,
                       x = x, 
                       y = y,
                       type = "spearman", 
                       conf.level = 0.95, 
                       xlab = "x", 
                       ylab = "y", 
                       k = 3, 
                       marginal.type = "density", 
                       xfill = "#0072B2", 
                       yfill = "#009E73", 
                       xalpha = 0.6, 
                       yalpha = 0.6, 
                       point.width.jitter = 0.2, 
                       point.height.jitter = 0.4, 
                       grouping.var = group, 
                       title.prefix = "Group",
                       messages = FALSE,
                       ncol = 1,
                       title.text = "ggstatsplot produces information rich plots")


# Make a dataframe containing insufficient data points for correlation analysis
set.seed(1910)
df2 <- cbind.data.frame(group = rep(c("A", "B"), times = c(30, 2)),
                        x = rnorm(32, mean = 3, sd = 1),
                        y = rnorm(32, mean = 3, sd = 1))
# Make scatter plots
grouped_ggscatterstats(data = df2,
                       x = x, 
                       y = y,
                       type = "spearman", 
                       conf.level = 0.95, 
                       xlab = "x", 
                       ylab = "y", 
                       k = 3, 
                       marginal.type = "density", 
                       xfill = "#0072B2", 
                       yfill = "#009E73", 
                       xalpha = 0.6, 
                       yalpha = 0.6, 
                       point.width.jitter = 0.2, 
                       point.height.jitter = 0.4, 
                       grouping.var = group, 
                       title.prefix = "Group",
                       messages = FALSE,
                       ncol = 1,
                       title.text = "ggstatsplot produces information rich plots")

#> Error in if (const(t, min(1e-08, mean(t, na.rm = TRUE)/1e+06))) {: missing value where TRUE/FALSE needed

Created on 2019-05-24 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value
#>  version  R version 3.5.2 (2018-12-20)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.1252
#>  ctype    English_United States.1252
#>  tz       Europe/Berlin
#>  date     2019-05-24
#>
#> - Packages --------------------------------------------------------------
#>  ! package     * version date       lib
#>    ...
#>    ggstatsplot * 0.0.10  2019-03-17 [1]
#>    ...
#>  D TMB           1.7.15  2018-11-09 [1]
#>    ...
#>
#> [1] C:/R/library
#> [2] \\nmbu.no/my/Home/Documents/R/win-library/3.5
#> [3] C:/Program Files/R/R-3.5.2/library
#>
#>  D -- DLL MD5 mismatch, broken installation.
```

ibecav commented 5 years ago

It is a very nice package, so the thanks go to Indrajeet. As to your current dilemma, the whole point of the package is to mix the stats and the plots. Speaking just for me, I don't have the time to chase this one down, and even if I did, a plot with just 4 points (the minimum needed in most cases) feels a bit much...

But a practical solution on your side is quite manageable. Create a temporary data frame that has only the groups that will plot and feed that to ggscatterstats instead. Something like this should be easily extendable to your real data.

library(dplyr)

df2 <- cbind.data.frame(group = rep(c("A", "B"), times = c(30, 2)),
  x = rnorm(32, mean = 3, sd = 1),
  y = rnorm(32, mean = 3, sd = 1))

tempdf <- df2 %>% group_by(group) %>%
  summarise(n_rec = n()) %>%
  arrange(desc(n_rec)) %>% 
  filter(n_rec >= 4)

tempdf
#> # A tibble: 1 x 2
#>   group n_rec
#>   <fct> <int>
#> 1 A        30

Created on 2019-05-24 by the reprex package (v0.3.0)
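
The summary above only lists which groups are large enough; the data itself still has to be filtered before plotting. A minimal sketch of that step with dplyr follows. The name df_ok is purely illustrative, and the threshold of 4 is the one suggested in this comment, not a documented limit of ggstatsplot.

```r
library(dplyr)

set.seed(1910)
df2 <- cbind.data.frame(
  group = rep(c("A", "B"), times = c(30, 2)),
  x = rnorm(32, mean = 3, sd = 1),
  y = rnorm(32, mean = 3, sd = 1)
)

# Keep only the rows that belong to groups with at least 4 observations
df_ok <- df2 %>%
  group_by(group) %>%
  filter(n() >= 4) %>%
  ungroup() %>%
  droplevels() # drop the now-empty level if `group` is a factor

# Check which groups survived the filter
count(df_ok, group)

# df_ok can then be plotted in place of df2, for example:
# ggstatsplot::grouped_ggscatterstats(data = df_ok, x = x, y = y, grouping.var = group)
```

Filtering inside group_by() avoids a separate join against the summary table, at the cost of not keeping the per-group counts around for inspection.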

IndrajeetPatil commented 5 years ago

@yanxianl All functions included in this package have an argument called results.subtitle. If you set it to FALSE, it will skip any statistical analysis.

For example, for ggscatterstats: [image]
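
A call along these lines would be a minimal sketch of that suggestion, assuming an installed ggstatsplot version in which ggscatterstats() accepts results.subtitle as described in this reply. The built-in mtcars dataset is only a stand-in, and other arguments are left at their defaults because they differ between package versions.

```r
library(ggstatsplot)

# Scatterplot with the statistical subtitle switched off
p <- ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  results.subtitle = FALSE
)

p
```

Since the reply states that all functions in the package have this argument, the same setting should presumably carry over to grouped_ggscatterstats(). Whether it avoids the error for a subgroup with only 2 points is not confirmed in this thread.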