Multiple comparisons and contrasts

aphalo / ggpmisc

R package ggpmisc is an extension to ggplot2 and the Grammar of Graphics

https://docs.r4photobiology.info/ggpmisc

Multiple comparisons and contrasts #41

Closed aphalo closed 1 year ago

aphalo commented 1 year ago

'ggsignif' exports geom_signif() and stat_signif(); however, the pairwise comparisons are done individually and the p-values are returned uncorrected. In addition, geom_signif() and stat_signif() are too interdependent, making it impossible to use stat_signif() with other geometries. No labels to be parsed into expressions are returned, and small p-values are not shown as < 0.001 or similar, as they are in 'ggpmisc'.
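To make the two points about p-values concrete, here is a minimal base R sketch, not code from either package. The raw p-values are made-up numbers, and the choice of Holm's method and of 0.001 as the reporting threshold are assumptions for illustration only.

```r
# Raw p-values from a hypothetical family of four pairwise tests (made-up numbers)
p.raw <- c(0.00004, 0.012, 0.03, 0.2)

# Adjust for multiple comparisons; Holm's method controls the family-wise error rate
p.holm <- p.adjust(p.raw, method = "holm")

# Report values below the threshold as an inequality instead of many decimals
p.labels <- format.pval(p.holm, digits = 2, eps = 0.001)

print(data.frame(p.raw, p.holm, p.labels))
```

Without the adjustment step the third comparison would be read as significant at the 0.05 level; after it, only the first two are.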

After looking at the 'ggsignif' code with the intention of submitting a pull request, it seems more reasonable to implement from scratch a statistic and a geometry with similar functionality in 'ggpmisc' than to update 'ggsignif'.

Getting this done soon and included in the 2nd edition of Learn R: As a Language would be ideal.

The geometry can be implemented first, and tested and used on its own, and later one or more statistics could be written making use of it: stat_fit_contrasts() and stat_multcomp().

aphalo commented 1 year ago

Geometries geom_text_pairwise() and geom_label_pairwise() are now implemented in package 'ggpp'. The statistics need to be implemented as part of 'ggpmisc'. One could base one of them on gmodels::fit.contrast() and another on pairwise.t.test() and TukeyHSD(), or, even better, use package 'multcomp'.
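For reference, a short sketch of what the two base R candidates return, using the PlantGrowth data set that ships with R. The data set and the Holm adjustment are choices made here for illustration, not something decided in this issue.

```r
# PlantGrowth ships with R: plant weight for a control and two treatments
data(PlantGrowth)

# Pairwise t-tests with a pooled SD; returns a lower-triangular matrix of
# adjusted P-values only
pw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                      p.adjust.method = "holm")
print(pw$p.value)

# Tukey's HSD on a one-way ANOVA fit; returns differences, simultaneous
# confidence limits and adjusted P-values, one row per pair of levels
fm.aov <- aov(weight ~ group, data = PlantGrowth)
tk <- TukeyHSD(fm.aov, which = "group")
print(tk$group)
```

TukeyHSD() already returns the fitted differences alongside the P-values, which is closer to what a statistic feeding pairwise labels would need.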

aphalo commented 1 year ago

Statistic stat_multcomp() is now working; unit tests are still to be done. This stat can generate two types of labels: labelled bars as in stat_signif(), or letters that avoid clutter when the factor has many levels. The bars can be labelled with P-values and/or the fitted difference, or with star-encoded P-values. Numeric values are also returned. The main effect of the factor is tested first and, if it is not significant, the multiple comparisons are skipped.
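The "test the main effect first" logic and the star encoding can be sketched in base R as follows. This illustrates the idea only and is not the code inside stat_multcomp(): the 0.05 cut-off, the star thresholds and the use of TukeyHSD() in place of 'multcomp' are all assumptions.

```r
fm <- lm(weight ~ group, data = PlantGrowth)

# P-value for the main effect of the factor (first row of the ANOVA table)
main.p <- anova(fm)[["Pr(>F)"]][1]

if (main.p < 0.05) {  # assumed cut-off
  # aov() fits the same model in the form that TukeyHSD() expects
  tk <- TukeyHSD(aov(weight ~ group, data = PlantGrowth))$group
  # Conventional star encoding of the adjusted P-values (assumed thresholds)
  stars <- cut(tk[, "p adj"],
               breaks = c(0, 0.001, 0.01, 0.05, 1),
               labels = c("***", "**", "*", "ns"),
               include.lowest = TRUE)
  pairwise <- data.frame(contrast = rownames(tk),
                         diff = tk[, "diff"],
                         p.adj = tk[, "p adj"],
                         stars = as.character(stars))
} else {
  # Main effect not significant: skip the multiple comparisons
  pairwise <- NULL
}

print(pairwise)
```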

aphalo commented 1 year ago

Things to consider: switch the default from "bars" to "letters" if the factor has more than five levels. Use a smaller default size for text. Test other model fitting functions: only lm() has been tested so far. Only Tukey contrasts are implemented; can we easily get Dunnett contrasts implemented in the same function? Fix the "funny" ordering of letters.

aphalo commented 1 year ago

An automatic switch between "bars" and "letters" can only be implemented during rendering, which seems too late to be of practical use.

aphalo commented 1 year ago

Now both Tukey and Dunnett contrasts are implemented and working. This should cater for most use cases. I am keeping the issue open, as it would be rather easy to support arbitrary pairwise contrasts, or at least a staircase of pairwise contrasts.
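The three families of pairwise contrasts differ only in which pairs of levels are compared, as this base R sketch of the contrast matrices shows. The level names and the helper pair_rows() are hypothetical, written here for illustration; as far as I know, multcomp::contrMat() builds equivalent matrices.

```r
# Hypothetical factor levels; the first one plays the role of the control
lvls <- c("ctrl", "trt1", "trt2", "trt3")
k <- length(lvls)

# Hypothetical helper: one row per contrast, +1 for level a and -1 for
# level b, so that each row estimates the difference "a - b"
pair_rows <- function(a, b) {
  m <- matrix(0, nrow = length(a), ncol = k,
              dimnames = list(paste(lvls[a], lvls[b], sep = " - "), lvls))
  m[cbind(seq_along(a), a)] <- 1
  m[cbind(seq_along(b), b)] <- -1
  m
}

idx <- combn(k, 2)                          # all pairs of indices with i < j
tukey     <- pair_rows(idx[2, ], idx[1, ])  # every level vs. every other level
dunnett   <- pair_rows(2:k, rep(1, k - 1))  # each level vs. the control
staircase <- pair_rows(2:k, 1:(k - 1))      # each level vs. the previous one

print(tukey)
print(dunnett)
print(staircase)
```

With k levels, the all-pairs family has k(k - 1)/2 rows while the other two have k - 1, which is why restricting the family to the comparisons of interest costs less power after adjustment.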

More useful, and fairly easy to implement, would be a new stat computing simultaneous CIs using glht(). Or, possibly, a third label.type for CIs could be added to those implemented in stat_multcomp(). These CIs are computed jointly and from the fitted model, so they are not equivalent to those created with stat_summary().
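The difference between a jointly computed CI and an individually computed one can be seen in base R. In this sketch TukeyHSD() stands in for glht(), an assumption made to avoid depending on 'multcomp', and the unadjusted interval is taken from the model coefficient.

```r
fm <- lm(weight ~ group, data = PlantGrowth)

# Unadjusted 95% CI for trt1 - ctrl, taken from the model coefficient
ci.single <- confint(fm)["grouptrt1", ]

# Simultaneous 95% CI for the same difference, from Tukey's HSD
tk <- TukeyHSD(aov(weight ~ group, data = PlantGrowth))$group
ci.joint <- tk["trt1-ctrl", c("lwr", "upr")]

# The joint interval is wider because it covers the whole family of contrasts
widths <- c(single = unname(diff(ci.single)),
            joint  = unname(diff(ci.joint)))
print(widths)
```

Both intervals are centred on the same estimated difference and use the pooled residual variance of the model, unlike per-group summaries, which use only the data in each group.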

aphalo commented 1 year ago

Unit tests are now implemented and 'ggpmisc' 0.5.4 is ready for CRAN.