Open schloerke opened 4 years ago
If you decide to use ggally_cor() for upper, I would say that we should add a ggally_chisq_test() for discrete variables and a ggally_aov (one-way anova) for combo.
Note: t.test or Wilcoxon/Mann-Whitney can only be used to compare two means or two medians, while aov() provides a test that works with a discrete variable with more than 2 levels.
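To illustrate the note above, here is a minimal base R sketch using the built-in iris data (chosen only for illustration, it is not part of the proposal). Species has three levels, so the formula interface of t.test() refuses it, while aov() returns a single overall test.

```r
# Species has 3 levels: t.test() only accepts a 2-level grouping factor.
t_test_attempt <- try(t.test(Sepal.Length ~ Species, data = iris), silent = TRUE)
t_test_fails <- inherits(t_test_attempt, "try-error")

# One-way ANOVA handles any number of levels.
fit <- aov(Sepal.Length ~ Species, data = iris)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]  # p-value of the Species term
```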
An alternative could be to use for upper: density, count and boxplot or violin
Just some additional thoughts: ggally_cor() does not present just a test but also a measure of correlation. It may be worth thinking about similar association measures for discrete and combo.
Of interest: https://medium.com/@outside2SDs/an-overview-of-correlation-measures-between-categorical-and-continuous-variables-4c7f85610365
Some possible options.
For two discrete variables, display Cramer's V coefficient with bias correction, which varies between 0 and 1. The p-value would be determined using a chi-square test. Both work regardless of the number of categories in x and y. One possibility is to use rcompanion::cramerV(), which implements the bias correction, and chisq.test() for p-values.
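A rough base R sketch of what such a statistic could compute, assuming the bias correction of Bergsma (2013). The helper name cramers_v_corrected() is hypothetical (it is not part of GGally or rcompanion), and this is not meant to reproduce rcompanion::cramerV() exactly.

```r
# Hypothetical helper: bias-corrected Cramer's V plus chi-square p-value.
# Assumes x and y have no unused factor levels (empty rows/columns break chisq.test).
cramers_v_corrected <- function(x, y) {
  tab <- table(x, y)
  n <- sum(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  # correct = FALSE: no Yates continuity correction on 2x2 tables.
  # Warnings about small expected counts are silenced for this sketch only.
  test <- suppressWarnings(chisq.test(tab, correct = FALSE))
  phi2 <- unname(test$statistic) / n
  # Bias-corrected phi^2 and table dimensions (assumed Bergsma-style correction)
  phi2_corr <- max(0, phi2 - (r - 1) * (k - 1) / (n - 1))
  r_corr <- r - (r - 1)^2 / (n - 1)
  k_corr <- k - (k - 1)^2 / (n - 1)
  v <- sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))
  list(v = v, p.value = test$p.value)
}

# Example on built-in data: number of cylinders vs number of gears
res_v <- cramers_v_corrected(mtcars$cyl, mtcars$gear)
```

Both values could then be printed in the cell in the same way ggally_cor() prints the correlation.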
Regarding one discrete and one continuous variable, aov() relies on normality assumptions. A more generic approach, which also works regardless of the number of categories in the discrete variable, would be to rely on the Kruskal-Wallis test, which is non-parametric and implemented in base R. Epsilon-squared is a possible measure of association, ranging between 0 and 1. It could be computed with rcompanion::epsilonSquared(). Cf. https://rcompanion.org/handbook/F_08.html
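As a sketch of the combo case in base R only, assuming the usual definition of epsilon-squared for Kruskal-Wallis, H / (n - 1), where H is the test statistic and n the number of observations. The helper name epsilon_squared_kw() is hypothetical and is not taken from rcompanion.

```r
# Hypothetical helper: Kruskal-Wallis p-value plus epsilon-squared effect size.
epsilon_squared_kw <- function(value, group) {
  # Drop incomplete pairs so that n matches what kruskal.test() actually uses
  keep <- complete.cases(value, group)
  value <- value[keep]
  group <- factor(group[keep])
  test <- kruskal.test(value, group)
  n <- length(value)
  eps2 <- unname(test$statistic) / (n - 1)  # assumed definition: H / (n - 1)
  list(epsilon2 = eps2, p.value = test$p.value)
}

# Example on built-in data: sepal length across the three iris species
res_kw <- epsilon_squared_kw(iris$Sepal.Length, iris$Species)
```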
Interesting reading as well: https://cran.r-project.org/web/packages/statsExpressions/vignettes/stats_details.html
Cf. #286 as well
I don't know if we have enough time to get this one right. Feels rushed to get in the next two days. Need time to play with it.
Let's sit on this one and release the new methods for the next release?
No problem. Anyway, such new visualisations should be facilitated by the generic ggally_statistic() via #317
I'd like to make a plot matrix that is "fully aligned", where the diagonal aligns with the rows / columns.
I am not sold on
cc @larmarange