ggobi / ggally

R package that extends ggplot2
http://ggobi.github.io/ggally/

An additional ggpairs-like function using ggally_autopoint and ggally_count #345

Open schloerke opened 4 years ago

schloerke commented 4 years ago

via #317

I'd like to make a plot matrix that is "fully aligned", where the diagonal aligns with the rows / columns.

I am not sold on

ggpairs(
  tips,
  mapping = aes(color = sex),
  lower = list(continuous = "autopoint", discrete = "autopoint", combo = "autopoint"),
  diag = list(discrete = "autopointDiag", continuous = "autopointDiag"),
  upper = list(continuous = "cor", discrete = "count") ## combo = ???
)

cc @larmarange

larmarange commented 4 years ago

If you decide to use ggally_cor() for upper, I would say that we should add a ggally_chisq_test() for discrete variables and a ggally_aov() (one-way ANOVA) for combo.

Note: t.test or Wilcoxon/Mann-Whitney can only be used to compare two means or two medians, while aov() provides a test that works with a discrete variable with more than 2 levels.
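For reference, the tests mentioned here are available in base R. The following is a minimal sketch, assuming the built-in mtcars and iris datasets as stand-ins for tips. ggally_chisq_test() and ggally_aov() are only proposed names in this thread, so just the underlying stats calls are shown.

```r
# Discrete vs discrete: chi-squared test of independence.
# mtcars is a stand-in dataset (assumption), not the tips data from the thread.
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)
# Small expected counts make chisq.test() warn about the approximation.
chisq_res <- suppressWarnings(chisq.test(tab))
chisq_p <- chisq_res$p.value

# Combo (continuous vs discrete): one-way ANOVA.
# Species has 3 levels, which a two-sample t.test could not handle.
aov_fit <- aov(Sepal.Length ~ Species, data = iris)
aov_p <- summary(aov_fit)[[1]][["Pr(>F)"]][1]

print(c(chisq = chisq_p, aov = aov_p))
```

Either p-value could then be formatted as text for an upper-triangle panel, in the same spirit as ggally_cor().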

larmarange commented 4 years ago

An alternative for upper could be density, count, and boxplot or violin.

larmarange commented 4 years ago

Just some additional thoughts: ggally_cor() presents not just a test but also a measure of correlation. It may be worth thinking about similar correlation measures for discrete and combo.

Of interest: https://medium.com/@outside2SDs/an-overview-of-correlation-measures-between-categorical-and-continuous-variables-4c7f85610365

larmarange commented 4 years ago

Some possible options.

For two discrete variables, display Cramer's V coefficient with bias correction, which varies between 0 and 1. The p-value would be determined using a chi-square test. Both work regardless of the number of categories in x and y. One possibility is to use rcompanion::cramerV(), which implements bias correction, and chisq.test() for p-values.
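To make the discrete/discrete option concrete, here is a minimal base R sketch of a bias-corrected Cramer's V, using the correction proposed by Bergsma (2013). The helper name cramers_v_corrected() is hypothetical, mtcars is a stand-in dataset, and the result is not guaranteed to match rcompanion::cramerV() digit for digit.

```r
# Hypothetical helper: bias-corrected Cramer's V (Bergsma, 2013) plus the
# chi-squared p-value, using base R only.
cramers_v_corrected <- function(x, y) {
  tab <- table(x, y)
  n <- sum(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  # correct = FALSE disables the Yates continuity correction (2x2 tables only).
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))
  phi2 <- unname(chi2$statistic) / n
  # Bias correction: shrink phi^2 and the table dimensions.
  phi2_corr <- max(0, phi2 - (k - 1) * (r - 1) / (n - 1))
  k_corr <- k - (k - 1)^2 / (n - 1)
  r_corr <- r - (r - 1)^2 / (n - 1)
  list(
    v = sqrt(phi2_corr / min(k_corr - 1, r_corr - 1)),
    p.value = chi2$p.value
  )
}

# Stand-in data (assumption): cylinders vs gears in mtcars.
res <- cramers_v_corrected(factor(mtcars$cyl), factor(mtcars$gear))
print(res)
```

The function assumes every factor level actually occurs in the data; unused levels would need to be dropped first with droplevels().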

Regarding one discrete and one continuous variable, aov() has assumptions about normality. A more generic approach, which works regardless of the number of categories in the discrete variable, would be to rely on the Kruskal-Wallis test, which is non-parametric and implemented in base R. Epsilon-squared is a possible measure of association, ranging between 0 and 1. It could be computed with rcompanion::epsilonSquared(). cf. https://rcompanion.org/handbook/F_08.html
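The combo option can be sketched in base R as well. This assumes the common definition of epsilon-squared for the Kruskal-Wallis test, H / (n - 1), and uses iris as a stand-in dataset. It is an illustration of the idea, not the rcompanion implementation.

```r
# Kruskal-Wallis test: non-parametric, works for any number of groups.
# iris is a stand-in dataset (assumption), not the tips data from the thread.
kw <- kruskal.test(Sepal.Length ~ Species, data = iris)

# Epsilon-squared effect size, assuming the definition H / (n - 1),
# where H is the Kruskal-Wallis statistic and n the number of observations.
n <- sum(complete.cases(iris[, c("Sepal.Length", "Species")]))
eps2 <- unname(kw$statistic) / (n - 1)

print(c(epsilon_squared = eps2, p.value = kw$p.value))
```

As with ggally_cor(), both the measure of association and the p-value are available, so a panel could show either or both.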

Interesting reading as well: https://cran.r-project.org/web/packages/statsExpressions/vignettes/stats_details.html

larmarange commented 4 years ago

Cf. #286 as well

schloerke commented 4 years ago

I don't know if we have enough time to get this one right. Feels rushed to get in the next two days. Need time to play with it.

Let's sit on this one and release the new methods in the next release?

larmarange commented 4 years ago

No problem. Anyway, such new visualisations should be easier to build with the generic ggally_statistic().