corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

explicit & consistent aesthetic aggregation #44

Closed: corybrunson closed this issue 4 years ago

corybrunson commented 5 years ago

Description of the issue

Currently, aesthetics are aggregated in stat_stratum() by the unexported helper function auto_aggregate(), and in the other stat_*() functions by other means. The stats also handle numeric and character labels differently, as the examples below show.

Reproducible example (preferably using reprex::reprex())

Here are some examples:

library(ggalluvial)
#> Loading required package: ggplot2
# vaccinations data
data(vaccinations)
# aggregates label when passed same variable as `y`
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq, fill = response)) +
  geom_stratum(alpha = .5) +
  geom_text(aes(label = freq), stat = "flow") +
  geom_flow()

# aggregates numeric labels other than `y`
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq, fill = response)) +
  geom_stratum(alpha = .5) +
  geom_text(aes(label = a), stat = "flow", size = 2) +
  geom_flow()

# no aggregation
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq)) +
  geom_flow(aes(fill = response), stat = "alluvium", aggregate.y = TRUE) +
  geom_stratum() +
  geom_text(aes(label = a), stat = "alluvium", aggregate.y = TRUE, size = 2)

# majors data
data(majors)
# no aggregation
ggplot(majors,
       aes(x = semester, stratum = curriculum, alluvium = student,
           fill = curriculum)) +
  geom_stratum() +
  geom_flow(stat = "alluvium", color = "black") +
  geom_text(aes(label = student), stat = "alluvium")

Created on 2019-10-02 by the reprex package (v0.3.0)

And here are some examples that appear to result in infinite loops:

# appears to loop indefinitely: character label with aggregate.y = TRUE
ggplot(majors,
       aes(x = semester, stratum = curriculum, alluvium = student,
           fill = curriculum)) +
  geom_stratum() +
  geom_flow(stat = "alluvium", color = "black", aggregate.y = TRUE) +
  geom_text(aes(label = as.character(student)),
            stat = "alluvium", aggregate.y = TRUE)

# appears to loop indefinitely: integer label with aggregate.y = TRUE
ggplot(majors,
       aes(x = semester, stratum = curriculum, alluvium = student,
           fill = curriculum)) +
  geom_stratum() +
  geom_flow(stat = "alluvium", color = "black", aggregate.y = TRUE) +
  geom_text(aes(label = as.integer(student)),
            stat = "alluvium", aggregate.y = TRUE)
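
To make the numeric-versus-character distinction concrete, here is a minimal base R sketch of one explicit rule that every stat could share. It is an assumption for illustration only, not the behavior of auto_aggregate(): numeric values are summed within a group, and any other value is kept only when it is constant within the group. The helper `aggregate_label()` and the `flows` data frame are hypothetical names.

```r
# Hypothetical helper, NOT part of ggalluvial: one explicit aggregation rule.
aggregate_label <- function(x) {
  # Numeric aesthetics are summed within the group.
  if (is.numeric(x)) return(sum(x))
  # Other aesthetics survive only if they are constant within the group;
  # otherwise return a missing value of the same type.
  u <- unique(x)
  if (length(u) == 1L) u else x[NA_integer_]
}

# Toy data (illustrative): two groups of rows to be merged.
flows <- data.frame(
  group = c("g1", "g1", "g2", "g2"),
  freq  = c(3, 5, 2, 2),
  name  = c("x", "y", "z", "z"),
  stringsAsFactors = FALSE
)

# Apply the same rule to a numeric and a character column.
num <- vapply(split(flows$freq, flows$group), aggregate_label, numeric(1))
chr <- vapply(split(flows$name, flows$group), aggregate_label, character(1))
```

Under this rule `num` holds the group sums, while `chr` is missing for the group whose names differ and keeps the shared name for the other group.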

corybrunson commented 4 years ago

A crucial decision is which options the user should have for how the ordering of lodes respects aesthetic variables. The current plan is to allow three options:

  1. no influence of aesthetics on order at all (default); once lodes are ordered by deposit (on the index axis and, where appropriate, on the other axes), alluvia or flows are ordered by the variable passed to alluvium, i.e. the case ID (taken to be the minimum case ID when cases are aggregated)
  2. order by aesthetics after other axis deposits but before case IDs (current behavior under aes.bind = FALSE)
  3. order by aesthetics after index axis deposit but before other axis deposits (current behavior under aes.bind = TRUE)

The primary reason for ignoring aesthetics by default is that, otherwise, multiple layers would each need to be passed the same aesthetics in order to produce the same orderings. With aesthetics ignored by default, multiple layers using the same stat will plot lodes in the same order.
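
The three options differ only in where the aesthetic key sits in the sort precedence. The following base R sketch shows this on a toy table. It assumes a single aesthetic and a single non-index axis, and every name in it (`lodes`, `order_lodes()`, the column names, the mode labels) is hypothetical rather than taken from the package internals.

```r
# Toy lode table; all column names are illustrative assumptions.
lodes <- data.frame(
  id            = 1:6,                    # case ID (the `alluvium` variable)
  deposit_index = c(1, 1, 1, 1, 2, 2),    # deposit on the index axis
  deposit_other = c(1, 2, 1, 2, 1, 1),    # deposit on another axis
  fill          = c("b", "a", "a", "b", "b", "a"),  # an aesthetic variable
  stringsAsFactors = FALSE
)

# Hypothetical helper: return case IDs in plotting order under each option.
order_lodes <- function(d, mode = c("none", "after", "before")) {
  mode <- match.arg(mode)
  idx <- switch(mode,
    # Option 1: aesthetics ignored; ties broken by case ID only.
    none   = order(d$deposit_index, d$deposit_other, d$id),
    # Option 2: aesthetics after other-axis deposits, before case IDs.
    after  = order(d$deposit_index, d$deposit_other, d$fill, d$id),
    # Option 3: aesthetics after the index deposit, before other deposits.
    before = order(d$deposit_index, d$fill, d$deposit_other, d$id)
  )
  d$id[idx]
}

order_lodes(lodes, "none")    # 1 3 2 4 5 6
order_lodes(lodes, "after")   # 3 1 2 4 6 5
order_lodes(lodes, "before")  # 3 2 1 4 6 5
```

Only the first call is independent of `fill`, which is why two layers mapped to different aesthetics would still agree on the order under the default.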

corybrunson commented 4 years ago

Resolved by f3c6a5c46ce371de923b1c5126e43177811fa982 through 27cca5cba3aa8f7b6e24844178e41494be35266a.