corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

compute stats for ggplot2 after-stats functionality #50

Closed corybrunson closed 4 years ago

corybrunson commented 4 years ago

Description of the issue

The ggplot2 stat layers stat_count() and stat_sum() add "count", "n", and/or "prop" columns to the data returned by their compute_*() functions. These computed variables are what make after_stat() usable for controlling aesthetic evaluation. The ggalluvial layers, in particular *_stratum(), mimic much of the behavior of geom_bar(), but the user cannot control aesthetic evaluation in the same way.
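
For reference, a minimal sketch of what stat_count() exposes through geom_bar(). The mpg data set (shipped with ggplot2) and the group = 1 mapping are chosen here purely for illustration and are not part of the issue. It assumes ggplot2 >= 3.3.0, where after_stat() was introduced:

```r
library(ggplot2)

# Map bar height to the computed proportion instead of the raw count.
# group = 1 makes "prop" relative to the whole data set (illustrative choice).
p <- ggplot(mpg, aes(x = class)) +
  geom_bar(aes(y = after_stat(prop), group = 1))

# layer_data() returns the data after the stat has run, so the
# computed variables "count" and "prop" appear as columns.
computed <- layer_data(p, 1)
print(computed[, c("x", "count", "prop")])
```

These are the kinds of columns the ggalluvial stats would have to return for after_stat(prop) to resolve.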

Reproducible example (preferably using reprex::reprex())

A question on Stack Overflow illustrates this limitation. The attempted solution below currently fails, but it should work once the functionality is implemented:

library(ggplot2)
library(ggalluvial)
# toy data set (seed fixed so the random group assignment is reproducible)
set.seed(1)
df <- data.frame(id = rep(1:50, 2),
                 stage = c(rep(1, 50), rep(2, 50)),
                 group = sample(c('A', 'B', 'C'), 100, replace = TRUE))
# without labeling
ggplot(df,
       aes(x = stage, stratum = group, alluvium = id, fill = group)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5)

# attempted labeling using `after_stat()`
ggplot(df,
       aes(x = stage, stratum = group, alluvium = id, fill = group)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(aes(label = scales::percent(after_stat(prop))), stat = "stratum")
#> Error in after_stat(prop): object 'prop' not found

Created on 2020-03-19 by the reprex package (v0.3.0)

corybrunson commented 4 years ago

This is underway on the new compute-stats branch.

corybrunson commented 4 years ago

Once implemented, this might obviate the need for the ad hoc class-dependent aggregation steps currently performed by the stat_*() layers.

corybrunson commented 4 years ago

The example below relies on the ad hoc aggregation of numeric variables. It cannot be reproduced using after_stat() because the variable a is mapped to the aesthetic label, which is not involved in the stat_*() calculations.

library(ggalluvial)
#> Loading required package: ggplot2
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq, fill = response, label = round(a, 3))) +
  geom_lode() + geom_flow() +
  geom_stratum(alpha = 0) +
  geom_text(stat = "stratum")

Created on 2020-03-19 by the reprex package (v0.3.0)

A solution would be to add the optional aesthetic weight to the stat layers, with default weight = 1L, which would weight n as in stat_sum() as well as count and prop as in stat_count().
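
As a point of comparison, this is roughly how the weight aesthetic already behaves in stat_count(). The data frame toy and its columns g and w are hypothetical and exist only for this sketch:

```r
library(ggplot2)

# Hypothetical toy data: two rows in group "a", one row in group "b".
toy <- data.frame(g = c("a", "a", "b"), w = c(1, 2, 3))

# Without weights, "count" is the number of rows per value of x.
unweighted <- layer_data(ggplot(toy, aes(x = g)) + geom_bar(), 1)

# With weights, "count" is the sum of the weights per value of x.
weighted <- layer_data(ggplot(toy, aes(x = g, weight = w)) + geom_bar(), 1)

print(unweighted$count)  # 2 1
print(weighted$count)    # 3 3
```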

This would also resolve a problem that is serious in principle: the values of a are currently rounded before being summed, so rounding errors accumulate in the aggregated labels.
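
A small base R illustration of why the order of rounding and summing matters. The values are made up for the sketch and are not taken from the vaccinations data:

```r
# Four hypothetical weights, each just below the rounding threshold.
a <- c(0.0014, 0.0014, 0.0014, 0.0014)

# Rounding each value first discards 0.0004 four times over.
rounded_then_summed <- sum(round(a, 3))
# Summing first and rounding once keeps the accumulated mass.
summed_then_rounded <- round(sum(a), 3)

print(rounded_then_summed)  # 0.004
print(summed_then_rounded)  # 0.006
```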

corybrunson commented 4 years ago

As of 18f4091b0e577398ca51424c0b47fef91898d527, the above problems seem to be solved.

Expressing computed variables in percentages

library(ggalluvial)
#> Loading required package: ggplot2
set.seed(1)
df <- data.frame(id = rep(1:50, 2),
                 stage = c(rep(1, 50), rep(2, 50)),
                 group = sample(c('A', 'B', 'C'), 100, replace = TRUE))
ggplot(df,
       aes(x = stage, stratum = group, alluvium = id, fill = group)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(aes(label = scales::percent(after_stat(prop))), stat = "stratum")

Created on 2020-03-19 by the reprex package (v0.3.0)

Weighting computed variables

library(ggalluvial)
#> Loading required package: ggplot2
# rightward flow aesthetics for vaccine survey data
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
# annotate with proportional counts
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq, fill = response)) +
  geom_lode() + geom_flow() +
  geom_stratum(alpha = 0) +
  geom_text(stat = "stratum", aes(label = round(after_stat(prop), 3)))

# annotate with survey-weighted proportional counts
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq, fill = response, weight = a)) +
  geom_lode() + geom_flow() +
  geom_stratum(alpha = 0) +
  geom_text(stat = "stratum", aes(label = round(after_stat(prop), 3)))

Created on 2020-03-19 by the reprex package (v0.3.0)
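
To check what the stratum stat now returns, the computed data can be inspected with ggplot2's layer_data(). This is a sketch that assumes a ggalluvial version containing the change (v0.12.0 or later). The name computed is introduced here for illustration, and apart from prop, which the examples above rely on, no particular set of columns is assumed:

```r
library(ggalluvial)
data(vaccinations)

# A single text layer drawn with the stratum stat, weighted by `a`.
p <- ggplot(vaccinations,
            aes(x = survey, stratum = response, alluvium = subject,
                y = freq, weight = a)) +
  geom_text(stat = "stratum", aes(label = round(after_stat(prop), 3)))

# Data for the first (and only) layer after the stat has run.
computed <- layer_data(p, 1)
print(names(computed))
print(head(computed[, c("x", "prop")]))
```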

corybrunson commented 4 years ago

These were included in v0.12.0, which was released from 5ea1f2df20711416621521da4d5f815a998e717d.