corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

behaviors of aes.bind and of lode.ordering are inconsistent #52

Closed: corybrunson closed this issue 4 years ago

corybrunson commented 4 years ago

Issue #51 may have been resolved by using the aes.bind parameter. However, experiments with other solutions have exposed a shortcoming of the lode.ordering parameter. Currently, lode.ordering expects a vector of unique values, specifically a permutation of seq(nrow(data)) such as order() returns. If a vector with repeated values is passed to lode.ordering, then the result may depend on the order of the rows of data. (Both the problem and the solution carry over directly to passing a matrix to lode.ordering.)
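To make the distinction concrete, here is a minimal base R sketch. The toy vector v and the helper is_permutation() are made up for illustration and are not part of ggalluvial. It contrasts a permutation, as returned by order(), with a vector of category codes that contains repeats. It also shows that order(order(v)) does yield a permutation, but only by breaking ties according to row position.

```r
# Hypothetical toy labels with repeated values (not the data from this issue)
v <- c("b", "a", "b", "c", "a")

# Illustrative helper: is x a permutation of seq_along(x)?
is_permutation <- function(x) all(sort(x) == seq_along(x))

# order() returns a permutation of the row indices
perm <- order(v)

# Integer codes for the categories contain repeats, so they are not a permutation
codes <- match(v, sort(unique(v)))

# order(order(v)) is a permutation again, but tied values are ranked
# by their position in v, so the ranks change if the rows are reordered
ranks <- order(order(v))

is_permutation(perm)
is_permutation(codes)
is_permutation(ranks)
```

The last point is where the row-order dependence comes from: order() leaves unresolved ties in their original order.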

An improvement would be for stat_alluvium() to treat such a vector in the same way that it currently treats the aesthetic variables when aes.bind = "alluvia". That is, it would sort the alluvia first by the vector passed to lode.ordering and then by the "deposits" that encode the orders of the strata at the other columns, in the sequence prescribed by the lode guidance function. Under this revision, lode.guidance and lode.ordering could be used together to determine the order of the lodes.
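The idea of sorting by a tied key first and then by secondary keys can be sketched in base R. This is only an illustration under simplifying assumptions, not how stat_alluvium() is implemented: the toy data frame, the column axis2 (standing in for a deposit), and the helpers rank_by_key() and rank_by_key_then_axis() are all hypothetical.

```r
# Hypothetical toy data: `key` has ties; `axis2` stands in for a deposit
toy <- data.frame(
  key   = c("g1", "g2", "g1", "g2"),
  axis2 = c("B", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# Rank rows by the key alone: ties fall back to row position
rank_by_key <- function(df) order(order(df$key))

# Rank rows by the key, then break ties with the secondary column
rank_by_key_then_axis <- function(df) order(order(df$key, df$axis2))

# Reorder the rows and check whether each row keeps its rank
shuffle <- c(3, 1, 4, 2)
shuffled <- toy[shuffle, ]

# With tie-breaking, the ranks follow the rows through the shuffle
identical(rank_by_key_then_axis(toy)[shuffle], rank_by_key_then_axis(shuffled))

# Without tie-breaking, the ranks depend on the row order
identical(rank_by_key(toy)[shuffle], rank_by_key(shuffled))
```

In this toy case the secondary column resolves every tie, so the ranking is a property of the rows themselves and not of their arrangement in the data frame.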

The plots below illustrate the present shortcoming. In each step that uses lode.ordering, the ordering vector is reversed for consistency with the rest of the plot. (Should this be done internally? It would be a breaking change.) An ordering vector that reproduces the plot made with aes.bind = "alluvia" can currently be obtained only through a cumbersome dplyr composition, as in the final example. The revised behavior would obviate this step and instead recover the plot by simply using lode.ordering = data$V.
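For readers unfamiliar with the -xtfrm() idiom used in the examples: a character vector cannot be negated directly, so xtfrm() first converts it to numeric codes that sort in the same way, and negating those codes reverses the order. A small sketch, using a made-up vector of labels without ties:

```r
# Hypothetical labels without repeated values (not the data from this issue)
v <- c("IGHV2", "IGHV1", "IGHV3")

# xtfrm() gives numeric codes that sort the same way as v
codes <- xtfrm(v)

# Negating the codes reverses the sort order
reversed <- order(-codes)

# For a vector without ties this matches a decreasing sort
identical(reversed, order(v, decreasing = TRUE))
```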

library(ggalluvial)
#> Loading required package: ggplot2
data <- as.data.frame(tibble::tibble(
  TP1 = "D1",
  TP2 = c("D1D1", "D1D1", "D1D2", "D1D2",
          "D1D1", "D1D2", "D1D2", "D1D2",
          "D1D1", "D1D1", "D1D2", "D1D2",
          "D1D1", "D1D2", "D1D2", "D1D1",
          "D1D2", "D1D1", "D1D2", "D1D1"),
  TP3 = c("D1D1D1", "D1D1D2", "D1D1D1", "D1D2D1",
          "D1D2D1", "D1D1D1", "D1D1D2", "D1D2D2",
          "D1D1D1", "D1D2D2", "D1D2D2", "D1D2D1",
          "D1D1D1", "D1D2D2", "D1D2D1", "D1D1D2",
          "D1D2D2", "D1D2D1", "D1D1D2", "D1D1D2"),
  V = c("IGHV1", "IGHV2", "IGHV1", "IGHV31",
        "IGHV2", "IGHV3", "IGHV3", "IGHV4",
        "IGHV1", "IGHV4", "IGHV5", "IGHV4",
        "IGHV2", "IGHV4", "IGHV5", "IGHV3",
        "IGHV48", "IGHV1", "IGHV4", "IGHV3"),
  Freq = c(10, 15, 31, 22, 2, 1, 111, 45, 67, 89,
           23, 48, 90, 12, 46, 78, 90, 100, 0, 20)
))

# default alluvium settings
ggplot(data = data,
       aes(axis1 = TP1, axis2 = TP2, axis3 = TP3,
           y = Freq)) +
  scale_x_discrete(limits = c("TP1", "TP2", "TP3"), expand = c(.1, .05)) +
  xlab("Time point") +
  geom_alluvium(aes(fill = V)) +
  geom_stratum() + geom_text(stat = "stratum", infer.label = TRUE)


# bind by aesthetics
ggplot(data = data,
       aes(axis1 = TP1, axis2 = TP2, axis3 = TP3,
           y = Freq)) +
  scale_x_discrete(limits = c("TP1", "TP2", "TP3"), expand = c(.1, .05)) +
  xlab("Time point") +
  geom_alluvium(aes(fill = V), aes.bind = "alluvia") +
  geom_stratum() + geom_text(stat = "stratum", infer.label = TRUE)


# order by reversed aesthetic variable (the vector has repeated values, so it is not a permutation)
ggplot(data = data,
       aes(axis1 = TP1, axis2 = TP2, axis3 = TP3,
           y = Freq)) +
  scale_x_discrete(limits = c("TP1", "TP2", "TP3"), expand = c(.1, .05)) +
  xlab("Time point") +
  geom_alluvium(aes(fill = V), lode.ordering = -xtfrm(data$V)) +
  geom_stratum() + geom_text(stat = "stratum", infer.label = TRUE)


# order by the ranks of the reversed aesthetic variable (a permutation; ties broken by row position)
my_order <- order(order(-xtfrm(data$V)))
ggplot(data = data,
       aes(axis1 = TP1, axis2 = TP2, axis3 = TP3,
           y = Freq)) +
  scale_x_discrete(limits = c("TP1", "TP2", "TP3"), expand = c(.1, .05)) +
  xlab("Time point") +
  geom_alluvium(aes(fill = V), lode.ordering = my_order) +
  geom_stratum() + geom_text(stat = "stratum", infer.label = TRUE)


# order by the ranks of the reversed aesthetic variable, with ties broken by the reversed axis variables
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
my_order <- data %>%
  select(V, TP1, TP2, TP3) %>%
  mutate_all(~ -xtfrm(.)) %>%
  do.call(what = order) %>%
  order()
ggplot(data = data,
       aes(axis1 = TP1, axis2 = TP2, axis3 = TP3,
           y = Freq)) +
  scale_x_discrete(limits = c("TP1", "TP2", "TP3"), expand = c(.1, .05)) +
  xlab("Time point") +
  geom_alluvium(aes(fill = V), lode.ordering = my_order) +
  geom_stratum() + geom_text(stat = "stratum", infer.label = TRUE)

Created on 2020-04-02 by the reprex package (v0.3.0)