corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

Not all flows are plotted with more than 9 axes? #9

Status: Closed (dimitriderooij closed this issue 6 years ago)

dimitriderooij commented 6 years ago

Hi, I like what you did with ggalluvial. However, I'm struggling to create a plot with many axes. I'm starting to think it is a bug in ggalluvial, but maybe I'm just doing something wrong.

I use a data frame with one row per alluvium, where each alluvium can be in one of three states and can change state each quarter.

When I try to plot all these quarters in one plot (e.g. using 20 axes), a few flows are not plotted. When I split the plot into three plots, each containing at most 9 axes, all flows are plotted.

I'm using ggplot2 2.2.1. Any help would be appreciated.


Code to reproduce:

library(ggplot2)
library(ggalluvial)

df <- structure(list(Q201501 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("A", 
      "B", "C"), class = "factor"), Q201502 = structure(c(1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 3L
      ), .Label = c("A", "B", "C"), class = "factor"), Q201503 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 
      3L), .Label = c("A", "B", "C"), class = "factor"), Q201504 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 
      3L), .Label = c("A", "B", "C"), class = "factor"), Q201601 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 
      3L), .Label = c("A", "B", "C"), class = "factor"), Q201602 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
      2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 3L, 3L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 3L, 1L, 2L, 2L, 1L, 3L, 
      3L), .Label = c("A", "B", "C"), class = "factor"), Q201603 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 
      2L, 1L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 3L, 3L, 3L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 3L, 
      3L), .Label = c("A", "B", "C"), class = "factor"), Q201604 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 3L, 3L, 3L, 1L, 2L, 2L, 
      2L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 
      3L), .Label = c("A", "B", "C"), class = "factor"), Q201701 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 3L, 3L, 3L, 1L, 2L, 2L, 
      2L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 
      3L), .Label = c("A", "B", "C"), class = "factor"), Q201702 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 1L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 3L, 3L, 1L, 1L, 2L, 
      2L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), Q201703 = structure(c(2L, 
      2L, 3L, 3L, 3L, 3L, 1L, 2L, 1L, 3L, 3L, 3L, 1L, 2L, 2L, 2L, 1L, 
      1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 
      2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), Q201704 = structure(c(2L, 
      2L, 1L, 3L, 3L, 3L, 1L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 2L, 2L, 1L, 
      1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
      1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), Q201801 = structure(c(2L, 
      2L, 1L, 3L, 3L, 3L, 1L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 2L, 2L, 1L, 
      1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
      1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), Q201802 = structure(c(2L, 
      2L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 2L, 1L, 
      1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), Q201803 = structure(c(1L, 
      2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 
      1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), Q201804 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 
      1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), Q201901 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 
      1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), Q201902 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), Q201903 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), Q201904 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), Q202001 = structure(c(1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
      1L), .Label = c("A", "B", "C"), class = "factor"), aantal = c(1L, 
      1L, 2L, 2L, 3L, 7L, 1L, 2L, 1L, 7L, 13L, 2L, 1L, 1L, 1L, 1L, 
      12L, 12L, 8L, 24L, 1L, 3L, 1L, 20L, 2L, 1L, 1L, 9L, 6L, 10L, 
      1L, 2L, 1L, 3L, 1L, 3L, 10L, 5L, 1L, 1L, 3L, 2L, 1L, 1L, 14L, 
      19L, 22L, 46L, 4L, 3L, 1L, 3L, 1L, 1L, 10L, 16L, 1L, 1L, 1L, 
      1L, 14L, 1L, 2L, 1L, 1L, 1L)), row.names = c(NA, -66L), class = c("tbl_df", 
      "tbl", "data.frame"), .Names = c("Q201501", "Q201502", "Q201503", 
      "Q201504", "Q201601", "Q201602", "Q201603", "Q201604", "Q201701", 
      "Q201702", "Q201703", "Q201704", "Q201801", "Q201802", "Q201803", 
      "Q201804", "Q201901", "Q201902", "Q201903", "Q201904", "Q202001", 
      "aantal"))

# With 20 axes (more than 9), not all flows are plotted: flows between some pairs of adjacent axes (1-2, 2-3, 9-10, 19-20) are missing
ggplot(df, aes(weight = aantal,
               axis01 = Q201501, axis02 = Q201502, axis03 = Q201503, axis04 = Q201504,
               axis05 = Q201601, axis06 = Q201602, axis07 = Q201603, axis08 = Q201604,
               axis09 = Q201701, axis10 = Q201702, axis11 = Q201703, axis12 = Q201704,
               axis13 = Q201801, axis14 = Q201802, axis15 = Q201803, axis16 = Q201804,
               axis17 = Q201901, axis18 = Q201902, axis19 = Q201903, axis20 = Q201904)) +
  geom_alluvium(aes(fill = Q202001), width = 1/12) +
  geom_stratum(width = 1/12, fill = "grey")

# With fewer than 10 axes, all flows are plotted
ggplot(df, aes(weight = aantal,
               axis01 = Q201501, axis02 = Q201502, axis03 = Q201503, axis04 = Q201504,
               axis05 = Q201601, axis06 = Q201602, axis07 = Q201603, axis08 = Q201604,
               axis09 = Q201701)) +
  geom_alluvium(aes(fill = Q202001), width = 1/12) +
  geom_stratum(width = 1/12, fill = "grey")

ggplot(df, aes(weight = aantal,
               axis01 = Q201701, axis02 = Q201702, axis03 = Q201703, axis04 = Q201704,
               axis05 = Q201801, axis06 = Q201802, axis07 = Q201803, axis08 = Q201804,
               axis09 = Q201901)) +
  geom_alluvium(aes(fill = Q202001), width = 1/12) +
  geom_stratum(width = 1/12, fill = "grey")

ggplot(df, aes(weight = aantal,
               axis01 = Q201901, axis02 = Q201902, axis03 = Q201903, axis04 = Q201904)) +
  geom_alluvium(aes(fill = Q202001), width = 1/12) +
  geom_stratum(width = 1/12, fill = "grey")
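To check which flows ought to appear between each pair of adjacent axes, the weighted transitions can be tabulated directly from the wide data. This is a minimal base-R sketch, not part of ggalluvial; the helper `count_transitions()` and the toy data frame `toy` are hypothetical stand-ins for `df` and its `aantal` weights.

```r
# Hypothetical helper: for each pair of adjacent axis columns, sum the weights
# of every (from, to) combination that actually occurs in the data.
count_transitions <- function(data, axes, weight) {
  out <- lapply(seq_len(length(axes) - 1), function(k) {
    from <- as.character(data[[axes[k]]])
    to   <- as.character(data[[axes[k + 1]]])
    # aggregate() drops combinations that never occur
    agg <- aggregate(data[[weight]], by = list(from = from, to = to), FUN = sum)
    names(agg)[3] <- "weight"
    cbind(pair = paste(axes[k], axes[k + 1], sep = "->"), agg)
  })
  do.call(rbind, out)
}

# Toy data in the same wide shape: one row per alluvium, one column per quarter
toy <- data.frame(
  Q1 = c("A", "A", "B", "C"),
  Q2 = c("A", "B", "B", "A"),
  Q3 = c("C", "B", "B", "A"),
  aantal = c(5, 2, 3, 1)
)
tr <- count_transitions(toy, axes = c("Q1", "Q2", "Q3"), weight = "aantal")
print(tr)
```

Calling it on `df` with `axes = names(df)[1:20]` and `weight = "aantal"` would list every flow that should be drawn, so any flow missing from the plot can be spotted against the table.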
corybrunson commented 6 years ago

Thanks! This was my own carelessness, using

as.numeric(as.factor(x))

to convert a possibly character or factor x to numeric codes. I'm not sure what the best way to do this is, but I've substituted

cumsum(!duplicated(x))

in a recent commit. It solves the problem for me on your example. Please let me know if it doesn't work for you! (I'll revisit this when I have more time and make a more careful fix; I'm leaving this issue open for that reason.)
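The difference between the two encodings can be seen in base R. This is an illustration of the general behaviour, not ggalluvial's actual internals, and it is an assumption that sort order is what broke the 20-axis plot. `as.factor()` orders its levels by sorting, so the codes follow alphabetical rather than appearance order (with unpadded names, `axis10` sorts before `axis2`). `cumsum(!duplicated(x))` numbers values by first appearance, but it only gives one code per distinct value when equal values are contiguous; `match(x, unique(x))` is a general first-appearance encoding.

```r
axes <- paste0("axis", 1:10)

# Codes follow the sorted levels: "axis1" < "axis10" < "axis2" < ... < "axis9"
by_sort <- as.numeric(as.factor(axes))

# Codes follow order of first appearance (valid here: each value occurs once)
by_cumsum <- cumsum(!duplicated(axes))

# When repeated values are not contiguous, cumsum(!duplicated(x)) no longer
# assigns one code per distinct value, whereas match(x, unique(x)) does.
x <- c("b", "a", "b", "c")
cumsum_codes <- cumsum(!duplicated(x))  # the second "b" gets 2, not 1
match_codes  <- match(x, unique(x))     # each "b" gets 1

print(by_sort)
print(rbind(cumsum_codes, match_codes))
```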

While tweaking your examples, I also discovered a bug that I'd introduced in a previous fix (it caused an error when no aesthetics were declared); that's also fixed. I'm glad to have been prompted to examine it so soon after introducing it. :)

dimitriderooij commented 6 years ago

Your commit works for me, thanks for the quick response! I'll leave it open as you requested; let me know if I need to close it.