corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

Scale up the strata height when data contain NAs #31

Closed pisistrato closed 5 years ago

pisistrato commented 5 years ago

I guess this is linked to #30, but what I am trying to do is scale up the height of the strata on an axis that has NAs, so that all axes in the final plot have the same dimensions (width and height). Let's take this simple data set, "inspired" by the Titanic data:

library(data.table)
library(ggalluvial)

aa <- data.table(Class = c(1, 1, 2, 3),
                 Sex = c("m", "f", "f", "f"),
                 Age = c(11, 11, 12, 12),
                 Survived = c("y", "y", "y", "n"),
                 Freq = 1)
#plot P1
ggplot(aa, aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.1, .05)) +
  xlab("Demographic") +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  geom_text(stat = "stratum", label.strata = TRUE) +
  theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.line.y = element_blank())

Easy peasy, OK. But what if the data contain an NA, like this:

bb <- data.table(Class = c(1, 1, 2, 3),
                 Sex = c("m", "f", "f", "f"),
                 Age = c(11, 11, 12, NA),
                 Survived = c("y", "y", "y", "n"),
                 Freq = 1)
#plot P2
ggplot(bb, aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.1, .05)) +
  xlab("Demographic") +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  geom_text(stat = "stratum", label.strata = TRUE) +
  theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.line.y = element_blank())

The resulting plot contains a "blank" stratum in Age that I would like to remove. I can do that as follows:

bb <- to_lodes_form(bb, key = "Demographic", axes = 1:3)
#plot P3
ggplot(na.omit(bb), aes(x = Demographic, stratum = stratum, alluvium = alluvium, y = Freq, label = stratum)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  geom_text(stat = "stratum") +
  theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.line.y = element_blank())

It is much better now, but how can I have the Age strata rescaled so that the Age axis is the same height as Class and Sex, preserving the ratio between Age 11 and Age 12? Like in plot P4, which I badly edited by hand with Inkscape. Plots here

corybrunson commented 5 years ago

@pisistrato i apologize for the delay! I saw this issue and meant to get to it sooner but lost track.

I think i understand what you're trying to do—each axis should have the same height and represent proportions rather than counts, correct? This isn't implemented in ggalluvial, but you can calculate a new column of group proportions and then pass that column to the y aesthetic. Here's an example using your data:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggalluvial)
#> Loading required package: ggplot2
# reproduce data
bb <- data.frame(Class = c(1, 1, 2, 3),
                 Sex = c("m", "f", "f", "f"),
                 Age = c(11, 11, 12, NA),
                 Survived = c("y", "y", "y", "n"),
                 Freq = 1)
bb <- to_lodes_form(bb, key = "Demographic", axes = 1:3)
# calculate within-group proportions
bb <- na.omit(bb)
bb <- group_by(bb, Demographic)
# total frequency per axis; weighting by Freq also covers rows with Freq != 1
bb <- add_count(bb, wt = Freq, name = "Total")
bb <- ungroup(bb)
bb <- transform(bb, Prop = Freq / Total)
# alluvial diagram of within-axis proportions
ggplot(bb, aes(x = Demographic, stratum = stratum, alluvium = alluvium, y = Prop, label = stratum)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  geom_text(stat = "stratum") +
  theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.line.y = element_blank())

Created on 2019-06-23 by the reprex package (v0.2.1)
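
A minimal sketch of the same rescaling on data with unequal frequencies, assuming the hypothetical Freq values below (they are not from the thread). Dividing each Freq by the sum of Freq within its axis makes every axis total 1, so all axes would be drawn at the same height:

```r
library(dplyr)
library(ggalluvial)

# hypothetical data: same shape as `bb`, but with unequal frequencies
cc <- data.frame(Class = c(1, 1, 2, 3),
                 Sex = c("m", "f", "f", "f"),
                 Age = c(11, 11, 12, NA),
                 Survived = c("y", "y", "y", "n"),
                 Freq = c(2, 1, 3, 4))
cc <- to_lodes_form(cc, key = "Demographic", axes = 1:3)
# drop lodes with a missing stratum, then rescale within each axis
cc <- na.omit(cc)
cc <- group_by(cc, Demographic)
cc <- mutate(cc, Prop = Freq / sum(Freq))
cc <- ungroup(cc)
# check: the proportions on each axis should sum to 1
summarise(group_by(cc, Demographic), Height = sum(Prop))
```

The Prop column can then be passed to the y aesthetic exactly as in the plot above.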

It's been a while since i looked into this. Have you done something like this using, e.g., geom_bar()? If there's a way to have core ggplot2 layers perform this transformation internally, then i'll make an effort to mimic that functionality here. But, if it's not part of ggplot2, then i'll also leave it out of ggalluvial.
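
For comparison, a minimal sketch of the closest core ggplot2 behaviour: position = "fill" (position_fill()) stacks bars and rescales each x position to a common height of 1. It is a position adjustment rather than a stat, and whether it carries over cleanly to alluvia is not settled in this thread. The data frame `dd` is hypothetical.

```r
library(ggplot2)

# hypothetical long-format data: one row per (axis, stratum) with a count;
# the Age axis has a smaller total, as it would after dropping an NA
dd <- data.frame(Demographic = c("Class", "Class", "Class", "Sex", "Sex", "Age", "Age"),
                 stratum = c("1", "2", "3", "m", "f", "11", "12"),
                 Freq = c(2, 1, 1, 1, 3, 2, 1))
# position = "fill" stacks the bars and rescales each x position to height 1
p <- ggplot(dd, aes(x = Demographic, y = Freq, fill = stratum)) +
  geom_col(position = "fill")
p
```

Here the Age bar reaches the same height as Class and Sex even though its counts sum to 3 rather than 4, and the 2:1 ratio between Age 11 and Age 12 is preserved.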