corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

double flow labels in ggalluvial #81

Closed Fatjetaa closed 3 years ago

Fatjetaa commented 3 years ago

Hello,

I am trying to put labels on the flows in my ggalluvial plot, but each flow gets two different labels. It seems to carry over the label of the flow from the previous stratum.

The code I used is:

library(dplyr)       # %>%, group_by(), mutate(), select()
library(ggplot2)
library(ggalluvial)  # geom_flow(), geom_stratum(), vaccinations data

data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))

vaccinations <- vaccinations %>% 
  group_by(survey) %>% 
  mutate(pct = freq / sum(freq)) %>%
  select(-c(start_date,end_date))

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = pct,
           fill = response %in% c("Missing", "Never"),
           label = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  scale_y_continuous(label = scales::percent_format()) +
  scale_fill_manual(values = c(`TRUE` = "cadetblue1", `FALSE` = "grey50")) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(aes(label = paste0(..stratum.., "\n",
                               scales::percent(..count.., accuracy = .1))),
            stat = "stratum", size = 3) +
  geom_text(aes(label = paste0(scales::percent(..count.., accuracy = .1))),
            stat = "flow", size = 3, nudge_x = 0.25) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")

This produces the following plot:

[Image: screenshot of the resulting plot, 2021-03-31]

Thank you in advance.

corybrunson commented 3 years ago

Hi @Fatjetaa, and thanks for raising the issue. The labeling of flows is problematic, and I'll try to briefly explain why. Usually, the text geom (geometric element) is paired with a stat (statistical transformation) that returns a data frame containing one row for each graphical object, and whose columns must include the aesthetics required by the text geom (x, y, and label). But the flow stat violates this principle: it returns a data frame containing two rows for each flow, one for the axis where it begins and another for the axis where it ends. The reason for this is that ggplot2 does not recognize enough aesthetics for a single row to be able to correctly position both the flow itself and a possible label.
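To see the two rows per flow directly, one option is to inspect the data that the flow stat hands to the text geom. The following is only a sketch: it assumes ggplot2 3.3.0 or later (for after_stat()) and uses the raw freq column of the vaccinations data instead of the percentages from the original post. The object names p and flow_rows are made up for illustration.

```r
library(ggplot2)
library(ggalluvial)

data(vaccinations)

# A plot with a single text layer driven by the flow stat.
p <- ggplot(vaccinations,
            aes(x = survey, stratum = response, alluvium = subject, y = freq)) +
  geom_text(aes(label = after_stat(count)), stat = "flow")

# layer_data() returns the data frame used to draw layer 1,
# including the columns computed by the stat.
flow_rows <- layer_data(p, 1)

# Each flow should appear once as "from" and once as "to".
table(flow_rows$flow)

# Rows sharing an alluvium value belong to the same flow.
head(flow_rows[order(flow_rows$alluvium),
               c("x", "y", "label", "flow", "alluvium")])
```

Because the text geom draws one label per row, every flow ends up with a label at each of its two ends, which is the doubling seen in the plot.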

You can see this in action in the vignette "The Order of the Rectangles", under "Positioning lodes within strata". In the data frame returned by the flow stat, the two rows corresponding to each flow are distinguished by the flow column, which takes the values "from" and "to", and they are linked by the poorly named alluvium column, which takes integer values. The vignette shows how to use the hjust parameter to separate the "from" and "to" labels, in case you want to include both. If you want to include only the "from" labels, for example, then you can set label = ifelse(..flow.. == "to", "", paste0(scales::percent(..count.., accuracy = .1))) inside aes().
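Applied to the plot from the original post, the "from"-only suggestion might look like the sketch below. It is a trimmed-down version of that plot, not the full original: it omits the level reversal and the date-column removal, and it is written with after_stat(), the newer spelling of the ..name.. notation (ggplot2 3.3.0 or later). The names vaccinations_pct and p are illustrative.

```r
library(dplyr)
library(ggplot2)
library(ggalluvial)

data(vaccinations)

# Convert frequencies to within-survey proportions, as in the original post.
vaccinations_pct <- vaccinations %>%
  group_by(survey) %>%
  mutate(pct = freq / sum(freq)) %>%
  ungroup()

p <- ggplot(vaccinations_pct,
            aes(x = survey, stratum = response, alluvium = subject, y = pct,
                fill = response %in% c("Missing", "Never"))) +
  scale_x_discrete(expand = c(.1, .1)) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_manual(values = c(`TRUE` = "cadetblue1", `FALSE` = "grey50")) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  # Blank out the label on the "to" row of each flow, keep the "from" row.
  geom_text(aes(label = ifelse(after_stat(flow) == "to", "",
                               scales::percent(after_stat(count),
                                               accuracy = .1))),
            stat = "flow", size = 3, nudge_x = 0.25) +
  theme(legend.position = "none")

p
```

Swapping "to" for "from" in the condition would keep the labels at the receiving end of each flow instead.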

I hope that helps! Let me know if something doesn't work out.

Fatjetaa commented 3 years ago

Thank you very much, you are a hero!