corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

Some of the Freq labels not appearing in ggalluvial 0.12 #62

Closed gorkang closed 4 years ago

gorkang commented 4 years ago

In ggalluvial 0.11.1, all the Freq (count) labels appeared. After updating to 0.12, some of them no longer appear.

Please see the fully reproducible example below. At the resolution of the images in the reprex, some labels overlap, but they should look better at a higher resolution.


library(ggplot2)
library(readr)
library(dplyr)

# Data preparation --------------------------------------------------------

df_JOINED = data.frame(
  stringsAsFactors = FALSE,
  brochure = c("standard", "standard","standard","standard",
               "standard", "standard","standard","standard",
               "pictorial", "pictorial","pictorial","pictorial",
               "pictorial", "pictorial","pictorial","pictorial"),
  HAD_Enough_Information = c("HADNT_info", "HADNT_info","HAD_info","HAD_info",
                             "HAD_info","HADNT_info","HADNT_info","HAD_info",
                             "HAD_info","HAD_info","HAD_info","HAD_info",
                             "HAD_info","HAD_info","HAD_info","HAD_info"),
  SAID_Enough_Information = c("SAID_no", "SAID_yes","SAID_yes","SAID_yes",
                              "SAID_yes","SAID_yes","SAID_yes","SAID_yes",
                              "SAID_yes","SAID_yes","SAID_no","SAID_yes",
                              "SAID_yes","SAID_yes","SAID_yes","SAID_yes"))

df_alluvium_temp = df_JOINED %>% 
  group_by(brochure, HAD_Enough_Information, SAID_Enough_Information) %>% 
  summarise(Freq = n()) %>% ungroup() %>% 
  mutate(SAID_Enough_Information = forcats::fct_relevel(SAID_Enough_Information, c("SAID_yes", "SAID_no"))) %>%
  mutate(HAD_Enough_Information = forcats::fct_relevel(HAD_Enough_Information, c("HAD_info", "HADNT_info")))

# Plot 0.11.1 - all OK --------------------------------------------------------------------

remotes::install_version("ggalluvial", version = "0.11.1", repos = "http://cran.us.r-project.org")
library(ggalluvial)

ggplot(df_alluvium_temp, aes(y = Freq, axis1 = HAD_Enough_Information, axis2 = SAID_Enough_Information, label = Freq)) +
  geom_alluvium(aes(fill = paste0(HAD_Enough_Information, SAID_Enough_Information)), width = 1/12) +
  geom_stratum(width = 1/12, fill = "black", color = "grey", alpha = .8) +
  geom_label(stat = "stratum", nudge_y = -.5) +
  geom_label(stat = "stratum",
             label = c("HAD\nenough", "NO", "YES", "Did NOT \nhave enough", "HAD\nenough", "NO", "YES"),
             # nudge_y is an argument of geom_label(), not an element of nudge_x
             nudge_x = c(-.075, .075, .075, -.075, -.075, .075, .075),
             nudge_y = .1) +
  scale_x_discrete(limits = c("CONDITION\nHad enough information", "QUESTION\nDid you have enough info?"), expand = c(.05, .05)) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ggtitle("") +
  facet_grid(~ brochure) +
  theme_minimal() +
  theme(text = element_text(size = 16),
        legend.position = "none")


# Plot 0.12 - some labels missing --------------------------------------------------------------------

remotes::install_version("ggalluvial", version = "0.12", repos = "http://cran.us.r-project.org")
detach("package:ggalluvial", unload=TRUE)
library(ggalluvial)

ggplot(df_alluvium_temp, aes(y = Freq, axis1 = HAD_Enough_Information, axis2 = SAID_Enough_Information, label = Freq)) +
  geom_alluvium(aes(fill = paste0(HAD_Enough_Information, SAID_Enough_Information)), width = 1/12) +
  geom_stratum(width = 1/12, fill = "black", color = "grey", alpha = .8) +
  geom_label(stat = "stratum", nudge_y = -.5) +
  geom_label(stat = "stratum",
             label = c("HAD\nenough", "NO", "YES", "Did NOT \nhave enough", "HAD\nenough", "NO", "YES"),
             # nudge_y is an argument of geom_label(), not an element of nudge_x
             nudge_x = c(-.075, .075, .075, -.075, -.075, .075, .075),
             nudge_y = .1) +
  scale_x_discrete(limits = c("CONDITION\nHad enough information", "QUESTION\nDid you have enough info?"), expand = c(.05, .05)) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ggtitle("") +
  facet_grid(~ brochure) +
  theme_minimal() +
  theme(text = element_text(size = 16),
        legend.position = "none")
#> Warning: Removed 3 rows containing missing values (geom_label).

Created on 2020-07-28 by the reprex package (v0.3.0)

corybrunson commented 4 years ago

@gorkang thank you for bringing this up! I knew that recent changes to computed variables would have several downstream effects, but I could not anticipate all of them. I've pared your example down to isolate the difference you're highlighting and to demonstrate how to recover the frequency labels you want (see below).

Here's the reason for the change in behavior: in previous versions, the statistical transformations (like stat_stratum()) aggregated integer and double columns differently from character and factor columns, without printing a message or otherwise informing the user; it was an entirely silent presumption on my part about what users would want. An integer or double column (like your Freq column) was summed, whereas a character or factor column was kept if its values were constant over the aggregated subset and changed to NA otherwise.
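
The two rules described above can be sketched in base R. This is an illustration only, not ggalluvial's internal code: the helper names (keep_if_constant, old_rule, label_by_stratum) are made up for this example, and applying the keep-or-NA rule to Freq is an assumption about v0.12.0 that is consistent with the "Removed 3 rows" warning in the example below.

```r
# Illustrative only: mimics the aggregation rules described above in base R.
# Not ggalluvial's implementation; all helper names here are hypothetical.

# The five-row summary table from the example (brochure column omitted).
df <- data.frame(
  had_enough  = c("had", "had", "had", "hadnt", "hadnt"),
  said_enough = c("no", "yes", "yes", "no", "yes"),
  Freq        = c(1L, 7L, 4L, 1L, 3L)
)

# Rule for character/factor columns: keep the value if it is constant within
# the group, otherwise return an NA of the same type.
keep_if_constant <- function(x) {
  if (length(unique(x)) == 1L) x[1L] else x[NA_integer_]
}

# Rule described for versions before 0.12: numeric columns are silently summed.
old_rule <- function(x) {
  if (is.numeric(x)) sum(x) else keep_if_constant(x)
}

# Aggregate the Freq label within each stratum of one axis.
label_by_stratum <- function(rule, axis) {
  vapply(split(df$Freq, df[[axis]]),
         function(x) as.numeric(rule(x)),
         numeric(1))
}

old <- c(label_by_stratum(old_rule, "had_enough"),
         label_by_stratum(old_rule, "said_enough"))
new <- c(label_by_stratum(keep_if_constant, "had_enough"),
         label_by_stratum(keep_if_constant, "said_enough"))

print(old)  # had = 12, hadnt = 4, no = 2, yes = 14
print(new)  # had = NA, hadnt = NA, no = 1, yes = NA
```

Under the keep-or-NA rule, three of the four strata get NA because Freq varies within them, which matches the three label rows that geom_text() drops. Only the "no" stratum keeps a label, since both of its rows have Freq equal to 1.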

I did this because I didn't yet understand computed variables and calculated aesthetics. See the ggplot2 documentation for how to use calculated aesthetics and my blog post for how packages make computed variables available to users. In v0.12.0, three aggregated computed variables are made available to the user, including count, which is what you're after (it is used in the code below via the helper function after_stat()). Other, non-numeric "computed" variables include stratum and lode; these can be used in the same way to label strata or other graphical elements. They should also be accessed via after_stat(), although stratum specifically usually still works without the helper function (as in your example).

I hope this helps! The documentation in v0.12.0 includes discussions of the computed variables with each statistical transformation, e.g. help(stat_stratum).
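
As a small sketch of combining computed variables in one label, the snippet here pastes stratum and count together inside a single after_stat() call and then inspects the layer data with ggplot2's layer_data(). It assumes ggplot2 3.3.0 or later (for after_stat()) and ggalluvial 0.12.0 or later; the data frame df is a hand-typed copy of the five-row summary table, not output from the package.

```r
library(ggplot2)
library(ggalluvial)  # assumed to be v0.12.0 or later

# Hand-typed copy of the five-row summary table (brochure column omitted).
df <- data.frame(
  had_enough  = c("had", "had", "had", "hadnt", "hadnt"),
  said_enough = c("no", "yes", "yes", "no", "yes"),
  Freq        = c(1L, 7L, 4L, 1L, 3L)
)

p <- ggplot(df, aes(y = Freq, axis1 = had_enough, axis2 = said_enough)) +
  geom_stratum() +
  # stratum and count are both computed by the stat, so the whole label
  # expression is wrapped in after_stat().
  geom_text(stat = "stratum",
            aes(label = after_stat(paste0(stratum, "\n", count))))

# Inspect what the stat computed for the text layer (layer 2 of the plot).
computed <- layer_data(p, 2)
print(computed[, c("x", "stratum", "count")])
```

Printing the layer data is a quick way to see which computed variables are available before referring to them in aes().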

library(ggplot2)
library(readr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Data preparation -------------------------------------------------------------

df_JOINED = data.frame(
  stringsAsFactors = FALSE,
  brochure = c("standard", "standard","standard","standard",
               "standard", "standard","standard","standard",
               "pictorial", "pictorial","pictorial","pictorial",
               "pictorial", "pictorial","pictorial","pictorial"),
  had_enough = c("hadnt", "hadnt","had","had",
                 "had","hadnt","hadnt","had",
                 "had","had","had","had",
                 "had","had","had","had"),
  said_enough = c("no", "yes","yes","yes",
                  "yes","yes","yes","yes",
                  "yes","yes","no","yes",
                  "yes","yes","yes","yes"))

df_alluvium_temp = df_JOINED %>% 
  group_by(brochure, had_enough, said_enough) %>% 
  summarise(Freq = n()) %>% ungroup() %>% 
  mutate(said_enough = forcats::fct_relevel(said_enough, c("yes", "no"))) %>%
  mutate(had_enough = forcats::fct_relevel(had_enough, c("had", "hadnt")))
#> `summarise()` regrouping output by 'brochure', 'had_enough' (override with `.groups` argument)

print(df_alluvium_temp)
#> # A tibble: 5 x 4
#>   brochure  had_enough said_enough  Freq
#>   <chr>     <fct>      <fct>       <int>
#> 1 pictorial had        no              1
#> 2 pictorial had        yes             7
#> 3 standard  had        yes             4
#> 4 standard  hadnt      no              1
#> 5 standard  hadnt      yes             3

# Plot 0.11.1 - all OK ---------------------------------------------------------

remotes::install_version("ggalluvial", version = "0.11.1",
                         repos = "http://cran.us.r-project.org")
#> Downloading package from url: http://cran.us.r-project.org/src/contrib/Archive/ggalluvial/ggalluvial_0.11.1.tar.gz
#> Adding 'ggalluvial_0.11.1.tgz' to the cache
library(ggalluvial)

ggplot(df_alluvium_temp,
       aes(y = Freq, axis1 = had_enough, axis2 = said_enough, label = Freq)) +
  geom_text(stat = "stratum")


# Plot 0.12 - some labels missing ----------------------------------------------

remotes::install_version("ggalluvial", version = "0.12.0",
                         repos = "http://cran.us.r-project.org")
#> Downloading package from url: http://cran.us.r-project.org/src/contrib/ggalluvial_0.12.0.tar.gz
#> Adding 'ggalluvial_0.12.0.tgz' to the cache
detach("package:ggalluvial", unload=TRUE)
library(ggalluvial)

ggplot(df_alluvium_temp,
       aes(y = Freq, axis1 = had_enough, axis2 = said_enough, label = Freq)) +
  geom_text(stat = "stratum")
#> Warning: Removed 3 rows containing missing values (geom_text).


# Plot 0.12 - labels recovered via after_stat(count) ---------------------------

ggplot(df_alluvium_temp,
       aes(y = Freq, axis1 = had_enough, axis2 = said_enough)) +
  geom_text(stat = "stratum", aes(label = after_stat(count)))

Created on 2020-07-28 by the reprex package (v0.3.0)

gorkang commented 4 years ago

Thanks so much for the detailed explanation!

corybrunson commented 4 years ago

You're welcome, though I realized I'd messed up the part about non-numeric variables, so I'll rephrase that now.