Closed by gorkang 4 years ago
@gorkang thank you for bringing this up! I knew that recent changes to computed variables would have several downstream effects, but i could not anticipate all of them. I've pared your example down in order to isolate the difference you're highlighting and to demonstrate how to recover the frequency labels you want (see below).
Here's the reason for the change in behavior: In previous versions, the statistical transformations (like stat_stratum()) would aggregate integer and double columns differently than character and factor columns without printing messages or otherwise informing the user about this; it was an entirely silent presumption about what i thought users would want. So, if a column was integer or double (like your Freq column), then it would be summed, whereas if it were character or factor then it would be kept (if its values were constant over an aggregated subset) or changed to NA (otherwise).
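To make the two rules concrete, here is a rough base-R sketch. It only mimics the behavior described above and is not ggalluvial's actual code; the helper name keep_if_constant and the object names are made up for illustration. The data are the five rows printed by the reprex in this thread. The last step assumes that v0.12.0 applies the keep-if-constant rule to a numeric label column such as Freq, which would be consistent with the "Removed 3 rows" warning in the reprex.

```r
# Illustrative only: imitates the aggregation rules described in the reply,
# not the internals of ggalluvial.
d <- data.frame(
  had_enough  = c("had", "had", "had", "hadnt", "hadnt"),
  said_enough = c("no", "yes", "yes", "no", "yes"),
  Freq        = c(1L, 7L, 4L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep a value if it is constant within a stratum,
# otherwise return an NA of the same type.
keep_if_constant <- function(x) {
  ux <- unique(x)
  if (length(ux) == 1L) ux else x[NA_integer_]
}

# One vector of Freq values per stratum, across both axes.
strata <- c(split(d$Freq, d$had_enough), split(d$Freq, d$said_enough))

# Old rule for numeric columns: sum within each stratum.
old_labels <- vapply(strata, sum, integer(1))
# had = 12, hadnt = 4, no = 2, yes = 14

# Keep-if-constant rule applied to the same column (assumed v0.12.0 behavior).
new_labels <- vapply(strata, keep_if_constant, integer(1))
# Only "no" has a constant Freq (1 and 1); the other three strata become NA.
```

Under that assumption, three of the four strata lose their label when Freq is mapped directly to label.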
I did this because i didn't yet understand computed variables and calculated aesthetics. See the ggplot2 documentation for how to use calculated aesthetics and my blog post for how packages make computed variables available to users. In v0.12.0, three aggregated computed variables are made available to the user, including count, which is what you're after (and used in the code below, via the helper function after_stat()). Other, non-numeric "computed" variables include stratum and lode; these may be used in the same way to label strata or other graphical elements. They should also be used via after_stat(), but with stratum specifically it usually still works without the helper function (as in your example).
I hope this helps! The documentation in v0.12.0 includes discussions of the computed variables with each statistical transformation, e.g. help(stat_stratum).
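As a small sketch of combining the computed variables mentioned above, the following labels each stratum with both its name and its total. It assumes ggalluvial 0.12.0 or later and a ggplot2 version that provides after_stat() (3.3.0 or later); the data frame d is typed in by hand from the tibble printed in the reprex, and the label format is only an example.

```r
library(ggplot2)
library(ggalluvial)

# The aggregated counts from the reprex, entered by hand for a
# self-contained example.
d <- data.frame(
  had_enough  = c("had", "had", "had", "hadnt", "hadnt"),
  said_enough = c("no", "yes", "yes", "no", "yes"),
  Freq        = c(1L, 7L, 4L, 1L, 3L),
  stringsAsFactors = FALSE
)

# `stratum` and `count` are variables computed by stat_stratum();
# wrapping the whole expression in after_stat() evaluates it on the
# computed data rather than on `d`.
p <- ggplot(d, aes(y = Freq, axis1 = had_enough, axis2 = said_enough)) +
  geom_text(stat = "stratum",
            aes(label = after_stat(paste0(stratum, ": ", count))))
p
```

Putting the whole label expression inside a single after_stat() call keeps it clear that every name in it refers to a computed variable, not to a column of the input data.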
library(ggplot2)
library(readr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Data preparation -------------------------------------------------------------
df_JOINED = data.frame(
  stringsAsFactors = FALSE,
  brochure = c("standard", "standard", "standard", "standard",
               "standard", "standard", "standard", "standard",
               "pictorial", "pictorial", "pictorial", "pictorial",
               "pictorial", "pictorial", "pictorial", "pictorial"),
  had_enough = c("hadnt", "hadnt", "had", "had",
                 "had", "hadnt", "hadnt", "had",
                 "had", "had", "had", "had",
                 "had", "had", "had", "had"),
  said_enough = c("no", "yes", "yes", "yes",
                  "yes", "yes", "yes", "yes",
                  "yes", "yes", "no", "yes",
                  "yes", "yes", "yes", "yes"))
df_alluvium_temp = df_JOINED %>%
  group_by(brochure, had_enough, said_enough) %>%
  summarise(Freq = n()) %>% ungroup() %>%
  mutate(said_enough = forcats::fct_relevel(said_enough, c("yes", "no"))) %>%
  mutate(had_enough = forcats::fct_relevel(had_enough, c("had", "hadnt")))
#> `summarise()` regrouping output by 'brochure', 'had_enough' (override with `.groups` argument)
print(df_alluvium_temp)
#> # A tibble: 5 x 4
#>   brochure  had_enough said_enough  Freq
#>   <chr>     <fct>      <fct>       <int>
#> 1 pictorial had        no              1
#> 2 pictorial had        yes             7
#> 3 standard  had        yes             4
#> 4 standard  hadnt      no              1
#> 5 standard  hadnt      yes             3
# Plot 0.11.1 - all OK ---------------------------------------------------------
remotes::install_version("ggalluvial", version = "0.11.1",
                         repos = "http://cran.us.r-project.org")
#> Downloading package from url: http://cran.us.r-project.org/src/contrib/Archive/ggalluvial/ggalluvial_0.11.1.tar.gz
#> Adding 'ggalluvial_0.11.1.tgz' to the cache
library(ggalluvial)
ggplot(df_alluvium_temp,
       aes(y = Freq, axis1 = had_enough, axis2 = said_enough, label = Freq)) +
  geom_text(stat = "stratum")
# Plot 0.12 - some labels missing ----------------------------------------------
remotes::install_version("ggalluvial", version = "0.12.0",
                         repos = "http://cran.us.r-project.org")
#> Downloading package from url: http://cran.us.r-project.org/src/contrib/ggalluvial_0.12.0.tar.gz
#> Adding 'ggalluvial_0.12.0.tgz' to the cache
detach("package:ggalluvial", unload=TRUE)
library(ggalluvial)
ggplot(df_alluvium_temp,
       aes(y = Freq, axis1 = had_enough, axis2 = said_enough, label = Freq)) +
  geom_text(stat = "stratum")
#> Warning: Removed 3 rows containing missing values (geom_text).
ggplot(df_alluvium_temp,
       aes(y = Freq, axis1 = had_enough, axis2 = said_enough)) +
  geom_text(stat = "stratum", aes(label = after_stat(count)))
Created on 2020-07-28 by the reprex package (v0.3.0)
Thanks so much for the detailed explanation!
You're welcome—though i realized i'd messed up the part about non-numeric variables, so i'll rephrase that now.
In ggalluvial 0.11.1, all the Freq (count) labels appeared. After updating to 0.12, some of them no longer appear.
Please see a fully reproducible example below. Given the resolution of the images in the reprex, some labels overlap, but things should look better at a higher resolution.