corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

have stat layers preserve variables passed to aesthetics #132

Open Yingjie4Science opened 3 months ago

Yingjie4Science commented 3 months ago

Hi @corybrunson, I have a similar but slightly different question: in your last example here, is it possible to show the labels only on the last axis, i.e., "ms460_NSA"?

I have tried label = ifelse(survey == "ms460_NSA" & after_stat(n) > 10, after_stat(n), NA), but got the error "object 'survey' not found".

Originally posted by @Yingjie4Science in https://github.com/corybrunson/ggalluvial/issues/114#issuecomment-2190916709

corybrunson commented 3 months ago

Hi @Yingjie4Science, thanks for checking. I believe the reason this syntax doesn't work is that the use of after_stat() controls the entire expression passed to label, not just the part contained in after_stat(). The variable survey is not preserved by StatFlow, so it's not recognized. Instead, you can use the variable x, to which survey is passed, though since x is made numeric you'll need to know what number it corresponds to:

library(ggalluvial)
#> Loading required package: ggplot2
# rightward flow aesthetics for vaccine survey data, with cubic flows
data(vaccinations)
vaccinations$response <- factor(vaccinations$response,
                                rev(levels(vaccinations$response)))
# annotate fixed-width ribbons with counts
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           weight = freq, fill = response)) +
  geom_lode() + geom_flow(curve_type = "cubic") +
  geom_stratum(alpha = 0) +
  geom_text(
    stat = "flow",
    aes(
      label = ifelse(x == 3, after_stat(n), NA),
      hjust = (after_stat(flow) == "to")
    )
  )
#> Warning: Removed 44 rows containing missing values or values outside the scale range
#> (`geom_text()`).

Created on 2024-06-27 with reprex v2.1.0

Maybe it would be worthwhile to have the Stat*s preserve the variables passed to aesthetics. I'll leave this issue open as a reminder to try that.
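
To avoid hard-coding the axis number, the position of a survey level can be looked up from the factor levels before plotting. The following is a minimal sketch, not part of the original reply: the name last_axis is introduced here for illustration, and it assumes the discrete x scale numbers the axes in the order of the factor levels, as the x == 3 example above implies.

```r
library(ggalluvial)
data(vaccinations)

# Position of the target survey among the axis levels.
# `last_axis` is a hypothetical helper name, not a ggalluvial feature.
last_axis <- match("ms460_NSA", levels(factor(vaccinations$survey)))

# Same layers as the example above, with the looked-up position in place of 3.
p <- ggplot(vaccinations,
            aes(x = survey, stratum = response, alluvium = subject,
                weight = freq, fill = response)) +
  geom_lode() + geom_flow(curve_type = "cubic") +
  geom_stratum(alpha = 0) +
  geom_text(
    stat = "flow",
    aes(
      # wrap the whole expression in after_stat() so that every
      # variable in it refers to the data computed by the stat
      label = after_stat(ifelse(x == last_axis, n, NA)),
      hjust = after_stat(as.numeric(flow == "to"))
    )
  )
p
```

Wrapping the entire expression in after_stat() makes it explicit that x, n, and flow all come from the computed data, while last_axis is found in the calling environment.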

Yingjie4Science commented 3 months ago

Thank you @corybrunson ! It's good to know that x is made numeric.

I have two follow-up questions related to the annotations.

  1. Is it possible to remove the text labels on the left side (see attached screenshot - labels in blue box) when x == 2?
  2. The current stat can add % as labels, but the % is calculated by lumping all strata. Is it possible to label the % by each stratum? (see an example in the screenshot, text in red)

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           weight = freq, fill = response)) +
  geom_lode() + geom_flow(curve_type = "cubic") +
  geom_stratum(alpha = 0) +
  geom_text(
    stat = "flow",
    aes(
      # label = ifelse(x == 2, after_stat(n), NA),
      label = ifelse(x == 2, scales::percent(after_stat(prop), accuracy = 0.1), NA),
      hjust = (after_stat(flow) == "to")
    )
  )

[Screenshot: Weixin Screenshot_20240627233609]

corybrunson commented 3 months ago

Hi @Yingjie4Science, i think (1) can be done by additionally conditioning the labels on after_stat(flow) == "to" (or against after_stat(flow) == "from"). Please report back on whether that works, or i can try it later.

I don't think (2) has a straightforward solution. It might also be something to implement as an additional computed variable, maybe stratum_count or just sum for the total of count within each stratum?
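
For (1), the extra condition can be combined with the axis test in a single after_stat() expression. This is a hedged sketch rather than a tested recommendation from the thread: keep_side is a hypothetical name, and the assumption that "from" marks the right-hand side of an axis should be checked against the rendered plot (swap it for "to" if the wrong side disappears).

```r
library(ggalluvial)
data(vaccinations)

# Assumption: flows labelled "from" sit on the right-hand side of an axis.
# `keep_side` is an illustrative name; swap to "to" if the wrong side vanishes.
keep_side <- "from"

p <- ggplot(vaccinations,
            aes(x = survey, stratum = response, alluvium = subject,
                weight = freq, fill = response)) +
  geom_lode() + geom_flow(curve_type = "cubic") +
  geom_stratum(alpha = 0) +
  geom_text(
    stat = "flow",
    aes(
      # label only the flows at axis 2 on the chosen side
      label = after_stat(ifelse(
        x == 2 & flow == keep_side,
        scales::percent(prop, accuracy = 0.1),
        NA_character_
      )),
      hjust = after_stat(as.numeric(flow == "to"))
    )
  )
p
```

Rows whose label is NA are dropped by geom_text() with a warning, as in the earlier example.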

Yingjie4Science commented 3 months ago

Hi @corybrunson Thanks! The first solution works perfectly.

I am still struggling with (2). Are you suggesting we add an extra column to the data frame? I am not sure how to call that data and use it in the label argument.

corybrunson commented 2 weeks ago

@Yingjie4Science i apologize, i think i lost track of this exchange as other obligations piled up.

Regarding (2), i tried to write up my own understanding of computed variables here. Please let me know if the idea is clear. My proposal for (2) is then to add a new computed variable for within-stratum sums or proportions. This could be done quickly; i just need to think through the conventions (i.e. what to call these new columns) and consequences (i.e. make sure they don't introduce backward incompatibilities).
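
To make the difference between the two kinds of proportion concrete, here is a small base R sketch of the arithmetic. The counts and the column names prop_axis and prop_stratum are invented for illustration; they are not existing ggalluvial computed variables, and the names the package eventually adopts may differ.

```r
# Hypothetical flow-level counts at one axis (toy numbers, not real data).
flows <- data.frame(
  x       = c(2, 2, 2, 2),
  stratum = c("Agree", "Agree", "Disagree", "Disagree"),
  n       = c(30, 10, 15, 45)
)

# Proportion of all flows at the axis, lumping every stratum together.
flows$prop_axis <- flows$n / ave(flows$n, flows$x, FUN = sum)

# Proportion within each stratum: each count divided by its stratum total.
flows$prop_stratum <- flows$n / ave(flows$n, flows$x, flows$stratum, FUN = sum)

print(flows)
```

In this toy table the "Agree" flows total 40, so their within-stratum shares are 0.75 and 0.25, whereas their shares of the whole axis are 0.30 and 0.10.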