corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

scales::percent(after_stat(prop)) return values inconsistently #117

Closed AdrianS85 closed 1 year ago

AdrianS85 commented 1 year ago

Dear all,

I am trying to prepare a plot in which the percentage labels of low-frequency strata are placed outside the bars. It seems to me that scales::percent(after_stat(prop)) may not work consistently. Here is the issue:

Let's take these two files: https://github.com/AdrianS85/varia/blob/master/test_aluv_bug1.xlsx, https://github.com/AdrianS85/varia/blob/master/test_aluv_bug2.xlsx and load the first one using test <- readxl::read_excel("test_aluv_bug1.xlsx", col_types = "text")

Now I want to move values lower than 5% to the side. Here is my code:

ggplot(test, aes(x = time_point,stratum = question,alluvium = sample_id,fill = question,label = question)) +
  scale_x_discrete(expand = c(.1, .1), limits = unique(test[[ "time_point" ]])) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(
    mapping = aes(label = ifelse(scales::percent(after_stat(prop)) > 5, scales::percent(after_stat(prop), accuracy = .1), NA)),
    stat = "stratum")+
  ggrepel::geom_text_repel(
    mapping = aes(label = ifelse(scales::percent(after_stat(prop)) <= 5, scales::percent(after_stat(prop), accuracy = .1), NA)),
    stat = "stratum", direction = "y", nudge_x = -.20, nudge_y = -6)

Nice. Exactly what I wanted. BUT... let's try this on test <- readxl::read_excel("test_aluv_bug2.xlsx", col_types = "text") ... what? So the percentages are not really percentages? OK, then let's change the threshold "5", which I compare against scales::percent(after_stat(prop)) in ifelse(), to "0.5". Yes, it's now good for file 2... but now file 1 is not working properly?

Help?

Best Adrian

corybrunson commented 1 year ago

Hi @AdrianS85, thanks for checking. I am able to run your code using both data sets, and I think I understand your question. The solution is to compare numerics rather than strings in the mapping to label.

To illustrate the problem, try the following:

scales::percent(.5) > c(5, 50, 500)
scales::percent(.5) <= c(5, 50, 500)

The reason for the strange results is that scales::percent() returns a character string, so the numeric right-hand sides are coerced to character before being compared to the left-hand side. The comparisons are therefore between strings rather than doubles. (My go-to source for this is Hadley Wickham's book.)
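The coercion described above can be made visible with a minimal base-R sketch. It assumes only that scales::percent(0.5) returns the string "50%"; the variable names are illustrative and not part of any package.

```r
# "50%" stands in for the character value returned by scales::percent(0.5).
pct <- "50%"
thresholds <- c(5, 50, 500)

# Comparing a character value with numbers coerces the numbers to character,
# so the implicit and explicit versions give the same (string-based) result.
implicit <- pct > thresholds
explicit <- pct > as.character(thresholds)
identical(implicit, explicit)

# Comparing the underlying fraction keeps both sides numeric.
prop <- 0.5
numeric_result <- prop > c(0.05, 0.5, 5)
numeric_result
```

The string comparison depends on collation order rather than magnitude, which is why a threshold that happens to work for one data set can fail for another.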

In the code below, I run the comparisons before the formatting step, and the results line up with my expectations (treating after_stat(prop) as a fraction). Please let me know if this works on your end, or if you have any follow-up questions.

library(ggalluvial)
#> Loading required package: ggplot2

test1 <-
  readxl::read_excel("~/Downloads/test_aluv_bug1.xlsx", col_types = "text")
test2 <-
  readxl::read_excel("~/Downloads/test_aluv_bug2.xlsx", col_types = "text")

# 5% threshold
ggplot(test1,
       aes(x = time_point,
           stratum = question,
           alluvium = sample_id,
           fill = question,
           label = question)) +
  scale_x_discrete(expand = c(.1, .1),
                   limits = unique(test1[[ "time_point" ]])) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(
    aes(label = scales::percent(ifelse(after_stat(prop) > .05,
                                       after_stat(prop),
                                       NA_real_), accuracy = .1)),
    stat = "stratum")+
  ggrepel::geom_text_repel(
    aes(label = scales::percent(ifelse(after_stat(prop) <= .05,
                                       after_stat(prop),
                                       NA_real_), accuracy = .1)),
    stat = "stratum", direction = "y", nudge_x = -.20, nudge_y = -6)
#> Warning: Removed 4 rows containing missing values (`geom_text()`).
#> Warning: Removed 2 rows containing missing values (`geom_text_repel()`).


# 50% threshold
ggplot(test2,
       aes(x = time_point,
           stratum = question,
           alluvium = sample_id,
           fill = question,
           label = question)) +
  scale_x_discrete(expand = c(.1, .1),
                   limits = unique(test2[[ "time_point" ]])) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(
    aes(label = scales::percent(ifelse(after_stat(prop) > .5,
                                       after_stat(prop),
                                       NA_real_), accuracy = .1)),
    stat = "stratum")+
  ggrepel::geom_text_repel(
    aes(label = scales::percent(ifelse(after_stat(prop) <= .5,
                                       after_stat(prop),
                                       NA_real_), accuracy = .1)),
    stat = "stratum", direction = "y", nudge_x = -.20, nudge_y = -6)
#> Warning: Removed 2 rows containing missing values (`geom_text()`).
#> Removed 2 rows containing missing values (`geom_text_repel()`).

Created on 2023-08-21 with reprex v2.0.2
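The compare-then-format pattern used in both plots can also be tried outside ggplot2 on a plain numeric vector. The sketch below assumes the scales package is installed; label_where() is a hypothetical helper introduced only for illustration and is not part of ggalluvial or scales, and the proportions are made up.

```r
# Hypothetical helper: blank out proportions that fail the test (while they
# are still numeric), then format what remains as percentages.
label_where <- function(prop, keep, accuracy = 0.1) {
  scales::percent(ifelse(keep, prop, NA_real_), accuracy = accuracy)
}

# Made-up proportions standing in for after_stat(prop).
prop <- c(0.03, 0.25, 0.72)

# Labels for strata above the 5% threshold (drawn inside the bars).
inside <- label_where(prop, prop > 0.05)

# Labels for strata at or below the threshold (nudged to the side).
outside <- label_where(prop, prop <= 0.05)

inside
outside
```

Each stratum gets a label in exactly one of the two vectors and NA in the other, which is what produces the "Removed rows containing missing values" warnings in the reprex output.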

AdrianS85 commented 1 year ago

Dear @corybrunson,

This works! Thank You so much for taking time to provide explanation, You are amazing : ]

corybrunson commented 1 year ago

You're welcome! Glad it's what you needed.