corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

geom_alluvial not changing width of flows for me #18

Closed jaspercooper closed 6 years ago

jaspercooper commented 6 years ago

Hi, really cool package, I'm excited to get using it in my working paper.

I've been hacking away at this all morning but can't seem to fix it: for some reason the flow width is constant in the following, any idea what I'm doing wrong? Apologies if this is obvious.

Data looks like this:

axis1 axis2 Freq
A2 A1 55
A2 B1 0
A2 C1 0
A2 D1 0
B2 A1 15
B2 B1 0
B2 C1 0
B2 D1 0
C2 A1 7
C2 B1 9
C2 C1 7
C2 D1 0
D2 A1 0
D2 B1 0
D2 C1 3
D2 D1 5

Code :

ggplot(plot_data,
       aes(weight = Freq,
           axis1 = axis1, 
           axis2 = axis2)) +
  geom_alluvium(aes(fill = axis1))

Gives:

[Image: screenshot of the resulting plot]

corybrunson commented 6 years ago

Hi, and thanks! I hope you can make use of it.

The plot you've shared is accurate based on the data. The flows don't change widths because the data is in wide format, i.e. each alluvium has a fixed weight (and most have weight zero and are therefore not plotted). As for why they also don't cross each other, it's because, perhaps by accident, they perfectly respect the order of the groupings at the two axes. This became clear to me after plotting the strata along with the alluvia:

ggplot(plot_data,
       aes(weight = Freq,
           axis1 = axis1,
           axis2 = axis2)) +
  geom_alluvium(aes(fill = axis1)) +
  geom_stratum() +
  geom_text(stat = "stratum", label.strata = TRUE)

If this isn't what you expected, then perhaps the data needs to be transformed?
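To make the point about wide format concrete, here is a minimal base R sketch that rebuilds the table from the first post and counts what there is to draw. The object names `nonzero`, `axis1_totals`, and `axis2_totals` are only illustrative. Each row is one alluvium with a single weight, so a flow cannot change width between the axes; only the stratum totals differ from axis to axis.

```r
# Rebuild the data from the first post (wide format: one row per alluvium)
plot_data <- data.frame(
  axis1 = rep(c("A2", "B2", "C2", "D2"), each = 4),
  axis2 = rep(c("A1", "B1", "C1", "D1"), times = 4),
  Freq  = c(55, 0, 0, 0, 15, 0, 0, 0, 7, 9, 7, 0, 0, 0, 3, 5)
)

# Rows with zero weight have nothing to draw
nonzero <- subset(plot_data, Freq > 0)
nrow(nonzero)  # 7 alluvia remain

# Stratum heights are the weights summed within each axis
axis1_totals <- tapply(plot_data$Freq, plot_data$axis1, sum)
axis2_totals <- tapply(plot_data$Freq, plot_data$axis2, sum)
axis1_totals  # A2 = 55, B2 = 15, C2 = 23, D2 = 8
axis2_totals  # A1 = 77, B1 = 9, C1 = 10, D1 = 5
```

Both axes sum to 101, which is why the two stacks have the same height even though the individual strata differ.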

jaspercooper commented 6 years ago

Thanks so much for the quick reply, Cory, and sorry for taking up your time with this. I think ultimately I may not be able to plot the data as I wanted using ggalluvial, unfortunately. The application I had in mind is to map a change in proportions occurring between the left and right axes, with flows going only from A1 to A2, but in this case getting smaller to represent the shrinking of that category. I think this might violate some of the core principles of the package. Thanks anyway!

adomingues commented 6 years ago

Hi @jaspercooper,

May I ask how you ended up doing your plot? I'm in a similar position, trying to:

map a change in proportions occurring between the left and right axes, with flows going only from A1 to A2, but in this case getting smaller to represent the shrinking of that category

Any pointers would be appreciated.

Cheers.

jaspercooper commented 6 years ago

hi @adomingues,

Sorry to say I gave up in the end and wrote it from scratch using ggplot! I think it was not so much the fault of geom_alluvium as me not understanding that it wasn't an appropriate application of the function.

Sorry to not be very helpful!

adomingues commented 6 years ago

Cheers @jaspercooper! That is what I feared. I am also now looking at how to do it with "vanilla" ggplot2.

@corybrunson Despite not being useful for me right now, ggalluvial is still a very nice package.

corybrunson commented 6 years ago

@jaspercooper @adomingues thanks for your feedback and input, even if it's not a perfect match with the package! For reference, this SO question comes the closest to what i thought you might be trying to do. If this example isn't what you're after, then very likely ggalluvial is not the right package. If it is, but you're not sure still how to achieve it, let me know and i'll be glad to help!

adomingues commented 6 years ago

You are right @corybrunson: ggalluvial is not the right package for my goal. An alluvial plot is intended to show how individual values change between categories, whereas I was trying to show how the proportions of the categories change from one axis to the other. Well, anyway, I had the basics of what an alluvial plot is all wrong.

Anyway, just in case someone is interested in a similar visualization, I "hand-crafted" the plot using a combination of geom_bar and geom_ribbon, and some extra dusting, to achieve the visual flow I had in mind:

[Image: germline-expression-adult-genes_comparison_wt-alluvial-1]

And also:

[Image: germline-expression-adult-genes_comparison_wt-alluvial-2]

Cheers.
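For readers who want to try the bar-plus-ribbon idea, here is a rough sketch of one way it could be set up. This is not adomingues's code: the toy data, the column names (`time`, `group`, `prop`, `x_ribbon`), the axis labels, and `bar_half_width` are all made up for illustration. It also uses geom_rect in place of geom_bar, so that the bars and the ribbons share the same ymin/ymax columns.

```r
library(ggplot2)

# Hypothetical toy data: share of each category at two time points
d <- data.frame(
  time  = rep(c(1, 2), each = 3),
  group = rep(c("A", "B", "C"), times = 2),
  prop  = c(0.5, 0.3, 0.2, 0.2, 0.3, 0.5)
)

# Stack the shares within each time point to get the edges of each band
d <- d[order(d$time, d$group), ]
d$ymax <- ave(d$prop, d$time, FUN = cumsum)
d$ymin <- d$ymax - d$prop

# Ribbons run from the right edge of the first bar to the left edge of the second
bar_half_width <- 0.05
d$x_ribbon <- ifelse(d$time == 1, 1 + bar_half_width, 2 - bar_half_width)

p <- ggplot(d) +
  # Narrow stacked bars at x = 1 and x = 2
  geom_rect(aes(xmin = time - bar_half_width, xmax = time + bar_half_width,
                ymin = ymin, ymax = ymax, fill = group)) +
  # Semi-transparent bands joining the same category across the two bars
  geom_ribbon(aes(x = x_ribbon, ymin = ymin, ymax = ymax, fill = group),
              alpha = 0.2) +
  scale_x_continuous(breaks = c(1, 2), labels = c("Before", "After")) +
  scale_y_continuous(labels = scales::percent)
p
```

Keeping x numeric throughout avoids mixing discrete and continuous positions, at the cost of labelling the axis by hand.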

jaspercooper commented 6 years ago

Wow, nice work @adomingues. A much nicer version of the same sort of graph I ended up making (below). I don't know about you but I found it to be a real pain getting the labels in the right places.

@corybrunson maybe it's worth thinking about something like this even if it's not completely in keeping with alluvial principles. Happy to share code if it's helpful.

[Image: screenshot of jaspercooper's plot]

corybrunson commented 6 years ago

@adomingues and @jaspercooper it would be straightforward to generate plots like both of these from properly formatted data using ggalluvial! If you share an example data set then i'll be glad to produce one for illustration. I used the "vaccinations" data set to produce a similar example in the technical intro (the vignette titled "ggalluvial"). The one thing i'm not sure of is how one would label the flows between the axes; i didn't design the package for that purpose, but it might be easy to do nonetheless.

adomingues commented 6 years ago

Thank you @jaspercooper. I took some inspiration from your plot (your latest publication iirc). It took me more hours and googling than I like to think about. I am happy to share the code as well (thinking about a blog post soon).

And yes, getting the labels right (geom_text btw) was a pain but getting the ribbons going to and from the right places also involved quite a bit of playing with the factor levels and other assorted nuisances. Here is a snippet of the code for the labels:

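library(dplyr)

# adult_perc (with ymin/ymax per category) and p (the base plot) are defined elsewhere
new_legend <- adult_perc %>%
  filter(Strain == "Adult expression") %>%
  mutate(
    y = ymin + (ymax - ymin) / 2,  # vertical midpoint of each band
    x = 1.5                        # horizontal position of the labels
  ) %>%
  select(Germline_expression, x, y)
new_legend

p + geom_text(
  data = new_legend,
  aes(x = x, y = y, label = Germline_expression, color = Germline_expression),
  show.legend = FALSE, fontface = 2, position = position_jitter(width = 0.1)
)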

Maybe my R skills aren't up to the task, but I have the impression that making this for more than two bars will be too much of a hassle.

jaspercooper commented 6 years ago

huh, OK! Personally was not able to figure it out but I suspect that was just me being dense. Perhaps for future wanderers of the internet something like this would serve as a simple MWE:

N <- 100
group_1 <- sample(LETTERS[1:4],100,TRUE,1:4/sum(1:4))
group_2 <- sample(LETTERS[1:4],1000,TRUE,4:1/sum(1:4))

(ninja edit: was responding to @corybrunson -- great that this helped you out @adomingues and sorry for the hours googling, in hindsight I see I could have just sent you the code!) (edit 2: made group sizes different to better represent nature of the issue)

corybrunson commented 6 years ago

Here's what i think you're after, using base R to create a data frame that ggalluvial knows how to read. I'm having trouble with reprex(); if you have trouble generating the figure, let me know and i'll try again.

group_1 <- sample(LETTERS[1:4],100,TRUE,1:4/sum(1:4))
group_2 <- sample(LETTERS[1:4],1000,TRUE,4:1/sum(1:4))

d1 <- as.data.frame(table(group_1))
d1 <- setNames(d1, c("group", "count"))
d1$proportion <- d1$count / length(group_1)
d1$time <- 1L
d2 <- as.data.frame(table(group_2))
d2 <- setNames(d2, c("group", "count"))
d2$proportion <- d2$count / length(group_2)
d2$time <- 2L
d <- rbind(d1, d2)

library(ggplot2)
library(ggalluvial)
gg <- ggplot(
  d,
  aes(x = time, y = proportion, stratum = group, alluvium = group)
) +
  geom_stratum(aes(fill = group)) +
  geom_alluvium(aes(fill = group), knot.pos = 0) +
  geom_text(aes(label = round(proportion, digits = 2)), stat = "stratum")
plot(gg)

Straightforward, but not obvious! I didn't want to allow the user to input data in as wide a variety of forms as alluvial allows, since part of the spirit of ggplot2 is to standardize the input data.

Labeling the flows would be difficult even in theory, since, in general, if there are 4 groups at both axes then there will be 16 flows. In practice, it's currently impossible using ggalluvial, since stat_alluvium() and stat_flow() return the endpoints of the flows (which are located at the strata, within the axes) rather than the centers of the flows (which are between the axes). But i'll think about how this might be done in a future release. Thanks for the suggestion!
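In the two-axis case above, where each group has exactly one flow, the flow centers can be worked out by hand outside ggalluvial. The sketch below is only that: the proportions are made up, `flow_labels` is a hypothetical name, and it assumes the groups are stacked bottom to top in row order, which may not match the order ggalluvial actually uses.

```r
# Assumed toy proportions at two time points (not from the thread)
d <- data.frame(
  time       = rep(1:2, each = 4),
  group      = rep(LETTERS[1:4], times = 2),
  proportion = c(0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1)
)

# Vertical center of each band when stacked bottom to top within a time point
d$center <- ave(d$proportion, d$time, FUN = function(p) cumsum(p) - p / 2)

# One label per group, halfway between the axes; for a straight or symmetric
# flow, the center at the halfway point is the mean of the two end centers
flow_labels <- aggregate(center ~ group, data = d, FUN = mean)
flow_labels$x <- 1.5
flow_labels$center  # 0.125 0.375 0.625 0.875
```

A data frame like this could then be passed to something like `geom_text(data = flow_labels, aes(x = x, y = center, label = group), inherit.aes = FALSE)`, after checking the positions against the plotted strata.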

jaspercooper commented 6 years ago

This looks great, and super straightforward compared to all the nonsense I had to cook up! Will definitely be using it in future. Thanks @corybrunson

adomingues commented 6 years ago

Wow, this is incredibly simple @corybrunson. I second the sentiments of @jaspercooper - so many ugly hacks when the solution was there all along. Cheers!

singcell commented 4 years ago

@adomingues Your plot above is nice and exactly what I am looking for, but I have no idea how to make it. Could you please share the code and input data for the plot? In the meantime, did you figure out an easier way to generate it? Thank you.

adomingues commented 4 years ago

@singcell I can try to dig up the code, but I can't promise you anything until tomorrow. Did you try the solution that @corybrunson posted? https://github.com/corybrunson/ggalluvial/issues/18#issuecomment-416064909

singcell commented 4 years ago

@singcell I can try to dig up the code, but I can't promise you anything until tomorrow. Did you try the solution that @corybrunson posted? #18 (comment)

Thank you!

adomingues commented 4 years ago

@singcell here it is:

p_line2 <- ggplot(all_perc, aes(x = Strain_lab, y = Perc, fill = Germline_expression)) + 
    geom_bar(stat = "identity", width = 0.1) +
    geom_ribbon(aes(x = Strain_pos, ymin = ymin, ymax = ymax, fill = Germline_expression), alpha=.2) +
    geom_text(data = new_legend, aes(x = x, y = y, label = Germline_expression, color = Germline_expression), show.legend = FALSE, fontface = 2, position=position_jitter(w=0.1)) +
    scale_y_continuous(labels = scales::percent) +
    scale_fill_prettier() +
    scale_color_prettier() +
    guides(fill=guide_legend(title="Germline expression")) +
    labs(x = "",
      y = "% of genes") +
    theme_classic(base_size = 14) 
p_line2

Sadly I can't give you the data for a full reprex because I am not authorized to make the data public yet.

singcell commented 4 years ago

Thank you @adomingues. I appreciate your help.