Closed jaspercooper closed 6 years ago
Hi, and thanks! I hope you can make use of it.
The plot you've shared is accurate based on the data. The flows don't change widths because the data is in wide format, i.e. each alluvium has a fixed weight (and most have weight zero and are therefore not plotted). As for why they also don't cross each other, it's because, perhaps by accident, they perfectly respect the order of the groupings at the two axes. This became clear to me after plotting the strata along with the alluvia:
ggplot(plot_data,
       aes(weight = Freq,
           axis1 = axis1,
           axis2 = axis2)) +
  geom_alluvium(aes(fill = axis1)) +
  geom_stratum() +
  geom_text(stat = "stratum", label.strata = TRUE)
If this isn't what you expected, then perhaps the data needs to be transformed?
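To illustrate the point about fixed weights, here is a minimal base R sketch. The data frame is hypothetical (the column names simply mirror the call above) and is not the data from the original post. Each row is one alluvium carrying a single weight, so that same weight is drawn at both axes and a flow cannot narrow on its way across.

```r
# Hypothetical wide-format data: one row per alluvium, one weight per row.
plot_data <- data.frame(
  axis1 = c("A", "A", "B", "B"),
  axis2 = c("A", "B", "A", "B"),
  Freq  = c(30, 0, 0, 70)
)

# Stratum heights at each axis are sums of the same row weights,
# so both axes necessarily add up to the same total.
left  <- tapply(plot_data$Freq, plot_data$axis1, sum)
right <- tapply(plot_data$Freq, plot_data$axis2, sum)

# Only rows with positive weight would show up as visible flows.
drawn <- plot_data[plot_data$Freq > 0, ]

print(left)
print(right)
print(drawn)
```

In this toy case only the A-to-A and B-to-B rows have positive weight, which is also why the visible flows never cross.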
Thanks so much for the quick reply, Cory, and sorry for taking up your time with this. I think ultimately I may not be able to plot the data as I wanted using ggalluvial, unfortunately. The application I had in mind is to map a change in proportions occurring between the left and right axes, with flows going only from A1 to A2, but in this case getting smaller to represent the shrinking of that category. I think this might violate some of the core principles of the package. Thanks anyway!
Hi @jaspercooper,
May I ask how you ended up making your plot? I'm in a similar position, trying to:
map a change in proportions occurring between the left and right axes, with flows going only from A1 to A2, but in this case getting smaller to represent the shrinking of that category
Any pointers would be appreciated.
Cheers.
hi @adomingues,
Sorry to say I gave up in the end and wrote it from scratch using ggplot! I think it was not so much the fault of geom_alluvial as me not understanding that it wasn't an appropriate application of the function.
Sorry to not be very helpful!
Cheers @jaspercooper! That is what I feared. I am also now looking at how to do it with "vanilla" ggplot2.
@corybrunson Despite not being useful for me right now, ggalluvial is still a very nice package.
@jaspercooper @adomingues thanks for your feedback and input, even if it's not a perfect match with the package! For reference, this SO question comes the closest to what i thought you might be trying to do. If this example isn't what you're after, then very likely ggalluvial is not the right package. If it is, but you're not sure still how to achieve it, let me know and i'll be glad to help!
You are right @corybrunson: ggalluvial is not the right package for my goal. An alluvial plot is intended to show how individual values change between categories, whereas I was trying to show changes in the proportions of categories between two conditions. Well, anyway, I had the basics of what an alluvial plot is all wrong.
Anyway, just in case someone is interested in a similar visualization, I "hand-crafted" the plot using a combination of geom_bar and geom_ribbon, and some extra dusting, to achieve the visual flow I had in mind:
And also:
Cheers.
Wow, nice work @adomingues. A much nicer version of the same sort of graph I ended up making (below). I don't know about you but I found it to be a real pain getting the labels in the right places.
@corybrunson maybe it's worth thinking about something like this even if it's not completely in keeping with alluvial principles. Happy to share code if it's helpful.
@adomingues and @jaspercooper it would be straightforward to generate plots like both of these from properly formatted data using ggalluvial! If you share an example data set then i'll be glad to produce one for illustration. I used the "vaccinations" data set to produce a similar example in the technical intro (the vignette titled "ggalluvial"). The one thing i'm not sure of is how one would label the flows between the axes; i didn't design the package for that purpose, but it might be easy to do nonetheless.
Thank you @jaspercooper. I took some inspiration from your plot (your latest publication iirc). It took me more hours and googling than I like to think about. I am happy to share the code as well (thinking about a blogpost soon).
And yes, getting the labels right (geom_text btw) was a pain, but getting the ribbons going to and from the right places also involved quite a bit of playing with the factor levels and other assorted nuisances. Here is a snippet of the code for the labels:
new_legend <- adult_perc %>%
  filter(Strain == "Adult expression") %>%
  mutate(
    y = ymin + (ymax - ymin) / 2,
    x = 1.5
  ) %>%
  select(Germline_expression, x, y)
new_legend
p + geom_text(
  data = new_legend,
  aes(x = x, y = y, label = Germline_expression, color = Germline_expression),
  show.legend = FALSE, fontface = 2, position = position_jitter(width = 0.1)
)
Maybe my R skills aren't up to the task, but I have the impression that making this for more than two bars will be too much of a hassle.
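For readers wondering where ymin and ymax in the label snippet come from, one plausible way to derive them is a cumulative sum over the percentages of a single bar. The sketch below uses base R and made-up values; the column names only mirror the snippet, and the stacking order is an assumption that would have to match the factor levels used in the actual plot.

```r
# Hypothetical percentages for one bar; names and values are illustrative only.
bar <- data.frame(
  Germline_expression = c("low", "medium", "high"),
  Perc = c(0.2, 0.5, 0.3)
)

# Stack the segments bottom to top in row order (an assumption; ggplot2's
# own stacking order depends on the factor levels).
bar$ymax <- cumsum(bar$Perc)
bar$ymin <- bar$ymax - bar$Perc

# Place each label at the vertical midpoint of its segment.
bar$y <- bar$ymin + (bar$ymax - bar$ymin) / 2

print(bar)
```

Repeating this per bar gives the ribbon boundaries at each x position as well as the label heights.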
huh, OK! Personally was not able to figure it out but I suspect that was just me being dense. Perhaps for future wanderers of the internet something like this would serve as a simple MWE:
N <- 100
group_1 <- sample(LETTERS[1:4],100,TRUE,1:4/sum(1:4))
group_2 <- sample(LETTERS[1:4],1000,TRUE,4:1/sum(1:4))
(ninja edit: was responding to @corybrunson -- great that this helped you out @adomingues and sorry for the hours googling, in hindsight I see I could have just sent you the code!) (edit 2: made group sizes different to better represent nature of the issue)
Here's what i think you're after, using base R to create a data frame that ggalluvial knows how to read. I'm having trouble with reprex(); if you have trouble generating the figure, let me know and i'll try again.
# Two samples of different sizes with different group proportions
group_1 <- sample(LETTERS[1:4], 100, TRUE, 1:4 / sum(1:4))
group_2 <- sample(LETTERS[1:4], 1000, TRUE, 4:1 / sum(1:4))

# One row per group and time point, with counts and proportions
d1 <- as.data.frame(table(group_1))
d1 <- setNames(d1, c("group", "count"))
d1$proportion <- d1$count / length(group_1)
d1$time <- 1L
d2 <- as.data.frame(table(group_2))
d2 <- setNames(d2, c("group", "count"))
d2$proportion <- d2$count / length(group_2)
d2$time <- 2L
d <- rbind(d1, d2)

library(ggplot2)
library(ggalluvial)

# Each group is both a stratum and an alluvium, so one flow per group
gg <- ggplot(
  d,
  aes(x = time, y = proportion, stratum = group, alluvium = group)
) +
  geom_stratum(aes(fill = group)) +
  geom_alluvium(aes(fill = group), knot.pos = 0) +
  geom_text(aes(label = round(proportion, digits = 2)), stat = "stratum")
plot(gg)
Straightforward, but not obvious! I didn't want to allow the user to input data in as wide a variety of forms as alluvial allows, since part of the spirit of ggplot2 is to standardize the input data.
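Since the concern earlier in the thread was that more than two bars would be a hassle, the same data preparation can be wrapped in a small helper that accepts any number of group vectors. This is only a sketch in base R: long_proportions() is a hypothetical name, not part of ggalluvial, and it assumes the time points are simply numbered in list order.

```r
# Hypothetical helper: build the long-format frame from any number of
# group vectors, one vector per time point.
long_proportions <- function(groups) {
  # Use a shared set of levels so every group appears at every time point.
  levels_all <- sort(unique(unlist(groups)))
  pieces <- lapply(seq_along(groups), function(i) {
    counts <- table(factor(groups[[i]], levels = levels_all))
    data.frame(
      group = names(counts),
      count = as.integer(counts),
      proportion = as.numeric(counts) / length(groups[[i]]),
      time = i
    )
  })
  do.call(rbind, pieces)
}

set.seed(1)
groups <- list(
  sample(LETTERS[1:4], 100, TRUE, 1:4 / sum(1:4)),
  sample(LETTERS[1:4], 1000, TRUE, 4:1 / sum(1:4)),
  sample(LETTERS[1:4], 500, TRUE)
)
d_long <- long_proportions(groups)
print(d_long)
```

The resulting frame has the same columns as d, so it should be usable with the same aesthetics (x = time, y = proportion, stratum = group, alluvium = group), though that has not been checked here against a particular ggalluvial version.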
Labeling the flows would be difficult even in theory, since, in general, if there are 4 groups at both axes then there will be 16 flows. In practice, it's currently impossible using ggalluvial, since stat_alluvium() and stat_flow() return the endpoints of the flows (which are located at the strata, within the axes) rather than the centers of the flows (which are between the axes). But i'll think about how this might be done in a future release. Thanks for the suggestion!
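In the special case of one flow per group, approximate label positions between the axes can be computed by hand from the endpoint centres. The sketch below is base R only and does not use any ggalluvial output; the data are made up, and it assumes strata are stacked bottom to top in row order, which may not match the order ggalluvial actually draws them in.

```r
# Hypothetical proportions for the same three groups at two axes.
d <- data.frame(
  group = rep(c("A", "B", "C"), times = 2),
  time = rep(c(1, 2), each = 3),
  proportion = c(0.2, 0.3, 0.5, 0.5, 0.3, 0.2)
)

# Centre of each stratum within its axis, assuming bottom-to-top stacking
# in row order (an assumption, not ggalluvial's documented behaviour).
centre <- function(p) cumsum(p) - p / 2
d$y <- ave(d$proportion, d$time, FUN = centre)

# Label position per flow: halfway between the axes, at the mean of the
# two endpoint centres.
labels <- data.frame(
  group = unique(d$group),
  x = mean(unique(d$time)),
  y = as.numeric(tapply(d$y, d$group, mean))
)
print(labels)
```

A data frame like labels could then be passed to a separate geom_text() layer with its own data argument, much as in the hand-crafted plot earlier in the thread.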
This looks great, and super straightforward compared to all the nonsense I had to cook up! Will definitely be using it in future. Thanks @corybrunson
Wow, this is incredibly simple @corybrunson. I second the sentiments of @jaspercooper - so many ugly hacks when the solution was there all along. Cheers!
@adomingues Your plot above is nice and exactly what I am looking for, but I have no idea how to make it. Could you please share the code and input data for the plot? In the meantime, did you figure out an easier way to generate it? Thank you.
@singcell I can try to dig up the code, but I can't promise you anything until tomorrow. Did you try the solution that @corybrunson posted? https://github.com/corybrunson/ggalluvial/issues/18#issuecomment-416064909
Thank you!
@singcell here it is:
p_line2 <- ggplot(all_perc, aes(x = Strain_lab, y = Perc, fill = Germline_expression)) +
  geom_bar(stat = "identity", width = 0.1) +
  geom_ribbon(aes(x = Strain_pos, ymin = ymin, ymax = ymax, fill = Germline_expression), alpha = .2) +
  geom_text(
    data = new_legend,
    aes(x = x, y = y, label = Germline_expression, color = Germline_expression),
    show.legend = FALSE, fontface = 2, position = position_jitter(width = 0.1)
  ) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_prettier() +
  scale_color_prettier() +
  guides(fill = guide_legend(title = "Germline expression")) +
  labs(x = "",
       y = "% of genes") +
  theme_classic(base_size = 14)
p_line2
Sadly I can't give you the data for a full reprex because I am not authorized to make the data public yet.
Thank you @adomingues. I appreciate your help.
Hi, really cool package, I'm excited to get using it in my working paper.
I've been hacking away at this all morning but can't seem to fix it: for some reason the flow width is constant in the following, any idea what I'm doing wrong? Apologies if this is obvious.
Data looks like this:
Code:
Gives: