Closed joe-jhou2 closed 4 years ago
Hi @mimisikai, thanks for raising the issue, and i think i have the solution. The code below first reproduces your example after reconstructing the data, then uses the aes.bind parameter of stat_alluvium() to rearrange the lodes within each stratum so that those with the same aesthetics (in this case only fill) are adjacent, before they are rearranged according to the default rules. Is this the plot you wanted?
In case you're not familiar with ggplot2 internals: Whenever a layer is produced by a stat or a geom, parameters can be passed to either the stat or geom itself or the geom or stat (respectively) that it is paired with. geom_alluvium() pairs with stat_alluvium() by default, so, when the geom fails to recognize the aes.bind parameter, it passes this parameter to the stat instead. The parameter is documented there, at help(stat_alluvium).
# default alluvium settings
ggplot(data = data,
       aes(axis1 = TP1, axis2 = TP2, axis3 = TP3,
           y = Freq)) +
  scale_x_discrete(limits = c("TP1", "TP2", "TP3"), expand = c(.1, .05)) +
  xlab("Time point") +
  geom_alluvium(aes(fill = V)) +
  geom_stratum() + geom_text(stat = "stratum", infer.label = TRUE) +
  theme_minimal()
# bind by aesthetics
ggplot(data = data,
       aes(axis1 = TP1, axis2 = TP2, axis3 = TP3,
           y = Freq)) +
  scale_x_discrete(limits = c("TP1", "TP2", "TP3"), expand = c(.1, .05)) +
  xlab("Time point") +
  geom_alluvium(aes(fill = V), aes.bind = "alluvia") +
  geom_stratum() + geom_text(stat = "stratum", infer.label = TRUE) +
  theme_minimal()
Created on 2020-04-02 by the reprex package (v0.3.0)
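As a rough illustration of the parameter passing described above, the sketch below builds the same layer two ways: once by handing aes.bind to geom_alluvium(), and once by handing it to stat_alluvium() directly. It then peeks at the stat_params field of each layer object. That field is a ggplot2 internal, so treat the inspection as an assumption about current ggplot2 behaviour rather than a supported interface; the names via_geom and via_stat are made up for this example.

```r
library(ggalluvial)  # also attaches ggplot2

# Route 1: give aes.bind to the geom, which does not use it itself
via_geom <- geom_alluvium(aes(fill = V), aes.bind = "alluvia")

# Route 2: give aes.bind to the stat that actually consumes it
via_stat <- stat_alluvium(aes(fill = V), geom = "alluvium",
                          aes.bind = "alluvia")

# Assumption: ggplot2 keeps the parameters destined for the stat in the
# internal `stat_params` list of the layer object
via_geom$stat_params$aes.bind
via_stat$stat_params$aes.bind
```

If the two values agree, either layer can be added to the plot with the same effect on lode ordering.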
Thanks a lot! That's pretty awesome! I want to escalate this challenge: ideally, each time point and stratum would have its own Freq data, like this:
data
   TP1  TP2    TP3      V Freq_TP1 Freq_TP2 Freq_TP3
1   D1 D1D1 D1D1D1  IGHV1       10       12        5
2   D1 D1D1 D1D1D2  IGHV2       15       12        9
3   D1 D1D2 D1D1D1  IGHV1       31        3       16
4   D1 D1D2 D1D2D1 IGHV31       22        4       15
5   D1 D1D1 D1D2D1  IGHV2        2       15       16
6   D1 D1D2 D1D1D1  IGHV3        1       18        6
7   D1 D1D2 D1D1D2  IGHV3      111       19       12
8   D1 D1D2 D1D2D2  IGHV4       45        3       14
9   D1 D1D1 D1D1D1  IGHV1       67        3       20
10  D1 D1D1 D1D2D2  IGHV4       89        9       15
11  D1 D1D2 D1D2D2  IGHV5       23        3       14
12  D1 D1D2 D1D2D1  IGHV4       48       11        6
13  D1 D1D1 D1D1D1  IGHV2       90        7        4
14  D1 D1D2 D1D2D2  IGHV4       12       10        6
15  D1 D1D2 D1D2D1  IGHV5       46        8       16
16  D1 D1D1 D1D1D2  IGHV3       78       18       16
17  D1 D1D2 D1D2D2 IGHV48       90       12        7
18  D1 D1D1 D1D2D1  IGHV1      100       12        5
19  D1 D1D2 D1D1D2  IGHV4        0       13       15
20  D1 D1D1 D1D1D2  IGHV3       20       18       12
How can I arrange the data format and plot it?
Thx
Aha, this is a separate issue, having to do (as you suspect) with the format of the data. These data are in alluvia form, which ggalluvial currently supports only with fixed frequencies. To allow changes in the y value, you first need to restructure the data. When frequencies are fixed, this is straightforward, using these convenience functions. It took me a while to figure out how to extend the trick to variable frequencies in tidyr, so i'll flag this as a feature to incorporate into to_lodes_form() before the first major release. Thanks!
data <- data.frame(
TP1 = c("D1", "D1", "D1", "D1",
"D1", "D1", "D1", "D1",
"D1", "D1", "D1", "D1",
"D1", "D1", "D1", "D1",
"D1", "D1", "D1", "D1"),
TP2 = c("D1D1", "D1D1", "D1D2", "D1D2",
"D1D1", "D1D2", "D1D2", "D1D2",
"D1D1", "D1D1", "D1D2", "D1D2",
"D1D1", "D1D2", "D1D2", "D1D1",
"D1D2", "D1D1", "D1D2", "D1D1"),
TP3 = c("D1D1D1", "D1D1D2", "D1D1D1", "D1D2D1",
"D1D2D1", "D1D1D1", "D1D1D2", "D1D2D2",
"D1D1D1", "D1D2D2", "D1D2D2", "D1D2D1",
"D1D1D1", "D1D2D2", "D1D2D1", "D1D1D2",
"D1D2D2", "D1D2D1", "D1D1D2", "D1D1D2"),
V = c("IGHV1", "IGHV2", "IGHV1", "IGHV31",
"IGHV2", "IGHV3", "IGHV3", "IGHV4",
"IGHV1", "IGHV4", "IGHV5", "IGHV4",
"IGHV2", "IGHV4", "IGHV5", "IGHV3",
"IGHV48", "IGHV1", "IGHV4", "IGHV3"),
Freq_TP1 = c(10, 15, 31, 22, 2, 1, 111, 45, 67, 89,
23, 48, 90, 12, 46, 78, 90, 100, 0, 20),
Freq_TP2 = c(12, 12, 3, 4, 15, 18, 19, 3, 3, 9,
3, 11, 7, 10, 8, 18, 12, 12, 13, 18),
Freq_TP3 = c(5, 9, 16, 15, 16, 6, 12, 14, 20, 15,
14, 6, 4, 6, 16, 16, 7, 5, 15, 12),
stringsAsFactors = FALSE
)
library(ggalluvial)
#> Loading required package: ggplot2
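# prefix the time-point columns so every measured column reads <variable>_<time point>
names(data)[1:3] <- paste("Seq_", names(data)[1:3], sep = "")
# give each row (alluvium) an identifier that survives the reshape
data$ID <- seq(nrow(data))
# pivot to lodes form: one row per alluvium and time point, with columns Seq and Freq
data <- tidyr::pivot_longer(data, c(Seq_TP1:Seq_TP3, Freq_TP1:Freq_TP3),
                            names_to = c(".value", "TP"),
                            names_sep = "_")
ggplot(data = data,
       aes(x = TP, stratum = Seq, alluvium = ID,
           y = Freq)) +
  xlab("Time point") +
  geom_alluvium(aes(fill = V), aes.bind = "alluvia") +
  geom_stratum() + geom_text(stat = "stratum", aes(label = Seq))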
Created on 2020-04-03 by the reprex package (v0.3.0)
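For anyone without tidyr at hand, the same kind of restructuring can be sketched with reshape() from base R. This is only an illustration on a three-row, two-time-point subset of the data above, not the approach ggalluvial itself takes; the object names wide and long are made up here, while the column names (ID, TP, Seq, Freq) mirror the ones used in the tidyr code.

```r
# A small subset in alluvia (wide) form, already prefixed and given IDs
wide <- data.frame(
  ID       = 1:3,
  Seq_TP1  = c("D1", "D1", "D1"),
  Seq_TP2  = c("D1D1", "D1D1", "D1D2"),
  Freq_TP1 = c(10, 15, 31),
  Freq_TP2 = c(12, 12, 3),
  stringsAsFactors = FALSE
)

# Stack the Seq_* and Freq_* columns in parallel: one row per ID and time point
long <- reshape(
  wide,
  direction = "long",
  varying   = list(c("Seq_TP1", "Seq_TP2"), c("Freq_TP1", "Freq_TP2")),
  v.names   = c("Seq", "Freq"),
  timevar   = "TP",
  times     = c("TP1", "TP2"),
  idvar     = "ID"
)

long
```

The result has one Seq and one Freq value per alluvium and time point, which is the shape that the x, stratum, alluvium, and y aesthetics in the plot above expect.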
Thanks Cory! Fantastic plot! One small cosmetic suggestion for the new function: it would be really good if the strata (D1D1, D1D2, etc.) could be separated so that they look independent, and distributed evenly along the same time point axis.
You can find a real data example at https://martakolczynska.com/post/polpan-voting-alluvial-plots/ that uses https://github.com/mbojan/alluvial. Perhaps it would be a nice example to show off ggalluvial too.
@mbojan it is a very cool example. I've noticed that political scientists have really taken to these diagram types.
@mimisikai to your point about the cosmetics, i think you're suggesting (in my terminology) the option of inserting gaps between the strata so that the stacks at each axis are the same height. Is that right? I've resisted that, since it would undermine the y axis and would not make sense when applied to plots with negative strata.
I've been hand-wavey about this property in the past (see #11, #28, and #30), but i've written it up more carefully for a software paper that should be out soon. I'll post a link to it here, as i'd be grateful for both of your feedback.
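To make the point about gaps a little more concrete, here is a small sketch using the three frequency columns from the example data. It totals the frequencies at each time point and computes how much empty space each stack would need in order to match the tallest one. The names freq, totals, and padding are illustrative only, and this is one reading of the trade-off rather than anything ggalluvial computes.

```r
# Frequency columns copied from the example data in this thread
freq <- data.frame(
  Freq_TP1 = c(10, 15, 31, 22, 2, 1, 111, 45, 67, 89,
               23, 48, 90, 12, 46, 78, 90, 100, 0, 20),
  Freq_TP2 = c(12, 12, 3, 4, 15, 18, 19, 3, 3, 9,
               3, 11, 7, 10, 8, 18, 12, 12, 13, 18),
  Freq_TP3 = c(5, 9, 16, 15, 16, 6, 12, 14, 20, 15,
               14, 6, 4, 6, 16, 16, 7, 5, 15, 12)
)

# Height of the stack at each time point on the y axis
totals <- colSums(freq)
totals

# Empty space each stack would need to reach the height of the tallest one
padding <- max(totals) - totals
padding
```

With these numbers the stacks come to 900, 210, and 229, so equalizing them would fill most of the second and third axes with gaps, and a position on the y axis would no longer correspond to a cumulative frequency.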
@mbojan i just opened #54 with more specs on a new data set to include (to also showcase some features that are still in development). If it sounds like any source you know, i'd be very interested!
(Closing, as the original issue has been resolved.)
I have a mock example like this:
TP1, TP2, and TP3 are time points. TP1 has a single segment, D1; TP2 has two segments, D1D1 and D1D2; and TP3 has four segments, D1D1D1, D1D1D2, D1D2D1, and D1D2D2.
The plot I made looks like this:
ggplot(data = data,
       aes(axis1 = TP1, axis2 = TP2, axis3 = TP3,
           y = Freq)) +
  scale_x_discrete(limits = c("TP1", "TP2", "TP3"), expand = c(.1, .05)) +
  xlab("Time point") +
  geom_alluvium(aes(fill = V)) +
  geom_stratum() + geom_text(stat = "stratum", infer.label = TRUE) +
  theme_minimal()
What I want: at TP1, for example, there is only one segment, D1, so I would like it not to split into so many substreams going downstream. E.g., IGHV1 should show only once at TP1 and then split in two at TP2 (D1D1 and D1D2).