corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

Could ggalluvial assign different color for each node? #57

Closed Kailewang closed 1 year ago

Kailewang commented 4 years ago

Hi, this is a pretty nice package! I am wondering if it is possible to assign a color to each node, as in a Sankey plot (https://www.r-graph-gallery.com/sankey-diagram.html)? Thanks.

corybrunson commented 4 years ago

Thank you!

This question might have two answers.

  1. Yes, the strata (as this package refers to the nodes) can be colored, as illustrated in the last two examples in this vignette and in a few of the examples in the documentation.
  2. But they cannot be colored according to a different scheme than the alluvia or flows (i.e. the ribbons), because ggplot2 applies a single scale to the fill aesthetic (which controls the interior colors of the graphical elements) across all layers of a plot.

Does that help? To expand on (2), i know there have been discussions about allowing different scales for different graphical elements, but i don't think it's been implemented in ggplot2—though there might be an extension package that accomplishes this.

Kailewang commented 4 years ago

Thanks! I agree with you. It works well when you have repeated elements between strata, but my data is kind of different. I'm not sure if it's possible to add an extra column for the color, or to add an extra layer on the plot that only controls the color. Just an idea. I tried NetworkSankey, but it couldn't be saved as a good static and editable image the way a ggplot can. I also googled lots of other Sankey, alluvial, and riverplot packages, and it looks like none of them can output satisfactory ggplot figures.

corybrunson commented 4 years ago

If you're wanting to control the palette of colors used in the plot, are you familiar with the scale_fill_*() functions? They can be added to a ggplot object in the same way as layers (via +). Several are illustrated here—along with scale_colour_*()s that have the same options.
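A minimal sketch of this suggestion, assuming the built-in Titanic data set and a ColorBrewer palette chosen purely for illustration (neither choice comes from the comment above):

```r
library(ggalluvial)  # also attaches ggplot2

# wide-format counts from the built-in Titanic table
titanic_wide <- data.frame(Titanic)

p <- ggplot(titanic_wide, aes(axis1 = Class, axis2 = Sex, y = Freq)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Class", "Sex")) +
  # a scale is added with `+`, just like a layer; it replaces the default palette
  scale_fill_brewer(type = "qual", palette = "Set1")
p
```

Swapping `scale_fill_brewer()` for another `scale_fill_*()` function changes the palette without touching the layers.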

kin182 commented 4 years ago

I have a follow-up question on assigning colors.

My figure looks like the titanic example, in which there are different categories on axes 1, 2 and 3. Is it possible to assign colors manually to the different categories of the Sex and Survived axes? For example, blue for Male and red for Female, etc.

[image]

Thanks so much!

corybrunson commented 4 years ago

@kin182 i think this is a ggplot2 issue rather than a ggalluvial issue. Check out the very last example in the documentation on color-related aesthetics, which uses scale_fill_manual() to control the colors used to fill the rectangles. A similar trick should work here—taking care that the order of the colors passed to values reflects the order of the values of the variable passed to the fill aesthetic.
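One way to sidestep the ordering concern is to pass a named vector to `values`, so that each color is matched to a level by name rather than by position. This is only a sketch under assumptions: the axes and the two colors are picked for illustration and are not taken from the documentation example mentioned above.

```r
library(ggalluvial)  # also attaches ggplot2

titanic_wide <- data.frame(Titanic)

p <- ggplot(titanic_wide,
            aes(axis1 = Class, axis2 = Sex, axis3 = Survived, y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Survived")) +
  geom_alluvium(aes(fill = Sex)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  # names are matched to the levels of the variable mapped to `fill`
  scale_fill_manual(values = c(Male = "blue", Female = "red"))
p
```

Here only the ribbons are filled, since only one variable (Sex) is mapped to `fill`.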

kin182 commented 4 years ago

I tried scale_fill_manual() and was able to set colors for axis1, but axis2 still had no colors. How can I set colors specifically for axes 2 and 3?

corybrunson commented 4 years ago

@kin182 could you share the code you're using in a reproducible example on a small data set?

kin182 commented 4 years ago

Yes, using the titanic_wide as an example,

titanic_wide <- data.frame(Titanic)

ggplot(data = titanic_wide, aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)) + scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.2, .05)) + xlab("Demographic") + geom_alluvium(aes(fill=Class)) + geom_stratum(aes(fill=Class)) + geom_text(stat = "stratum", infer.label = TRUE) + theme_minimal() + ggtitle("passengers on the maiden voyage of the Titanic", "stratified by demographics and survival")+ scale_fill_manual(values=c("red","orange","green","blue"), breaks=c("1st","2nd","3rd","Crew"), labels=c("1st","2nd","3rd","Crew"))

How can we set custom colors for the columns of axis2 and axis3? Thanks!

corybrunson commented 4 years ago

Aha—i think i understand. There are limitations with using wide data. titanic_wide has one variable for Class, a different one for Sex and a third one for Age, and only one of them can be passed to the fill aesthetic. So there's no natural way to color-code the strata at each axis, since the axes are based on different variables.

This is the sort of situation that to_lodes_form() is designed for. Here's an attempt using your setup:

library(ggalluvial)
#> Loading required package: ggplot2
# wide data
titanic_wide <- data.frame(Titanic)
# current plot
ggplot(data = titanic_wide, aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.2, .05)) +
  xlab("Demographic") +
  geom_alluvium(aes(fill=Class)) +
  geom_stratum(aes(fill=Class)) +
  geom_text(stat = "stratum", infer.label = TRUE) +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic", "stratified by demographics and survival")+
  scale_fill_manual(values=c("red","orange","green","blue"), breaks=c("1st","2nd","3rd","Crew"), labels=c("1st","2nd","3rd","Crew"))

# long data
titanic_long <- to_lodes_form(data.frame(Titanic),
                              key = "Demographic", value = "Group", id = "Cohort",
                              axes = 1:3)
# plot with all strata colored
ggplot(data = titanic_long,
       aes(x = Demographic, stratum = Group, alluvium = Cohort, y = Freq)) +
  geom_alluvium(aes(fill=Group)) +
  geom_stratum(aes(fill=Group)) +
  geom_text(stat = "stratum", aes(label = Group)) +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic", "stratified by demographics and survival")
#> Warning in f(...): Some differentiation aesthetics vary within alluvia, and will be diffused by their first value.
#> Consider using `geom_flow()` instead.

Created on 2020-05-18 by the reprex package (v0.3.0)

I dropped the manual color scale because it did not include enough colors, but if you know what colors you want to use then you should be able to replace it. Note also that each alluvium (cohort) is colored only one color, since the underlying graphical objects cannot change color. (This is what the warning is about.)
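As a sketch of how the manual scale could be put back, the long-form plot needs one color per level of Group (eight here). The colors below are arbitrary placeholders, and naming them by level is an assumption made to avoid depending on level order:

```r
library(ggalluvial)  # also attaches ggplot2

titanic_long <- to_lodes_form(data.frame(Titanic),
                              key = "Demographic", value = "Group", id = "Cohort",
                              axes = 1:3)

# list the strata that need a color
levels(titanic_long$Group)

# one (arbitrary) color per stratum, matched by name
group_colors <- c(
  "1st" = "red", "2nd" = "orange", "3rd" = "green", "Crew" = "blue",
  "Male" = "steelblue", "Female" = "tomato",
  "Child" = "gold", "Adult" = "grey40"
)

p <- ggplot(data = titanic_long,
            aes(x = Demographic, stratum = Group, alluvium = Cohort, y = Freq)) +
  geom_alluvium(aes(fill = Group)) +
  geom_stratum(aes(fill = Group)) +
  geom_text(stat = "stratum", aes(label = Group)) +
  theme_minimal() +
  scale_fill_manual(values = group_colors)
p
```

The same warning about aesthetics varying within alluvia is expected, because `geom_alluvium()` is still used.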

kin182 commented 4 years ago

Wonderful! This is what I was looking for. Thanks also for cleaning up my code!

Uniocrassus commented 2 years ago

Thanks for sharing your code! It's been very helpful for my own purpose.

I do have one issue that arises that I can't seem to square, even when using the titanic_wide dataset.

I actually like the fact that Sex and Age strata remain transparent. However, in my case, it seems to be turning it a solid gray. Any suggestions on how to have it remain empty?

corybrunson commented 2 years ago

@Uniocrassus i'm glad to hear it!

Are you using the same code as above and getting different results? Or have you written your own code? If you've written your own code, then it would be most helpful for you to post it here. If you're getting different results using the code above, then it may be due to changes in default behavior since this issue was originally resolved. I can take a closer look once i know more.

Uniocrassus commented 2 years ago

@corybrunson Thanks for getting back to me so quickly!

I'm using exactly the same code as above; the code I had written myself gave me a far less visually appealing output.

The only difference between my dataset and the titanic_wide dataset is the number of categories per column, and I don't see how that should cause the issues I'm seeing (3, 6, 9 in mine vs 4, 2, 2 in titanic). My gut tells me that there is an extremely basic difference in default behavior, but that's about as far as my limited knowledge can take me.

Edit: exported chart displaying the issue: [image: Rplot]

corybrunson commented 2 years ago

@Uniocrassus this might have to do with an earlier version change (see the NEWS file) in which i changed the default aesthetic for missing values to "transparent" (though it seems like a backward explanation). Anyway, by specifying that missing Class values should be treated as missing (the specification na.value = NA in scale_fill_manual()), the following code reproduces the desired result above on my machine:

library(ggalluvial)
#> Loading required package: ggplot2
# wide data
titanic_wide <- data.frame(Titanic)
# current plot
ggplot(
  data = titanic_wide,
  aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)
) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.2, .05)) +
  xlab("Demographic") +
  geom_alluvium(aes(fill=Class)) +
  geom_stratum(aes(fill=Class)) +
  geom_text(stat = "stratum", infer.label = TRUE) +
  theme_minimal() +
  ggtitle(
    "passengers on the maiden voyage of the Titanic",
    "stratified by demographics and survival"
  )+
  scale_fill_manual(
    values=c("red","orange","green","blue"),
    breaks=c("1st","2nd","3rd","Crew"),
    labels=c("1st","2nd","3rd","Crew"),
    na.value = NA
  )
#> Warning: The parameter `infer.label` is deprecated.
#> Use `aes(label = after_stat(stratum))`.

Created on 2021-12-01 by the reprex package (v2.0.1)

Please let me know if it also works for you! You can then see if it works with your own data.

corybrunson commented 2 years ago

Reopening this issue because the solution above seems like it's only telling ggplot2 to do what it's already supposed to do, and a better explanation is needed.

Uniocrassus commented 2 years ago

@corybrunson Your amendment to the code does fix the issue, but only partially. I believe it to be fixing the error from the wrong position. In instances where multiple 'classes' combine into one 'sex' to use titanic_wide terminology (my case 'study focus' and 'mussel genus') the box goes transparent, but when only one 'class' enters one 'sex', the particular selection goes opaque to that color (see figure).

I can't seem to find a way to specifically define the aesthetics of different strata. Is there even a way to do that within ggplot2, or would I need to move to the alluvial package?

[image: Rplot]

corybrunson commented 2 years ago

@Uniocrassus it looks like some graphical elements are being duplicated, which (in my experience) usually follows from duplication in the underlying data. Would you be able to share either your data set or a small toy data set that reproduces the problem?

Uniocrassus commented 2 years ago

@corybrunson Here's the dataset that produced this plot as a .txt file. My inkling is that one of the packages is interpreting the presence of a study focus in a mussel genus as being equivalent to that study focus, while the presence of multiple is interpreted as a na.value.

Thanks again and in advance for taking the time!

sharable alluvion dataset.txt

corybrunson commented 2 years ago

@Uniocrassus thanks for pursuing this. I see what you mean, and i've been able to reproduce it using the Titanic data set (below). Whenever all observations that make up a stratum take the same value of some aesthetic, then that aesthetic is attributed to the stratum; otherwise, the aesthetic is considered missing.

This behavior seems like it would be desirable in some cases while undesirable in others. So perhaps the ideal solution would be for geom_stratum() to have an additional logical parameter, say axis.only, that would control whether an aesthetic passed to an axis variable (when the data are in wide form) is inherited by strata on other axes. Does that sound right to you? It would not allow a user to treat different aesthetics (like fill and alpha) differently, but it would be a start.

In the meantime, there are a couple of ways you can work around the problem (also below). One keeps the data as-is and passes the computed stratum variable to the fill aesthetic, though this requires "cancelling" the remaining values of stratum via scale_fill_manual(). The other adds zero rows to the data, which causes stat_stratum() to interpret off-axis strata as heterogeneous and therefore to assign them a missing aesthetic.

library(ggalluvial)
#> Loading required package: ggplot2
# missing rows: homogeneous strata inherit aesthetics
titanic_wide <- data.frame(Titanic)
titanic_wide <-
  titanic_wide[titanic_wide$Class != "Crew" | titanic_wide$Sex != "Female", ]
ggplot(data = titanic_wide,
       aes(axis1 = Class, axis2 = Sex, axis3 = Age,
           y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age")) +
  xlab("Demographic") +
  geom_alluvium(aes(fill = Sex)) +
  geom_stratum(aes(fill = Sex)) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_fill_discrete(na.value = NA)

# missing rows: homogeneous strata do not inherit calculated aesthetics
ggplot(data = titanic_wide,
       aes(axis1 = Class, axis2 = Sex, axis3 = Age,
           y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age")) +
  xlab("Demographic") +
  geom_alluvium(aes(fill = Sex)) +
  geom_stratum(aes(fill = after_stat(stratum))) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_fill_manual(values = c(Male = "turquoise", Female = "orange"),
                    na.value = NA)

# zero rows: strata are treated as heterogeneous
titanic_wide <- data.frame(Titanic)
titanic_wide[titanic_wide$Class == "Crew" & titanic_wide$Sex == "Female",
             "Freq"] <- 0L
ggplot(data = titanic_wide,
       aes(axis1 = Class, axis2 = Sex, axis3 = Age,
           y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age")) +
  xlab("Demographic") +
  geom_alluvium(aes(fill = Sex)) +
  geom_stratum(aes(fill = Sex)) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_fill_discrete(na.value = NA)

Created on 2021-12-03 by the reprex package (v2.0.1)
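The rule described in this comment (a stratum inherits an aesthetic only when all of its observations share one value) can be mimicked in base R. This is an illustration of the rule, not ggalluvial's internal code, and `shared_value()` is a hypothetical helper written for this sketch:

```r
# same subsetting as the "missing rows" example: drop female crew
titanic_wide <- data.frame(Titanic)
titanic_wide <-
  titanic_wide[titanic_wide$Class != "Crew" | titanic_wide$Sex != "Female", ]

# hypothetical helper: the single value shared by all rows, or NA if they differ
shared_value <- function(x) {
  u <- unique(as.character(x))
  if (length(u) == 1L) u else NA_character_
}

# for each stratum on the Class axis, which Sex value (if any) would be inherited?
inherited <- tapply(titanic_wide$Sex, titanic_wide$Class, shared_value)
inherited
# only Crew is homogeneous (all Male) after the subsetting above;
# 1st, 2nd and 3rd contain both sexes and come out as NA
```

Rows with `Freq` equal to zero still count as observations here, which is consistent with the zero-rows workaround making a stratum heterogeneous.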

Uniocrassus commented 2 years ago

@corybrunson Thanks for the help! The first of the two options suits my application best, so I'll go with that one. <3

willizhang commented 1 year ago

Thank you very much for the detailed instructions in your na.value = NA example above. Would it be possible to change the transparency for the Class axis (but without showing the splits between the different alluvia)?

corybrunson commented 1 year ago

Hi @zhangguoqianggu , does the slight change below work for you? I just made the stratum fill transparent, so that the half-transparent alluvia are visible through them. If not, could you explain in more detail what you want?

library(ggalluvial)
#> Warning: package 'ggalluvial' was built under R version 4.1.2
#> Loading required package: ggplot2
# wide data
titanic_wide <- data.frame(Titanic)
# current plot
ggplot(
  data = titanic_wide,
  aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)
) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.2, .05)) +
  xlab("Demographic") +
  geom_alluvium(aes(fill=Class)) +
  # plot empty strata
  geom_stratum(alpha = 0) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  theme_minimal() +
  ggtitle(
    "passengers on the maiden voyage of the Titanic",
    "stratified by demographics and survival"
  )+
  scale_fill_manual(
    values=c("red","orange","green","blue"),
    breaks=c("1st","2nd","3rd","Crew"),
    labels=c("1st","2nd","3rd","Crew"),
    na.value = NA
  )

Created on 2023-05-10 with reprex v2.0.2

willizhang commented 1 year ago

Hi, thank you for your help! I was wondering if it would be possible to adjust the transparency of the colours for the "Class" strata without showing the splits between the ribbons. For example, if I change geom_stratum(aes(fill=Class)) to geom_stratum(aes(fill=Class), alpha = 0.5), the gaps between the ribbons become obvious in the "Class" column (as in the new graph you have shown above: in "Crew", you can see some splits indicated by white space).

The reasoning behind my request is that: sometimes, the default colours for the strata can be quite different compared to the ribbon (they are in the same color but quite different transparency); if the transparency of the colours for the strata can be adjusted (without showing the split between ribbons), the graphs would look nicer. :D

corybrunson commented 1 year ago

OK, i think i understand. By using the alluvium geom, you're telling the plot that you want separate graphical objects for the alluvia, which results in the splits. One option is to combine equivalent alluvia, which will remove some but not all. To do this, set cement.alluvia = TRUE in the geom_alluvium() call.
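A sketch of this first option, assuming a ggalluvial version in which `cement.alluvia` is available and is passed through `geom_alluvium()` to the alluvium stat; the named color vector is an illustrative choice:

```r
library(ggalluvial)  # also attaches ggplot2

titanic_wide <- data.frame(Titanic)

p <- ggplot(data = titanic_wide,
            aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.2, .05)) +
  xlab("Demographic") +
  # combine alluvia that take the same path through all three axes
  geom_alluvium(aes(fill = Class), cement.alluvia = TRUE) +
  geom_stratum(aes(fill = Class), alpha = .5) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  theme_minimal() +
  scale_fill_manual(
    values = c("1st" = "red", "2nd" = "orange", "3rd" = "green", "Crew" = "blue"),
    na.value = NA
  )
p
```

As noted above, this is expected to remove some of the splits but not all of them.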

The other option is to plot only the flows (between the strata) of the alluvia, not the lodes (overlapping the strata), and then color the strata separately. The drawback is that the lodes at the other strata are no longer plotted. Here's an example:

library(ggalluvial)
#> Loading required package: ggplot2
titanic_wide <- data.frame(Titanic)
ggplot(
  data = titanic_wide,
  aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)
) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.2, .05)) +
  xlab("Demographic") +
  # keep using alluvium stat, but combine it with the flow geom
  geom_flow(stat = "alluvium", aes(fill=Class)) +
  # color strata
  geom_stratum(aes(fill=Class), alpha = .5) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  theme_minimal() +
  ggtitle(
    "passengers on the maiden voyage of the Titanic",
    "stratified by demographics and survival"
  )+
  scale_fill_manual(
    values=c("red","orange","green","blue"),
    breaks=c("1st","2nd","3rd","Crew"),
    labels=c("1st","2nd","3rd","Crew"),
    na.value = NA
  )

Created on 2023-05-14 with reprex v2.0.2

If you want to plot the lodes over the remaining strata, then you'll need to put the data in lodes form, then maybe create a logical variable that is TRUE for lodes at the second and third axes but FALSE for lodes at the first axis. You can then use this variable in the aesthetics for a geom_lode() layer that also uses the alluvium stat. Let me know if that works, or if you have trouble trying it, and i can take a shot in the next few days.
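The data preparation step of this suggestion might look like the sketch below. The column name `off_first_axis` is hypothetical, and the `geom_lode()` layer itself is left out because the thread does not show a confirmed version of it:

```r
library(ggalluvial)  # also attaches ggplot2

# lodes (long) form: one row per cohort per axis
titanic_long <- to_lodes_form(data.frame(Titanic),
                              key = "Demographic", value = "Group", id = "Cohort",
                              axes = 1:3)

# hypothetical flag: FALSE for lodes at the first axis (Class),
# TRUE for lodes at the second and third axes (Sex, Age)
titanic_long$off_first_axis <- titanic_long$Demographic != "Class"

# check how many lodes fall on each side of the flag
table(titanic_long$Demographic, titanic_long$off_first_axis)
```

The flag could then be mapped to an aesthetic in a `geom_lode()` layer that uses the alluvium stat, as described above.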

willizhang commented 1 year ago

Thank you very much again for your help! I found that the graph you previously suggested, which adds na.value = NA in scale_fill_manual(), looks good enough (https://github.com/corybrunson/ggalluvial/issues/57#issuecomment-983834240). :D

corybrunson commented 1 year ago

I think the explanation above belongs in {ggplot2} rather than in {ggalluvial}, so i'm closing this issue again, but feel free to reopen it to challenge that view!