corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

Blank space between strata for NAs #30

Closed elbamos closed 5 years ago

elbamos commented 5 years ago

I'm trying to plot responses to survey data over time using ggalluvial. My axes represent time; i.e., the first axis is responses at the start of the survey, the second axis is responses at month 3; etc. The alluvia show how people who gave one answer at a point in the study gave a different answer later in the study.

As is common with survey data, many people dropped out of the study at different times.

I am able to plot this with ggalluvial, by specifying na.rm=T to geom_stratum(). However, two visual problems arise.

  1. The strata are represented as a stacked bar chart rising from the bottom. This makes it harder to see how the number of people giving each response changed over time. It also makes the alluvia harder to read. It would be better if each stratum sat at the same position on the y axis on each of the axes. There would then be increasingly large blank spaces between the strata, as the axes progress from left to right.

  2. With geom_stratum(na.rm=T), the alluvia are filled in from the bottom rather than the top. So at the top of each stratum, there is a part of the stratum with no alluvia, representing persons who did not respond in the following time period. I would prefer if these blank areas were at the bottom rather than the top of each stratum.

Is there a way to address one or both of these issues in ggalluvial?

corybrunson commented 5 years ago

Hi @elbamos,

  1. This is very similar to #28 and a common comment on the package. My short answer is that you can probably do this using geom_parallel_sets() from the ggforce package, at least to get spaces between the strata. (In general, it won't always be possible to align the strata horizontally without introducing arbitrarily wide padding between them, so i don't think either package implements that option.) My long answer is that, while the terminologies around Sankey and parallel sets diagrams are fraught (see #11), something that i think is essential to alluvial diagrams is that the alluvia and strata "settle" onto the horizontal axis, forming stacked bar plots at each diagram axis, so that the vertical axis has numerical meaning (the sum of the weights of the alluvia or strata). So, diagrams that violate this principle can't be rendered using ggalluvial.
  2. I can't be sure i'm addressing your issue without a reproducible example (try using a toy data set and the reprex package if this isn't the answer you want), but i'd suggest two things: First, make sure that na.rm is consistent between the stratum and alluvium layers, e.g. geom_stratum() and geom_alluvium(). (Also, try both options; they stack differently.) Second, try using scale_y_reverse() to invert the vertical axis.
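
A minimal sketch of the second suggestion, assuming a hypothetical long-format data frame `survey` with one row per respondent per wave and NA marking dropout. The data, the column names, and the `drop_na` variable are invented for illustration and are not taken from the issue. The sketch passes the same na.rm value to both layers and reverses the vertical axis. How missing rows are stacked may differ between ggalluvial versions, so it is worth trying both settings.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical toy data: 6 respondents, 3 waves; NA means the respondent dropped out.
waves <- c("start", "month 3", "month 6")
survey <- data.frame(
  id = rep(1:6, times = 3),
  wave = factor(rep(waves, each = 6), levels = waves),
  response = c(
    "yes", "yes", "no", "no", "yes", "no",   # start
    "yes", "no",  "no", NA,   "yes", NA,     # month 3
    "no",  "no",  NA,   NA,   "yes", NA      # month 6
  )
)

# Use one na.rm value for both layers so strata and alluvia are treated alike.
drop_na <- TRUE

p <- ggplot(survey,
            aes(x = wave, stratum = response, alluvium = id, fill = response)) +
  geom_alluvium(na.rm = drop_na) +
  geom_stratum(na.rm = drop_na) +
  scale_y_reverse()  # invert the vertical axis
```

Printing `p` draws the plot; setting `drop_na <- FALSE` and rebuilding gives the other stacking for comparison.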

I hope this helps, or at least clarifies!

corybrunson commented 5 years ago

Closing this issue to clear the queue, but can reopen if the problem has not been resolved.