corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

Stratum Box Sizes not fitting data #24

Closed jvondollen closed 5 years ago

jvondollen commented 6 years ago

Description of the issue

The boxes representing the strata don't always fit the data. The colored flows going into or out of a stratum can take up much less space than the stratum box, so the box looks far too big for the data, and/or the box contains "empty" lines that fill up space. Both problems occur in the example below.

Reproducible example (preferably using reprex::reprex())

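# Load ggplot2 and ggalluvial (the latter provides the vaccinations data)
library(ggplot2)
library(ggalluvial)

set.seed(1234567890)
# Keep a random 75% of the rows, so some subject-survey rows are dropped
vaccinations_sampled <- vaccinations[sample(1:dim(vaccinations)[1], dim(vaccinations)[1]*0.75),]
ggplot(vaccinations_sampled,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq, fill = response, label = round(a, 3))) +
  geom_lode() + geom_flow() +
  geom_stratum(alpha = 0) +
  geom_text(stat = "stratum")
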
corybrunson commented 6 years ago

@jvondollen thanks for raising the issue! I've reproduced your example and will take a closer look ASAP. It does indeed look like a bug rather than a feature (empty lodes and strata are intended when some values are missing, which I don't think is the case here), possibly in the self-join step in geom_flow().

jvondollen commented 6 years ago

@corybrunson, I wasn't (and still am not) sure whether the vertical dimensions of the stratum "label box" are supposed to accurately represent the number of alluvia entering or leaving the box, given the results I was getting. I'm assuming the answer is yes?

corybrunson commented 6 years ago

@jvondollen yes, that is correct. If y is not provided, then each row of the data frame is counted as one unit; if y is specified (as in your example), then the height of each flow, lode, or stratum should be the sum of the values of y in the corresponding rows.
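To make the rule above concrete, here is a minimal base-R sketch of the heights one would expect for each stratum. The `toy` data frame and its values are made up for illustration (only the column names mirror the example in this thread), and `aggregate()` stands in for whatever ggalluvial does internally, which this sketch does not attempt to reproduce.

```r
# Hypothetical long-format data; values are invented for illustration only.
toy <- data.frame(
  survey   = rep(c("s1", "s2"), each = 3),
  subject  = rep(1:3, times = 2),
  response = c("yes", "yes", "no", "yes", "no", "no"),
  freq     = c(10, 5, 2, 10, 5, 2)
)

# With y = freq, the expected stratum height is the sum of freq
# over the rows in each (survey, response) group.
heights_weighted <- aggregate(freq ~ survey + response, data = toy, FUN = sum)

# Without y, each row counts as one unit, so the expected height
# is simply the number of rows in each group.
heights_counted <- aggregate(freq ~ survey + response, data = toy, FUN = length)
names(heights_counted)[3] <- "n"

print(heights_weighted)
print(heights_counted)
```

Comparing a table like `heights_weighted` against the plotted box sizes is one way to check whether the strata really are larger than the data warrant.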

corybrunson commented 6 years ago

@jvondollen try re-installing from master and executing your example again. I resolved the issue on my end with commit 2cef2c033aaa1e99d4baf19312f4c3c8f5b186c1.
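For anyone following along, one way to install that exact commit is sketched below. It assumes the remotes package is acceptable in your setup (it is not mentioned in this thread) and pins the install to the commit hash quoted above rather than to a branch name.

```r
# Assumption: the 'remotes' package is used for GitHub installs.
# install.packages("remotes")  # uncomment if remotes is not yet installed

# Install ggalluvial at the specific commit referenced in this comment.
remotes::install_github(
  "corybrunson/ggalluvial",
  ref = "2cef2c033aaa1e99d4baf19312f4c3c8f5b186c1"
)
```

Restart the R session after installing so that the new version is the one that gets loaded.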

The problem was "simply" that the z-ordering solution from a couple of previous commits still wasn't functioning properly in cases where alluvia are "broken", in the sense that they exist on either side of, but not at, a middle axis.
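A rough way to spot such "broken" alluvia in long-format data is sketched below in base R. The `toy` data and the helper `is_broken()` are hypothetical and are not part of ggalluvial; the check only asks whether a subject skips an axis between its first and last appearance.

```r
# Hypothetical long-format data: subject 2 is missing at the middle survey "b".
toy <- data.frame(
  survey  = factor(c("a", "b", "c", "a", "c", "a", "b"),
                   levels = c("a", "b", "c")),
  subject = c(1, 1, 1, 2, 2, 3, 3)
)

# Hypothetical helper: an alluvium is "broken" when the axes it occupies
# are not consecutive, i.e. it skips at least one axis between its
# first and last appearance.
is_broken <- function(axis_positions) {
  pos <- sort(unique(axis_positions))
  length(pos) < diff(range(pos)) + 1
}

# Apply the check per subject, using the integer position of each survey.
broken <- tapply(as.integer(toy$survey), toy$subject, is_broken)

# Subjects whose alluvia skip an axis.
print(names(which(broken)))
```

In the reproducible example above, sampling 75% of the rows of vaccinations drops some subject-survey rows, which would plausibly create alluvia of this kind.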

(There's still some messy behavior with the not-run babynames example in the stat_alluvium() documentation, which I'll try to resolve before submitting this patch.)

corybrunson commented 6 years ago

Update: in commit 56d194e5d7c537c321d9a4d0dd6adb856a482c5e I've fixed the problem in geom_alluvium() (not stat_*()), which was an error in the calculation of the y-coordinate of the height of solitary lodes (those without adjacent flows).