kaseyzapatka closed this issue 3 years ago
Hi @kaseyzapatka, thanks for raising the issue. I would take the following steps to try to resolve it:
1. Drop names and set up the plot using only rank2000_2019. This will ensure that whatever data transformations or plotting parameters you use will result in consistent behavior (unless you encounter a bug!).
2. Keep the ordering parameters the same in every layer. The stat_*() documentation, e.g. stat_stratum(), describes how to set them globally. (To locate higher-rank metro areas at greater vertical positions, i think you need to set decreasing = FALSE. Since the strata are treated categorically, you might also have to convert the axis variables rank00 and rank19 to factors in the correct numerical order.)
If these don't work, please let me know and i'll take a closer look.
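The two suggestions in the parenthetical above can be sketched in base R. This is a minimal illustration on toy data, not the real file: the three rows and their values are placeholders built from the ranks mentioned in this thread, and the option name ggalluvial.decreasing is taken from the package documentation as I understand it, so check ?stat_stratum before relying on it.

```r
# Toy wide-form data; placeholder rows, not the real data set.
rank2000_2019 <- data.frame(
  CBSA_name = c("New York", "Philadelphia", "Dallas"),
  rank00 = c(1, 4, 5),
  rank19 = c(1, 8, 4),
  stringsAsFactors = FALSE
)

# Give both axis variables one shared set of levels, sorted numerically.
# (If the ranks were stored as text, default factor levels would sort
# alphabetically, placing "10" before "2".)
rank_levels <- sort(unique(c(rank2000_2019$rank00, rank2000_2019$rank19)))
rank2000_2019$rank00 <- factor(rank2000_2019$rank00, levels = rank_levels)
rank2000_2019$rank19 <- factor(rank2000_2019$rank19, levels = rank_levels)

# Set the stratum ordering once, globally, instead of in every layer.
# options() returns the previous values so they can be restored later.
old_opts <- options(ggalluvial.decreasing = FALSE)
```

Sharing one set of levels across both columns is a design choice made here so that a given rank is the same category on both axes; restore the previous setting with options(old_opts) when done.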
hmm... I'm still having problems.
I think it's better if the data are organized in lodes (long) form, so I converted them manually (new data is here).
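For readers following along, a conversion from wide to lodes (long) form might look like the sketch below. It uses base R only, the rows are toy placeholders rather than the real data, and the object names rank_wide and rank_lodes are hypothetical.

```r
# Toy wide-form data: one row per metro. Placeholder values only.
rank_wide <- data.frame(
  CBSA_name = c("New York", "Philadelphia", "Dallas"),
  rank00 = c(1, 4, 5),
  rank19 = c(1, 8, 4),
  stringsAsFactors = FALSE
)

# Lodes form: one row per metro per year, stacking the two rank columns.
rank_lodes <- rbind(
  data.frame(CBSA_name = rank_wide$CBSA_name, year = "2000",
             rank = rank_wide$rank00, stringsAsFactors = FALSE),
  data.frame(CBSA_name = rank_wide$CBSA_name, year = "2019",
             rank = rank_wide$rank19, stringsAsFactors = FALSE)
)
rank_lodes
```

ggalluvial also ships a to_lodes_form() helper for this kind of reshaping; the manual version is shown only because it makes the resulting structure explicit.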
Here's my code and the figure I have now:
# plot
alluvial <- rank2000_2019_lodes %>%
  mutate(CBSA_name = as_factor(CBSA_name)) %>%
  mutate(year = as.numeric(year)) %>%
  ggplot(data = .,
         aes(x = as_factor(year),
             stratum = forcats::fct_reorder(CBSA_name, rank),
             alluvium = rank)) +
  theme_void() +
  geom_flow(aes(fill = class), width = .5, alpha = 0.7) +
  geom_lode(aes(fill = forcats::fct_reorder(class, year)), width = .5, alpha = 0.7) +
  # labels
  stat_stratum(geom = "text", aes(label = forcats::fct_reorder(CBSA_name, year), order = rank, decreasing = FALSE)) +
  scale_fill_manual(values = c("down" = "#D2413C", "up" = "#3F5941", "NA" = "white"))
alluvial
#> Ignoring unknown aesthetics: order, decreasing
Switching to lodes enabled more control over ordering, so now they are ordered correctly, except that both 2000 (left-hand side) and 2019 (right-hand side) have the same label order when they shouldn't. The whole point of this sankey is to show how metros have changed in rank between years. So for example, Dallas moves from 5th (in 2000) to 4th (in 2019). You can see it in the correct place on the right-hand side (2019 axis) but not on the left (2000). I think I need to specify some filter to order the labels by rank instead of assigning both year labels for both axes. order and decreasing don't seem to be recognized and didn't work.
The second problem is the coloring (flow) from 2000 to 2019. All green should be going up, while all red should be going down. I imagine this will be corrected when the first problem is fixed?
Thanks for your help again. I really appreciate it.
Best, Kasey
Oh, i should have said previously: Unless all other avenues have been exhausted, don't use variable transformations within a plot layer. I expect the fct_reorder() calls are causing the mismatches between the flows, 2019 lodes, and labels. Same for as_factor(year). Even if the plot doesn't look quite right, it is almost always best to begin with a plot that is consistent and then gradually make aesthetic changes to it. So see if you can do all of the data transformations first, then create the plot using the same aesthetics for every layer. That includes order: Passing rank to that aesthetic was, i think, the right call, but it needs to be done either in ggplot(aes()) or else in every plot layer. Keeping the aesthetics consistent is similar to keeping the options consistent (see previous comment) in that it must be done either once upstream or everywhere downstream.
I don't have time tonight, but if you still can't get it looking the way you want, then i can tinker with it myself this weekend! Glad to be of help where i can, and also gratified that you're making good use of the extension. : )
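The workflow described in this comment (transform first, then declare shared aesthetics once) can be sketched as follows. It is only an illustration of the workflow on toy placeholder rows, not the eventual fix for this plot; the base R reorder() call stands in for fct_reorder(), and the plotting step is skipped when ggalluvial is not installed.

```r
# Toy lodes-form data; placeholder values only.
lodes <- data.frame(
  CBSA_name = rep(c("New York", "Philadelphia", "Dallas"), times = 2),
  year = rep(c(2000, 2019), each = 3),
  rank = c(1, 4, 5, 1, 8, 4),
  stringsAsFactors = FALSE
)

# Do every transformation up front, before ggplot() is called.
lodes$year <- factor(lodes$year)
# reorder() sorts the factor levels by each metro's mean rank.
lodes$CBSA_name <- reorder(lodes$CBSA_name, lodes$rank)

# Declare the shared aesthetics once in ggplot(aes()) so that every
# layer inherits the same mapping.
if (requireNamespace("ggalluvial", quietly = TRUE)) {
  library(ggplot2)
  library(ggalluvial)
  p <- ggplot(lodes,
              aes(x = year, stratum = CBSA_name, alluvium = CBSA_name,
                  label = CBSA_name)) +
    geom_flow() +
    geom_stratum() +
    geom_text(stat = "stratum")
}
```

Because no layer carries its own transformation, any mismatch that remains between flows and labels points at the data or the mapping rather than at an inconsistency between layers.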
@corybrunson, so I moved all the transformations to before the plotting begins like you suggested and now pass order to the ggplot(aes()); however, I'm left with a slightly worse plot because the labels are even more messed up now.
alluvial <- rank2000_2019_lodes %>%
  mutate(CBSA_name = as_factor(CBSA_name)) %>%
  ggplot(data = .,
         aes(x = year,
             stratum = CBSA_name,
             alluvium = rank,
             order = rank,
             decreasing = TRUE)) +
  theme_classic() +
  geom_flow(aes(fill = class), width = .4, alpha = 0.7) +
  geom_lode(aes(fill = class), width = .4, alpha = 0.7) +
  # labels
  stat_stratum(geom = "text", aes(label = CBSA_name), decreasing = TRUE) +
  scale_fill_manual(values = c("down" = "#D2413C", "up" = "#3F5941", "NA" = "white"))
alluvial
The test for when the plot is correct: Philly should be in the 4th position in 2000 and move to the 8th position in 2019, while Dallas should move from the 5th position in 2000 to the 4th in 2019. The flows were correct in the previous post's figure but are now out of order along with the labels.
Thanks for your help, I'd much appreciate it if you could look at it over the weekend. I'm a little exasperated at this point.
Best, Kasey
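The correctness test described in the post above can be written down as a small check against the data, independent of any plot. The rows below are toy placeholders that use only the ranks quoted in this thread, and the name expected_order is hypothetical.

```r
# Toy lodes-form data with the ranks quoted in the thread.
lodes <- data.frame(
  CBSA_name = rep(c("New York", "Philadelphia", "Dallas"), times = 2),
  year = rep(c(2000, 2019), each = 3),
  rank = c(1, 4, 5, 1, 8, 4),
  stringsAsFactors = FALSE
)

# Metro names in rank order within each year: the label order each
# axis should show once the plot is right.
expected_order <- lapply(
  split(lodes, lodes$year),
  function(d) d$CBSA_name[order(d$rank)]
)
expected_order
```

Comparing the labels on each axis against expected_order gives a quick way to tell whether a given version of the plot passes.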
Sure, i'll be glad to try it out myself this weekend! I see a few remaining problems but i'm not completely sure that resolving them will be the end of the story.
Hi @kaseyzapatka, i thought more carefully about your data, and i think the code below generates the plot you want:
ggplot(rank2000_2019_lodes,
       aes(x = year, stratum = year, alluvium = CBSA_name, order = rank)) +
  geom_alluvium(aes(fill = class), width = .4, alpha = .7) +
  stat_alluvium(geom = "text", aes(label = CBSA_name)) +
  scale_fill_manual(values = c(down = "#D2413C", up = "#3F5941", `NA` = "white"))
Does it work for you?
For reference, here's how i arrived at it (refer to the ordering of the rectangles vignette for more detail on specific steps):
- When only alluvium is specified, its value is internally passed to stratum. To avoid this, i set stratum to year, simply because it forced the plot to use only one stratum on each axis.
- Parameters like decreasing only apply to strata, so they are no longer appropriate. To order the lodes within each stratum, i specified the aesthetic order = rank. (To reverse the vertical positions, you could specify order = -rank.)
@corybrunson, this worked perfectly. Thanks for all your help and the detailed explanation too. I guess I assumed the CBSA_name values were the "strata", with every element forming its own stratum, but now that you mention it, it seems obvious that there are no "strata". I think the rest of the code makes sense now that the strata are out of the equation.
Thanks again, really appreciate it. I think I'll use several different iterations of these alluvial plots in my work. Will make sure to cite appropriately!
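To try the accepted approach without the original file, the sketch below applies the same layers to toy data. The three rows are placeholders, and the class column is assumed to hold the literal string "NA" for an unchanged metro, which is a guess based on the scale_fill_manual() call in this thread. The plot is only built when ggalluvial is installed.

```r
# Toy lodes-form data; placeholder values only. class is repeated on
# both rows of a metro so that each alluvium has a single fill.
lodes <- data.frame(
  CBSA_name = rep(c("New York", "Philadelphia", "Dallas"), times = 2),
  year = factor(rep(c(2000, 2019), each = 3)),
  rank = c(1, 4, 5, 1, 8, 4),
  class = rep(c("NA", "down", "up"), times = 2),
  stringsAsFactors = FALSE
)

if (requireNamespace("ggalluvial", quietly = TRUE)) {
  library(ggplot2)
  library(ggalluvial)
  # One stratum per axis (stratum = year), one alluvium per metro,
  # and lodes ordered within the stratum by rank.
  p <- ggplot(lodes,
              aes(x = year, stratum = year, alluvium = CBSA_name,
                  order = rank)) +
    geom_alluvium(aes(fill = class), width = .4, alpha = .7) +
    stat_alluvium(geom = "text", aes(label = CBSA_name)) +
    scale_fill_manual(values = c(down = "#D2413C", up = "#3F5941",
                                 `NA` = "white"))
}
```

Swapping order = rank for order = -rank in the ggplot() call reverses the vertical positions, as noted in the explanation above.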
Posting the final plot for posterity:
Hi @corybrunson,
Thanks so much for this wonderful package. It's so well documented.
I'm having a bit of trouble matching strata names to their correct positions on the alluvial plot and was wondering if you could point me in the right direction.
I'm using ggalluvial to visualize changes in rank of the top 100 metro areas by population over time from 2000 to 2019. Here is a glimpse of the data structure, the code used to produce the plot below, and a link to the data.
The main problem is that the strata labels aren't in their correct positions. You can see from their rank that the New York metro area was the largest in both 2000 and 2019, so it didn't change position. On the plot, it is at the bottom when it should be at the top, and it is shown as having changed position. I'm not 100% sure inverting them will solve this issue, but I suspect it might, since one of the smallest metros is at the top.
I can't figure out how to reference CBSA_name to plot it correctly (I gather the data is being transformed under the hood?), so I created another object (names) that is just a data frame of CBSA_name and rank and referenced it in geom_text(), but that didn't work. Next, I tried re-ordering using desc(), but that didn't work either.
Any thoughts on how to locate the strata names in their correct positions? Thanks so much.
Here's the plot so far: