davidsjoberg / ggsankey

Make sankey, alluvial and sankey bump plots in ggplot

Flows cross (unnecessarily) with custom factor order #16

Open MatthewHeun opened 2 years ago

MatthewHeun commented 2 years ago

Thanks for the great package!

Here is a simple data frame of 3 nodes and 2 flows.

  library(magrittr)  # provides the %>% pipe used below

  df <- data.frame(
    x = c(0, 0, 1, 1),
    next_x = c(1, 1, 2, 2),
    node = c("A", "A", "B", "C"),
    next_node = c("B", "C", NA, NA),
    value = c(1, 2, 1, 2)
  ) %>%
    dplyr::mutate(
      # This is the natural order and results in uncrossed flows, as expected.
      node = factor(node, levels = c("A", "B", "C"))
      # This is the unnatural order and results in unnecessarily crossed flows.
      # node = factor(node, levels = c("A", "C", "B"))
    )

It produces a fine ggsankey with:

  df %>%
    ggplot2::ggplot(mapping = ggplot2::aes(x = x, next_x = next_x, node = node, next_node = next_node, value = value,
                                           fill = node, label = node)) +
    ggsankey::geom_sankey(flow.alpha = 0.5, node.color = "gray30") +
    ggsankey::geom_sankey_label(size = 2, color = "white", fill = "gray40", show.legend = FALSE)

image

But let's say we want node "B" above node "C" on the right side of the diagram. We can set the factor levels differently:

  df <- data.frame(
    x = c(0, 0, 1, 1),
    next_x = c(1, 1, 2, 2),
    node = c("A", "A", "B", "C"),
    next_node = c("B", "C", NA, NA),
    value = c(1, 2, 1, 2)
  ) %>%
    dplyr::mutate(
      # This is the natural order and results in uncrossed flows, as expected.
      # node = factor(node, levels = c("A", "B", "C"))
      # This is the unnatural order and results in unnecessarily crossed flows.
      node = factor(node, levels = c("A", "C", "B"))
    )

To my eye, the resulting Sankey diagram has unnecessarily crossed flows out of node "A".

image

It would be more pleasing, visually, if the flow destined for node "B" departed from the top of node "A".

Is there a way to specify the North-South order of departure of the flows departing a node? Or could geom_sankey() automatically arrange the departure order by the North-South coordinates of the destinations?

JELAshford commented 2 years ago

Hi! Stumbled onto this looking for answers to my own Sankey flow woes. You can fix this unwanted crossing by also applying the factor re-ordering to the "next_node" column:

  df <- data.frame(
    x = c(0, 0, 1, 1),
    next_x = c(1, 1, 2, 2),
    node = c("A", "A", "B", "C"),
    next_node = c("B", "C", NA, NA),
    value = c(1, 2, 1, 2)
  ) %>%
    dplyr::mutate(
      # This is the unnatural order, applied to both node and next_node
      node = factor(node, levels = c("A", "C", "B")),
      next_node = factor(next_node, levels = c("A", "C", "B"))
    )

which produces what I think you're after:

sankey_test
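
If you set the order in more than one place, a small helper can keep the two columns in sync so they never drift apart. This is only a sketch in base R: set_node_order is a made-up name, not a ggsankey function, and it assumes your data frame uses the column names node and next_node as in the example above.

```r
# Hypothetical helper (not part of ggsankey): apply one set of factor
# levels to both the node and next_node columns of a data frame.
set_node_order <- function(df, levels) {
  df$node <- factor(df$node, levels = levels)
  df$next_node <- factor(df$next_node, levels = levels)
  df
}

df <- data.frame(
  x = c(0, 0, 1, 1),
  next_x = c(1, 1, 2, 2),
  node = c("A", "A", "B", "C"),
  next_node = c("B", "C", NA, NA),
  value = c(1, 2, 1, 2)
)

# Same custom order as above; NA entries in next_node stay NA.
df <- set_node_order(df, c("A", "C", "B"))
```

The resulting df can be passed to the same ggplot2::ggplot() call as before.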

Hope this helps!