Valentin-Bio opened this issue 1 year ago.
Hey Valentin, I am in a very similar situation. Did you find a solution for this?
Hello @giacomomutti, I could not figure out how to do it.
Best,
Your nodes have character names, so the standard ggplot behaviour is to display them as categories in alphabetical order. At each x coordinate (or column, if you prefer to think of it that way), the nodes are in alphabetical order, with A at the bottom and Z at the top, and with capital letters sorting before lowercase ones (that is why TMED25 comes before Thioglobaceae). This ordering determines where each node is placed, and that is what causes the overlaps.
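A base R sketch of the ordering described above. It assumes the nodes end up in C-locale (byte-wise) order, which is what TMED25 sorting before Thioglobaceae suggests. sort(..., method = "radix") sorts character vectors in the C locale regardless of the session locale. The taxon names are only example data.

```r
# Example taxon names (toy data chosen to mirror the issue)
taxa <- c("Thioglobaceae", "Alteromonadaceae", "TMED25", "Flavobacteriaceae")

# C-locale (byte-wise) ordering: every upper-case letter sorts before every
# lower-case letter, so "TM..." comes before "Th..."
c_order <- sort(taxa, method = "radix")
print(c_order)
# c_order is c("Alteromonadaceae", "Flavobacteriaceae", "TMED25", "Thioglobaceae")

# A plain sort(taxa) uses the session locale instead, where the relative
# order of "TMED25" and "Thioglobaceae" can differ
```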
To control the order of character labels, you can convert your node and next_node data columns to factor objects and specify the ordering you want as the factor levels. They will then be ordered by that level order rather than alphabetically. The forcats package may help with handling factors.
However, I'm finding that factors can mess up the sankey label positioning, which is why I'm browsing the issue board in the first place.
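A minimal base R sketch of the factor approach: explicit levels override the default alphabetical order, and the integer codes stored in a factor follow the order of its levels. The node names are example data, and this only illustrates how factors behave, not ggsankey's own positioning code.

```r
# Toy node names, in the order we would like them stacked
nodes <- c("Thioglobaceae", "TMED25", "Alteromonadaceae", "TMED25")

# Default: factor() takes its levels from the sorted unique values, so they
# are alphabetical (the TMED25 vs Thioglobaceae order depends on the locale)
default_levels <- levels(factor(nodes))

# Explicit levels: the order we list is the order the factor uses
wanted  <- c("Thioglobaceae", "TMED25", "Alteromonadaceae")
nodes_f <- factor(nodes, levels = wanted)
levels(nodes_f)      # "Thioglobaceae" "TMED25" "Alteromonadaceae"
as.integer(nodes_f)  # 1 2 3 2: each value is stored as its position in the levels

# Keeping first-appearance order is a common shortcut;
# forcats::fct_inorder() gives the same level order
in_order <- factor(nodes, levels = unique(nodes))
```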
I solved this issue by converting the node and next_node columns to factors, using all the names in your dataset as the levels.
First, arrange your dataset by all the columns you are interested in. Then collect the levels from all of those columns and apply the same level ordering to every column, as well as to the node and next_node variables. With that, both the labels and the sankey flows should be positioned correctly.
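The recipe above can be sketched in base R. The helper names shared_levels() and apply_shared_levels() are hypothetical, and the toy table stands in for a real taxonomy table. The reasoning: once the rows are sorted by phylum, then class, then family, each column's names appear grouped by their parents, so listing every column's names in order of first appearance puts children of an earlier parent before children of a later one. Assuming each name has a single parent, flows between adjacent columns then do not cross.

```r
# Hypothetical helper: one shared level vector built from several rank columns
shared_levels <- function(df, cols) {
  # Sort rows by the rank columns, left to right (NAs go last)
  ord <- df[do.call(order, unname(as.list(df[cols]))), cols, drop = FALSE]
  # Collect names column by column, keeping first-appearance order
  vals <- unlist(lapply(ord, as.character), use.names = FALSE)
  lv <- unique(vals)
  lv[!is.na(lv)]
}

# Hypothetical helper: apply the same levels to every rank column
apply_shared_levels <- function(df, cols) {
  lv <- shared_levels(df, cols)
  df[cols] <- lapply(df[cols], factor, levels = lv)
  df
}

# Toy taxonomy table
tax <- data.frame(
  phylum = c("Proteobacteria", "Bacteroidota", "Proteobacteria", "Bacteroidota"),
  class  = c("Gammaproteobacteria", "Bacteroidia", "Alphaproteobacteria", "Bacteroidia"),
  family = c("Thioglobaceae", "Flavobacteriaceae", "Pelagibacteraceae", "Cryomorphaceae"),
  count  = c(10, 5, 7, 3),
  stringsAsFactors = FALSE
)
ranks <- c("phylum", "class", "family")

lv <- shared_levels(tax, ranks)
# lv: Bacteroidota, Proteobacteria, Bacteroidia, Alphaproteobacteria,
#     Gammaproteobacteria, Cryomorphaceae, Flavobacteriaceae,
#     Pelagibacteraceae, Thioglobaceae
tax_f <- apply_shared_levels(tax, ranks)
```

Plain alphabetical order would put Alphaproteobacteria before Bacteroidia even though Bacteroidia's parent (Bacteroidota) comes first in the phylum column; the shared levels keep Bacteroidia first and so avoid that crossing.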
This approach may not work if the same name appears at more than one taxonomic rank. In that case you can prefix each clade with its rank, like "c_Haptophyta" and "f_Haptophyta", so that the names are unique, and then strip the prefix again for display; in the code, that is what the label column does.
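A sketch of the prefix trick, with hypothetical helper names and made-up example names. It also shows a detail of gsub(".*_", "", x) as used in the code: the greedy .* removes everything up to the last underscore, so a name that itself contains an underscore gets truncated. Anchoring the pattern to a single leading rank letter avoids that.

```r
# Hypothetical helper: prefix each rank column with a rank code ("c", "f", ...)
# so that a name used at two ranks becomes two distinct nodes; NA stays NA
add_rank_prefix <- function(df, cols, prefixes) {
  stopifnot(length(cols) == length(prefixes))
  for (i in seq_along(cols)) {
    v <- df[[cols[i]]]
    df[[cols[i]]] <- ifelse(is.na(v), NA_character_, paste0(prefixes[i], "_", v))
  }
  df
}

# Hypothetical helper: remove only a single leading rank letter and underscore
strip_rank_prefix <- function(x) sub("^[a-z]_", "", x)

# Hypothetical helper: names that occur in more than one rank column
shared_names <- function(df, cols) {
  per_rank <- lapply(df[cols], function(v) unique(v[!is.na(v)]))
  counts <- table(unlist(per_rank, use.names = FALSE))
  names(counts)[counts > 1]
}

# Toy table with a name repeated across ranks
tax <- data.frame(
  class  = c("Haptophyta", "Dinophyceae"),
  family = c("Haptophyta", "uncultured_Dinophyceae"),
  stringsAsFactors = FALSE
)

dups <- shared_names(tax, c("class", "family"))                 # "Haptophyta"
pref <- add_rank_prefix(tax, c("class", "family"), c("c", "f"))
pref$family                      # "f_Haptophyta" "f_uncultured_Dinophyceae"
strip_rank_prefix(pref$family)   # "Haptophyta" "uncultured_Dinophyceae"
gsub(".*_", "", pref$family)     # "Haptophyta" "Dinophyceae"
```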
library(dplyr)
library(ggplot2)
library(ggsankey)

# Sort rows hierarchically so that children of the same parent sit together
df <- df %>%
  arrange(phylum, class, order, family, genus, species, count)

# One shared level vector: names in order of first appearance, rank by rank
lvls_tax <- c("Eukaryota", unique(c(unique(df$phylum), unique(df$class),
                                    unique(df$order), unique(df$family),
                                    unique(df$genus))))

# Apply the same levels to every rank column
df <- df %>%
  mutate(phylum  = factor(phylum,  levels = lvls_tax, ordered = TRUE),
         class   = factor(class,   levels = lvls_tax, ordered = TRUE),
         order   = factor(order,   levels = lvls_tax, ordered = TRUE),
         family  = factor(family,  levels = lvls_tax, ordered = TRUE),
         genus   = factor(genus,   levels = lvls_tax, ordered = TRUE),
         species = factor(species, levels = lvls_tax, ordered = TRUE))

df_long <- df %>%
  make_long(colnames(df)[1:6], value = count) %>%
  # Re-apply the levels after reshaping; the label drops the rank prefix
  mutate(node = factor(node, levels = lvls_tax),
         next_node = factor(next_node, levels = lvls_tax),
         label = gsub(".*_", "", node)) %>%
  # Names not in lvls_tax become NA above and are dropped here
  filter(!is.na(node))

ggplot(df_long, aes(x = x, next_x = next_x, node = node, next_node = next_node,
                    fill = node, label = label)) +
  geom_alluvial(space = 2, width = .3, flow.alpha = .6) +
  geom_alluvial_label(size = 2.5, space = 2, color = 1, fill = "aliceblue") +
  theme(legend.position = "none",
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 0, family = "Helvetica", colour = "black"))
This is the resulting plot:
Hope it helps!
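In the plotting code, factor(node, lvls_tax) turns any node name that is not in lvls_tax into NA, and filter(!is.na(node)) then drops those rows without a warning. That is handy for columns you deliberately leave out, but it would also silently hide a rank accidentally missing from the level vector. A small check, using a hypothetical helper name and toy values:

```r
# Hypothetical helper: which non-missing values are absent from the level
# vector (and would therefore become NA in factor(values, levels = lvls))?
values_missing_from_levels <- function(values, lvls) {
  present <- unique(as.character(values[!is.na(values)]))
  setdiff(present, lvls)
}

# Toy example: levels cover domain to class, but a species name slips in
lvls  <- c("Eukaryota", "Haptophyta", "Prymnesiophyceae")
nodes <- c("Eukaryota", "Haptophyta", "Prymnesiophyceae", "Emiliania huxleyi")

absent <- values_missing_from_levels(nodes, lvls)
absent                                     # "Emiliania huxleyi"
n_dropped <- sum(is.na(factor(nodes, levels = lvls)))
n_dropped                                  # 1 row would be removed by the NA filter
```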
Hello developer!
I'm using geom_sankey() to plot microbial taxonomies at given taxonomic ranks. This is what I did:
library(dplyr)
library(ggsankey)
colnames(taxonomy_table)
tableforsankey <- taxonomy_table %>%
  make_long(Domain, Phylum, Class, Order, Family, Genus)
And this is the sankey plot I get:
Starting at the Phylum column, the ribbons begin to cross each other. Is there a way to draw the sankey plot so that ribbons do not cross over other ribbons?
Best regards,
Valentín.