Closed qdread closed 3 years ago
Hi @qdread, and thanks for raising the issue. I saw your SO question about this and +1ed it because i'm interested in learning the answer myself (and of course because it was clearly posed)!
To your specific question, i believe the only alluvium (or flow) coordinates that can be recovered using exported functions are the corners of the parallelogram shape that each flow would take if `knot.pos` were set to `0` (also the `y` midpoint), as in the second Titanic example in the main vignette. These can be obtained via the `Stat*$compute_panel()` functions, as illustrated in the rectangle-ordering vignette.
I gather that you already have ways of extracting these coordinates and instead need to reproduce the curves. Is that right? I'm not sure what to suggest here. The non-xspline curves are actually polygons; their basic shapes can be calculated using the unexported curve functions located in geom-utils.r, e.g. `ggalluvial:::unit_sine()`, and they can be calculated from alluvial data using the unexported function `data_to_unit_curve()`. As for the x-splines, would you just need the vectors that are passed to the parameters `x`, `y`, and `shape` of `grid::xsplineGrob()`? Those can similarly be calculated using the unexported function `data_to_xspline()`. Either way, you'd have to pay close attention to how the coordinates are transformed along the way.
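To make the x-spline route concrete, here is a minimal sketch using only the grid package. The control points and `shape` values below are made up for illustration and are not output from `data_to_xspline()`; the sketch only shows how `x`, `y`, and `shape` vectors passed to `grid::xsplineGrob()` can be traced into curve points with `grid::xsplinePoints()`.

```r
library(grid)

# Open a PDF device that writes no file, so grid has a device to measure against.
pdf(NULL)

# Hypothetical control points for one curved edge, in npc units
# (illustrative values only, not produced by ggalluvial).
ctrl <- data.frame(
  x     = c(0.1, 0.4, 0.6, 0.9),
  y     = c(0.2, 0.2, 0.8, 0.8),
  shape = c(0, 1, 1, 0)
)

edge <- xsplineGrob(x = ctrl$x, y = ctrl$y, shape = ctrl$shape, open = TRUE)

# xsplinePoints() returns the traced curve as unit vectors measured on the device.
pts <- xsplinePoints(edge)
traced <- data.frame(x = as.numeric(pts$x), y = as.numeric(pts$y))

dev.off()
head(traced)
```

The traced points are in device coordinates rather than data coordinates, which is why the transformations along the way need close attention.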
Sorry i can't be of more help than that!
Thanks for the response! I am still working on it and if you are interested I might be able to do a PR at some point if I come up with something that is generally useful. The info you provided will be a big help.
Do you imagine writing a vignette to demonstrate the app, or adding new functionality to install with the package? It sounds like the former, and i'd welcome it!
So I see that you saw the SO answer I posted. If you ever get the chance, I would be interested to see if you could figure out a way to modify that script to extract the coordinates for the alluvia polygons when `knot.pos > 0`.
I think that would be quite an exercise! It's not something i feel qualified to do; it would take a lot of time for me to figure it out. But here's where i would start, in the grid package:
- `XSPLINE primitive` section (line 768) in the file `primitives.R`
- `gridXspline` definition (line 2315) in the file `grid.c`

In case the task is urgent, note that the non-xspline curves are calculated and rendered in ggalluvial as polygons, and should not be much harder than the parallelograms to work with in an app. The unexported functions can be called directly using the triple-colon operator: `ggalluvial:::data_to_xspline()` and `ggalluvial:::data_to_unit_curve()`. I mention this again just in case, since the app code at SO instead took some extra code and (i assume) effort to re-define the data-to-spline function.
Please do update this thread with future progress or problems!
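As a rough illustration of the remark that the non-xspline curves are just polygons, the sketch below builds one ribbon between two axes in base R. `flow_polygon()` is a hypothetical helper written for this example, and the sine easing is an assumption; it is not the code behind `ggalluvial:::unit_sine()` or `data_to_unit_curve()`.

```r
# Hypothetical helper (not part of ggalluvial): trace a ribbon between two
# axes as a closed polygon, easing each edge with a half-period sine curve.
flow_polygon <- function(x0, x1, ymin0, ymax0, ymin1, ymax1, n = 50) {
  t <- seq(0, 1, length.out = n)
  ease <- (sin((t - 0.5) * pi) + 1) / 2  # rises smoothly from 0 to 1
  x <- x0 + t * (x1 - x0)
  lower <- ymin0 + ease * (ymin1 - ymin0)
  upper <- ymax0 + ease * (ymax1 - ymax0)
  # Walk the lower edge left to right, then the upper edge right to left.
  data.frame(x = c(x, rev(x)), y = c(lower, rev(upper)))
}

# One ribbon from axis 1 (occupying y in [0, 3]) to axis 2 (y in [4, 7]).
poly <- flow_polygon(x0 = 1, x1 = 2, ymin0 = 0, ymax0 = 3, ymin1 = 4, ymax1 = 7)
head(poly)
```

The resulting data frame can be passed to `geom_polygon()` or used directly for hit-testing in an app.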
Hi again,
I got pretty close to the solution using the function `grid::xsplinePoints()`. However, I am still seeing a bit of mismatch between the polygons drawn with `geom_alluvium` and the reverse-engineered polygons I created. Can you check this reprex and see if it is just some mismatch in parameters or a mistake in the way I extracted the coordinates?
You can see in the image that the red bordered polygons (extracted coordinates) do not match the blue bordered ones exactly (the polygons drawn by ggalluvial).
library(tidyverse)
library(ggalluvial)
library(grid)
example_data <- data.frame(weight = rep(1, 10),
                           ID = 1:10,
                           cluster = rep(c(1,2), 5),
                           grp1 = rep(c('1a','1b'), c(6,4)),
                           grp2 = rep(c('2a','2b','2a'), c(3,4,3)),
                           grp3 = rep(c('3a','3b'), c(5,5)))
# Create plot with example data
p <- ggplot(example_data, aes(y = weight, axis1 = grp1, axis2 = grp2, axis3 = grp3)) +
  geom_alluvium(aes(fill = factor(cluster)), color = 'blue', lwd = 0.3, knot.pos = 0.25) + # color for connections
  geom_stratum(width = 1/8, reverse = TRUE) + # plot the boxes over the connections
  geom_text(aes(label = after_stat(stratum)),
            stat = "stratum",
            reverse = TRUE,
            size = rel(1.5)) + # plot the text
  theme_bw() + # black and white theme
  scale_x_continuous(expand=c(0,0)) +
  scale_y_continuous(expand=c(0,0))
# Build plot for reverse engineering of polygon coordinates
pbuilt <- ggplot_build(p)
data_draw <- transform(pbuilt$data[[1]], width = 1/3)
draw_by_group <- function(dat) {
  first_row <- dat[1, setdiff(names(dat),
                              c("x", "xmin", "xmax",
                                "width", "knot.pos",
                                "y", "ymin", "ymax")),
                   drop = FALSE]
  rownames(first_row) <- NULL
  curve_data <- ggalluvial:::data_to_xspline(dat, knot.prop = TRUE)
  data.frame(first_row, curve_data)
}
# Get spline control points for each polygon in the built plot
groups_to_draw <- lapply(1:10, function(i) data_draw[data_draw$group == i,])
polygon_coords <- lapply(groups_to_draw, draw_by_group)
xsplines <- map(polygon_coords, ~ xsplineGrob(x=.$x, y=.$y, shape=.$shape, open=TRUE))
# Use grid::xsplinePoints to trace the curve for each polygon
xxpts <- map(xsplines, xsplinePoints)
# Combine into a data frame for diagnostic plotting
xxptsdf <- imap_dfr(xxpts, function(pts, id) data.frame(id=id, x=as.numeric(pts$x), y=as.numeric(pts$y)))
# Now work out the conversion from grid graphics coordinates back to the plot's data coordinates.
node_width <- 1/8
yrange <- c(0, nrow(example_data)) # From 0 to the total weight (one unit per row of example data)
xrange <- c(1 - node_width/2, 3 + node_width/2) # From 1 to the number of axes, adjusted by half the node width
# Function to convert grid graphics coordinates to data coordinates
new_range_transform <- function(x_old, min_x_new, max_x_new) {
  (x_old - min(x_old))/(max(x_old) - min(x_old)) * (max_x_new - min_x_new) + min_x_new
}
xxptsdf$x_plotcoords <- new_range_transform(xxptsdf$x, xrange[1], xrange[2])
xxptsdf$y_plotcoords <- new_range_transform(xxptsdf$y, yrange[1], yrange[2])
p + geom_polygon(data = xxptsdf %>% mutate(group=factor(id)), aes(x=x_plotcoords,y=y_plotcoords,group=group,fill=group),color='red',inherit.aes=FALSE,alpha=0.4)
@qdread this looks very good as a testing ground. I'll take a close look later this week and hopefully have some insight to share.
I looked at it a bit more, and maybe the only problem is that `xrange` is a little off. If you adjust the extra padding on the min and max, the polygons get closer. I just didn't know how to calculate the padding exactly.
Sorry for spamming ... I figured it out. Node width in `geom_alluvium` defaults to 1/3. The width I was using was 1/8, which is actually the width of the `geom_stratum`, not the width of the flat portion between the curves. So that caused things not to match up. If you just use `node_width = 1/3` above, you get a nice match.
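To see how much the choice of width shifts the rescaled coordinates, here is a small base R sketch. It reuses the same linear rescaling as `new_range_transform()` in the reprex; the device coordinates are made-up stand-ins for `xsplinePoints()` output, and, like the reprex, it assumes the traced points span the full plotted range.

```r
# Same linear rescaling as new_range_transform() in the reprex.
rescale_to_range <- function(x_old, min_new, max_new) {
  (x_old - min(x_old)) / (max(x_old) - min(x_old)) * (max_new - min_new) + min_new
}

n_axes <- 3
alluvium_width <- 1/3  # geom_alluvium default width, as reported in this thread
stratum_width  <- 1/8  # width passed to geom_stratum() in the reprex

xrange_alluvium <- c(1 - alluvium_width / 2, n_axes + alluvium_width / 2)
xrange_stratum  <- c(1 - stratum_width / 2,  n_axes + stratum_width / 2)

# Hypothetical device x-coordinates (inches), for illustration only.
x_device <- c(0.5, 1.0, 2.5, 4.0, 4.5)

comparison <- data.frame(
  x_device,
  x_alluvium = rescale_to_range(x_device, xrange_alluvium[1], xrange_alluvium[2]),
  x_stratum  = rescale_to_range(x_device, xrange_stratum[1],  xrange_stratum[2])
)
comparison
```

The two rescaled columns agree only at the midpoint of the range and drift apart toward the edges, which is consistent with the mismatch being most visible near the outer axes.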
One of these days I'll work on the vignette.
Excellent! Let me know again if i might be helpful in future.
Hi again, I've written up a draft of the vignette! I need to polish it a little, but once I do I will make a PR; hopefully I will get that done tomorrow or sometime next week. You might need to look over it to make sure I used all the correct ggalluvial terminology, if you don't mind.
Of course, not a problem. I look forward to seeing it!
I've made some subtle changes and updated the website (your vignette is here). I couldn't get the images to center both in the vignette when rendered locally and in the pkgdown website, so i'd be glad to know if you figure out a way! Anyway, i'll submit the next release to CRAN tonight or tomorrow, unless you see something amiss there.
This seems to have been resolved, and remaining issues with Shiny integration will play out in new issues and pull requests.
Hi Cory, thanks for your amazing package. I am developing a Shiny app that uses ggalluvial to create an interactive Sankey diagram. I would like to display information in a window when the user hovers their mouse over the plot. I have been able to use the object that is generated from `ggplot_build()` to find the coordinates of each component of `geom_stratum()`, which I have used to display information about the strata when the user mouses over them.
What I would like to do is easily be able to extract the x,y coordinates associated with the polygons that are plotted for `geom_alluvium` so that I can display information about each of the links. That's a little more difficult because of the splines. I was thinking of reverse engineering the coordinates by recalculating the spline coordinates myself manually, but is there an existing way to extract that information from the plot object?
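For the hover use case, once polygon coordinates for a link are available, deciding whether the pointer is over that link comes down to a point-in-polygon test. The sketch below is a generic ray-casting test in base R; `point_in_polygon()` is a hypothetical helper for illustration, not something provided by ggalluvial or Shiny, and it assumes the polygon vertices are given in drawing order.

```r
# Hypothetical helper: ray-casting test for whether (px, py) lies inside the
# polygon whose vertices are (poly_x, poly_y), listed in drawing order.
point_in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does the edge from vertex j to vertex i straddle the horizontal line y = py?
    crosses <- (poly_y[i] > py) != (poly_y[j] > py)
    if (crosses) {
      # x-coordinate where that edge meets the horizontal line through the point
      x_cross <- poly_x[i] +
        (py - poly_y[i]) * (poly_x[j] - poly_x[i]) / (poly_y[j] - poly_y[i])
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# A simple square standing in for one link's polygon.
sq_x <- c(0, 2, 2, 0)
sq_y <- c(0, 0, 2, 2)
hit  <- point_in_polygon(1, 1, sq_x, sq_y)
miss <- point_in_polygon(3, 1, sq_x, sq_y)
```

In an app, the pointer position in data coordinates would be tested against each link's polygon in turn, and the first match would determine which information to display.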