corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

Question: output of coordinates for integration with Shiny? #68

Closed qdread closed 3 years ago

qdread commented 3 years ago

Hi Cory, thanks for your amazing package. I am developing a Shiny app that uses ggalluvial to create an interactive Sankey diagram. I would like to display information in a window when the user hovers their mouse over the plot. I have been able to use the object that is generated from ggplot_build() to find the coordinates of each component of geom_stratum() which I have used to display information about the strata when the user mouses over them.

What I would like to do is easily be able to extract the x,y coordinates associated with the polygons that are plotted for geom_alluvium so that I can display information about each of the links. That's a little more difficult because of the splines. I was thinking of reverse engineering the coordinates by recalculating the spline coordinates myself manually, but is there an existing way to extract that information from the plot object?
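For context, the stratum lookup described above can be reduced to a rectangle hit test. The following is a minimal sketch in base R, assuming the built stratum layer exposes `xmin`, `xmax`, `ymin`, `ymax`, and `stratum` columns. The data frame and the helper `stratum_at()` are hypothetical stand-ins, not part of ggalluvial.

```r
# Hypothetical stratum rectangles, shaped like rows of a built stratum layer
# (the values are made up for illustration)
strata <- data.frame(
  stratum = c("1a", "1b", "2a"),
  xmin = c(0.9375, 0.9375, 1.9375),
  xmax = c(1.0625, 1.0625, 2.0625),
  ymin = c(4, 0, 4),
  ymax = c(10, 4, 10)
)

# Return the label of the first stratum containing the point, or NA if none
stratum_at <- function(strata, x, y) {
  hit <- x >= strata$xmin & x <= strata$xmax &
    y >= strata$ymin & y <= strata$ymax
  if (any(hit)) as.character(strata$stratum[which(hit)[1]]) else NA_character_
}

stratum_at(strata, x = 1, y = 2)    # pointer inside the "1b" rectangle
stratum_at(strata, x = 1.5, y = 2)  # pointer between axes, no stratum
```

In a Shiny app the `x` and `y` arguments would presumably come from the plot's hover input. The curved links need polygon coordinates rather than rectangles, which is the open part of the question.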

corybrunson commented 3 years ago

Hi @qdread, and thanks for raising the issue. I saw your SO question about this and +1ed it because i'm interested in learning the answer myself (and of course because it was clearly posed)!

To your specific question, i believe the only alluvium (or flow) coordinates that can be recovered using exported functions are the corners of the parallelogram shape that each flow would take if knot.pos were set to 0 (also the y midpoint), as in the second Titanic example in the main vignette. These can be obtained via the Stat*$compute_panel() functions, as illustrated in the rectangle-ordering vignette.

I gather that you already have ways of extracting these coordinates and instead need to reproduce the curves. Is that right? I'm not sure what to suggest here. The non-xspline curves are actually polygons; their basic shapes can be calculated using the unexported curve functions located in geom-utils.r, e.g. ggalluvial:::unit_sine(), and they can be calculated from alluvial data using the unexported function data_to_unit_curve(). As for the x-splines, would you just need the vectors that are passed to the parameters x, y, and shape of grid::xsplineGrob()? Those can similarly be calculated using the unexported function data_to_xspline(). Either way, you'd have to pay close attention to how the coordinates are transformed along the way.
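To illustrate what "the non-xspline curves are actually polygons" can mean in practice, here is a rough sketch that builds one flow polygon from a sine-shaped ramp. It is written from scratch in base R; `sine_ramp()` and `flow_polygon()` are hypothetical helpers and are not the package's `unit_sine()` or `data_to_unit_curve()`, whose signatures are not assumed here.

```r
# Illustrative sine ramp rising from 0 to 1 on [0, 1];
# written for this sketch, not copied from ggalluvial's internals
sine_ramp <- function(t) (sin((t - 0.5) * pi) + 1) / 2

# Build a closed flow polygon between two axes: the upper edge is traced
# left to right, then the lower edge right to left
flow_polygon <- function(x0, x1, ymin0, ymax0, ymin1, ymax1, n = 50) {
  t <- seq(0, 1, length.out = n)
  s <- sine_ramp(t)
  x <- x0 + (x1 - x0) * t
  upper <- ymax0 + (ymax1 - ymax0) * s
  lower <- ymin0 + (ymin1 - ymin0) * s
  data.frame(x = c(x, rev(x)), y = c(upper, rev(lower)))
}

# A flow leaving axis 1 at heights 0 to 1 and arriving at axis 2 at 3 to 4
poly <- flow_polygon(x0 = 1 + 1/6, x1 = 2 - 1/6,
                     ymin0 = 0, ymax0 = 1, ymin1 = 3, ymax1 = 4)
head(poly, 3)
```

Because the result is an ordinary set of vertices in data coordinates, it can be passed to `geom_polygon()` or to a point-in-polygon test without any coordinate conversion.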

Sorry i can't be of more help than that!

qdread commented 3 years ago

Thanks for the response! I am still working on it and if you are interested I might be able to do a PR at some point if I come up with something that is generally useful. The info you provided will be a big help.

corybrunson commented 3 years ago

Do you imagine writing a vignette to demonstrate the app, or adding new functionality to install with the package? It sounds like the former, and i'd welcome it!

qdread commented 3 years ago

So I see that you saw the SO answer I posted 😀 If you ever get the chance, I would be interested to see if you could figure out a way to modify that script to extract the coordinates for the alluvia polygons when knot.pos > 0.

corybrunson commented 3 years ago

I think that would be quite an exercise! It's not something i feel qualified to do; it would take a lot of time for me to figure it out. But here's where i would start, in the grid package:

In case the task is urgent, note that the non-xspline curves are calculated and rendered in ggalluvial as polygons, and should not be much harder than the parallelograms to work with in an app. The unexported functions can be called directly using the triple-colon operator: ggalluvial:::data_to_xspline() and ggalluvial:::data_to_unit_curve(). I mention this again just in case, since the app code at SO instead took some extra code and (i assume) effort to re-define the data-to-spline function.

Please do update this thread with future progress or problems!

qdread commented 3 years ago

Hi again,

I got pretty close to the solution using the function grid::xsplinePoints(). However I am still seeing a bit of mismatch between the polygons drawn with geom_alluvium and the reverse engineered polygons I created. Can you check this reprex and see if it is just some mismatch in parameters or a mistake in the way I extracted the coordinates?
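As a stripped-down illustration of the `grid::xsplinePoints()` step, the sketch below traces a single open X-spline. The control points and shape values are made up for illustration. An off-screen device is opened so that it runs without a display.

```r
library(grid)

pdf(NULL)  # off-screen device, so the sketch runs without a display

# Made-up control points (default npc units) and shape values
g <- xsplineGrob(x = c(0, 0.3, 0.7, 1),
                 y = c(0.2, 0.2, 0.8, 0.8),
                 shape = c(0, 1, 1, 0),
                 open = TRUE)

# Trace the points along the curve that grid would draw
pts <- xsplinePoints(g)
xy <- data.frame(x = as.numeric(pts$x), y = as.numeric(pts$y))

dev.off()

nrow(xy)   # many more points than the four control points
range(xy$x)
```

The traced points come back as grid units tied to the device rather than to the plot's data scales, which is why a rescaling step is needed before they can be compared with the drawn alluvia.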

Output

You can see in the image that the red bordered polygons (extracted coordinates) do not match the blue bordered ones exactly (the polygons drawn by ggalluvial).

[Image: the plot from the reprex, with the red and blue polygon borders overlaid]

Code

library(tidyverse)
library(ggalluvial)
library(grid)

example_data <- data.frame(weight = rep(1, 10),
                           ID = 1:10,
                           cluster = rep(c(1,2), 5),
                           grp1 = rep(c('1a','1b'), c(6,4)),
                           grp2 = rep(c('2a','2b','2a'), c(3,4,3)),
                           grp3 = rep(c('3a','3b'), c(5,5)))

# Create plot with example data
p <- ggplot(example_data, aes(y = weight, axis1 = grp1, axis2 = grp2, axis3 = grp3)) + 
  geom_alluvium(aes(fill = factor(cluster)), color = 'blue', lwd = 0.3, knot.pos = 0.25) + # color for connections
  geom_stratum(width = 1/8, reverse = TRUE) + # plot the boxes over the connections
  geom_text(aes(label = after_stat(stratum)), 
            stat = "stratum", 
            reverse = TRUE, 
            size = rel(1.5)) + # plot the text
  theme_bw() + # black and white theme
  scale_x_continuous(expand=c(0,0)) +
  scale_y_continuous(expand=c(0,0))

# Build plot for reverse engineering of polygon coordinates
pbuilt <- ggplot_build(p)

data_draw <- transform(pbuilt$data[[1]], width = 1/3)

draw_by_group <- function(dat) {
  first_row <- dat[1, setdiff(names(dat),
                              c("x", "xmin", "xmax",
                                "width", "knot.pos",
                                "y", "ymin", "ymax")),
                   drop = FALSE]
  rownames(first_row) <- NULL

  curve_data <- ggalluvial:::data_to_xspline(dat, knot.prop = TRUE)
  data.frame(first_row, curve_data)
}

# Get spline coordinates from each polygon in the built plot
groups_to_draw <- lapply(1:10, function(i) data_draw[data_draw$group == i,])
polygon_coords <- lapply(groups_to_draw, draw_by_group)

xsplines <- map(polygon_coords, ~ xsplineGrob(x=.$x, y=.$y, shape=.$shape, open=TRUE))

# Use grid::xsplinePoints to draw the curve for each polygon
xxpts <- map(xsplines, xsplinePoints)

# Combine into a data frame for diagnostic plotting
xxptsdf <- imap_dfr(xxpts, function(pts, id) data.frame(id=id, x=as.numeric(pts$x), y=as.numeric(pts$y)))

# We have to now figure out the conversion factor to get the grid graphics back to the actual graphics.
node_width<- 1/8
yrange <- c(0, nrow(example_data)) # Total weight: one unit per row of example data
xrange <- c(1 - node_width/2, 3 + node_width/2) # From 1 to number of strata, adjusted by half the node width

# Function to convert grid graphics coordinates to data coordinates
new_range_transform <- function(x_old, min_x_new, max_x_new) {
  (x_old - min(x_old))/(max(x_old) - min(x_old)) * (max_x_new - min_x_new) + min_x_new
}

xxptsdf$x_plotcoords <- new_range_transform(xxptsdf$x, xrange[1], xrange[2])
xxptsdf$y_plotcoords <- new_range_transform(xxptsdf$y, yrange[1], yrange[2])

p + geom_polygon(data = xxptsdf %>% mutate(group = factor(id)),
                 aes(x = x_plotcoords, y = y_plotcoords, group = group, fill = group),
                 color = 'red', inherit.aes = FALSE, alpha = 0.4)

corybrunson commented 3 years ago

@qdread this looks very good as a testing ground. I'll take a close look later this week and hopefully have some insight to share.

qdread commented 3 years ago

I looked at it a bit more, and maybe the only thing that needs to change is the xrange, which is a little off. If you adjust the extra padding on the min and max, the match gets closer. But I didn't know how to calculate the padding exactly.

qdread commented 3 years ago

Sorry for spamming ... I figured it out. Node width in geom_alluvium defaults to 1/3. The width I was using was 1/8 which is actually the width of the geom_stratum, not the width of the flat portion between the curves. So that caused things not to match up. If you just use node_width = 1/3 above, you get a nice match.

One of these days I'll work on the vignette 😀
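The effect of the node-width fix can be sketched in isolation. The example below reuses the min-max rescaling from the reprex above and compares the target x range for a width of 1/8 against 1/3. The input vector `x_grid` and the helper `x_range_for()` are hypothetical, made up for illustration.

```r
# Same min-max rescaling as in the reprex above
new_range_transform <- function(x_old, min_x_new, max_x_new) {
  (x_old - min(x_old)) / (max(x_old) - min(x_old)) *
    (max_x_new - min_x_new) + min_x_new
}

# Stand-in for x values traced by grid::xsplinePoints() (made up)
x_grid <- c(0.5, 1.2, 3.4, 6.5)

# Target x range for three axes, padded by half the node width on each side
n_axes <- 3
x_range_for <- function(width) c(1 - width / 2, n_axes + width / 2)

r_eighth <- x_range_for(1/8)  # stratum width, the mismatched choice
r_third  <- x_range_for(1/3)  # alluvium width, the matching choice

x_eighth <- new_range_transform(x_grid, r_eighth[1], r_eighth[2])
x_third  <- new_range_transform(x_grid, r_third[1], r_third[2])

range(x_eighth)
range(x_third)
```

With a width of 1/8 the rescaled curves span a narrower x range than with 1/3, which is consistent with the horizontal offset between the red and blue outlines.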

corybrunson commented 3 years ago

Excellent! Let me know again if i might be helpful in future.

qdread commented 3 years ago

Hi again, I've written up a draft of the vignette! I need to polish it a little bit, and once I do I will make a PR; hopefully I will get that done tomorrow or sometime next week. You might need to look over it to make sure I used all the correct ggalluvial terminology, if you don't mind.

corybrunson commented 3 years ago

Of course, not a problem. I look forward to seeing it!

corybrunson commented 3 years ago

I've made some subtle changes and updated the website (your vignette is here). I couldn't get the images to center both in the vignette when rendered locally and in the pkgdown website, so i'd be glad to know if you figure out a way! Anyway, i'll submit the next release to CRAN tonight or tomorrow, unless you see something amiss there.

corybrunson commented 3 years ago

This seems to have been resolved, and remaining issues with Shiny integration will play out in new issues and pull requests.