corybrunson / ggalluvial

ggplot2 extension for alluvial plots
http://corybrunson.github.io/ggalluvial/
GNU General Public License v3.0

Seeking a workaround for the MAXNUMPTS limit (issue #22)

Closed. esovetkin closed this issue 6 years ago

esovetkin commented 6 years ago

Description of the issue

I have a dataset with roughly five years of daily data, for which I want to make an alluvial plot.

I hit a limit and receive this error:

Error in grid.Call.graphics(C_xspline, x$x, x$y, x$shape, x$open, x$arrow,  : 
  add_point - reached MAXNUMPTS (25200) 

I wonder if there is a workaround.

Reproducible example (preferably using reprex::reprex())

Here is a reproducible example:

library(ggplot2)
library(ggalluvial)

# Daily values for three categories, 2013-01-01 through 2019-01-01 (2192 days each)
z <- data.frame("Date" = rep(seq(as.Date("2013-01-01"), as.Date("2019-01-01"), 1), 3),
                "value" = rexp(2192 * 3),
                "Category" = sort(rep(1:3, 2192)))

ggplot(data = z,
       aes(x = Date, y = value, alluvium = Category)) +
  geom_alluvium(aes(fill = Category, colour = Category),
                alpha = .75, decreasing = FALSE)

Any suggestion is appreciated.

corybrunson commented 6 years ago

I'm able to reproduce the error. It occurs within the grid package, which I'm not intimately familiar with, so I don't think I can help fix it.

The problem is with the number of ordered pairs used by geom_alluvium() to construct the spline, so a workaround is to instead combine geom_flow() and geom_lode(), which both use only four or eight points per spline. (See the examples at ?geom_flow for ways to control the colors.) The trade-off is that a lot of splines get rendered, which may slow down the plot:

ggplot(data = z, size = .2,
       aes(x = Date, y = value, alluvium = Category)) +
  geom_flow(stat = "alluvium", alpha = .75, decreasing = FALSE) +
  geom_lode(stat = "alluvium", alpha = .75, decreasing = FALSE)
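
To get a feel for why the single-spline approach runs into trouble here, it can help to count how many x positions each alluvium has to pass through. The sketch below rebuilds the example data in base R and counts the distinct dates per category. The name positions_per_alluvium is made up for illustration, and the sketch does not try to reproduce how many spline points grid actually generates per position.

```r
# Rebuild the example data from the issue (base R only)
set.seed(1)  # seed added here only so the values are repeatable
dates <- seq(as.Date("2013-01-01"), as.Date("2019-01-01"), by = 1)
z <- data.frame(
  Date = rep(dates, 3),
  value = rexp(length(dates) * 3),
  Category = rep(1:3, each = length(dates))
)

# Count the distinct x positions that each alluvium (Category) spans
positions_per_alluvium <- tapply(
  z$Date, z$Category,
  function(d) length(unique(d))
)
print(positions_per_alluvium)
```

With geom_alluvium(), each category is drawn as one spline through all of these positions, whereas geom_flow() and geom_lode() draw many short pieces between or at adjacent positions.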

Though did you intend for the horizontal axis (the x aesthetic) to be so precise? Unless the diagram is to be rendered on a much larger window than, e.g., the RStudio "Plots" panel, it would probably be more legible after aggregating value, for example by year:

library(dplyr)
library(lubridate)
z2 <- summarise(
  group_by(
    mutate(z, Date = year(Date)),
    Date, Category
  ),
  value = sum(value)
)
ggplot(data = z2,
       aes(x = Date, y = value, alluvium = Category)) +
  geom_alluvium(aes(fill = Category, colour = Category),
                alpha = .75, decreasing = FALSE)
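
If yearly totals are too coarse, the same idea works at an intermediate granularity. The following is a minimal sketch of aggregating by month using base R only; the names Month and z_month are hypothetical, and it has not been checked here whether the monthly data stay under the MAXNUMPTS limit when drawn with geom_alluvium().

```r
# Rebuild the example data from the issue (base R only)
set.seed(1)  # seed added here only so the values are repeatable
dates <- seq(as.Date("2013-01-01"), as.Date("2019-01-01"), by = 1)
z <- data.frame(
  Date = rep(dates, 3),
  value = rexp(length(dates) * 3),
  Category = rep(1:3, each = length(dates))
)

# Label each row with its calendar month, e.g. "2013-01"
z$Month <- format(z$Date, "%Y-%m")

# Sum the values within each month and category
z_month <- aggregate(value ~ Month + Category, data = z, FUN = sum)

# Convert the month label back to a Date (first day of the month)
# so that it can be used as the x aesthetic
z_month$Date <- as.Date(paste0(z_month$Month, "-01"))

print(head(z_month))
```

The resulting z_month can be passed to the same ggplot() call in place of z2.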

esovetkin commented 6 years ago

Cheers! Thanks! That solves the issue!

And thanks for the simplification tip.

corybrunson commented 6 years ago

Welcome! Thanks for raising the issue. I'm glad to know that it can be worked around.