daattali / ggExtra

📊 Add marginal histograms to ggplot2, and more ggplot2 enhancements
http://daattali.com/shiny/ggExtra-ggMarginal-demo/

Lift restriction to scatter plots, allow marginal plots for density plots #132

Closed jarauh closed 6 years ago

jarauh commented 6 years ago

In earlier versions of ggExtra, it was possible to add a ggMarginal to a geom_line. The current code explicitly looks for a geom_?point in the plot and fails if no scatter plot is present.

When there are lots of data points, I find that scatter points are not very helpful, and I prefer to simply plot the 2-D density. Marginal plots can still be helpful in such a situation.

Example (adapted from the geom_density_2d-documentation):

m <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
     geom_density_2d() +
     xlim(0.5, 6) +
     ylim(40, 110)
ggMarginal(m)

Suggestion:

  1. Use the new code to determine if the plot p contains a scatter plot.
  2. If not, use the old heuristic (which looks for a layer with data that has variables x and y) as a fallback.
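
The two steps above could be sketched roughly as follows. This is only an illustration of the idea, not ggExtra's actual code: `find_xy_layer` is a hypothetical helper name, and checking for `GeomPoint` is an assumption about what "contains a scatter plot" means.

``` r
library(ggplot2)

# Hypothetical helper (not part of ggExtra): return the index of the layer
# that marginal plots would be computed from.
find_xy_layer <- function(p) {
  # Step 1: prefer a layer drawn with a point geom (geom_point, geom_jitter, ...).
  is_point <- vapply(
    p$layers,
    function(layer) inherits(layer$geom, "GeomPoint"),
    logical(1)
  )
  if (any(is_point)) {
    return(which(is_point)[1])
  }

  # Step 2 (fallback): take the first built layer whose data has x and y columns.
  built <- ggplot_build(p)$data
  has_xy <- vapply(
    built,
    function(d) all(c("x", "y") %in% names(d)),
    logical(1)
  )
  if (any(has_xy)) {
    return(which(has_xy)[1])
  }

  stop("No layer with both x and y found.")
}

base_plot <- ggplot(faithful, aes(eruptions, waiting))
idx_line_only <- find_xy_layer(base_plot + geom_line())
idx_line_then_point <- find_xy_layer(base_plot + geom_line() + geom_point())
```

With a line layer only, the fallback picks that layer; once a point layer is present, it takes precedence regardless of its position.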

crew102 commented 6 years ago

Hi @jarauh ,

We probably don't want to try and support other plot types (i.e., non-scatter plots) for the main plot (see https://github.com/daattali/ggExtra/issues/131#issuecomment-429573557 for further details). It would also be difficult to support 2-d density plots b/c of the issue shown below:

library(ggplot2)

p <- ggplot(faithful, aes(eruptions, waiting))

# ggMarginal pulls the scatter plot data out of the built ggplot2 object and 
# creates marginal plots from it.
pb <- ggplot_build(p + geom_point())
head(pb$data[[1]])
#>       x  y PANEL group shape colour size fill alpha stroke
#> 1 3.600 79     1    -1    19  black  1.5   NA    NA    0.5
#> 2 1.800 54     1    -1    19  black  1.5   NA    NA    0.5
#> 3 3.333 74     1    -1    19  black  1.5   NA    NA    0.5
#> 4 2.283 62     1    -1    19  black  1.5   NA    NA    0.5
#> 5 4.533 85     1    -1    19  black  1.5   NA    NA    0.5
#> 6 2.883 55     1    -1    19  black  1.5   NA    NA    0.5

# however, the data in the built 2-d density plot is not the original data, but
# rather the data that is describing the density curves.
p_density <- p + geom_density2d()
p_density


pb <- ggplot_build(p_density)
head(pb$data[[1]])
#>   level        x        y piece  group PANEL  colour size linetype alpha
#> 1 0.002 1.600000 67.31097     1 -1-001     1 #3366FF  0.5        1    NA
#> 2 0.002 1.635354 67.60062     1 -1-001     1 #3366FF  0.5        1    NA
#> 3 0.002 1.638725 67.62626     1 -1-001     1 #3366FF  0.5        1    NA
#> 4 0.002 1.670707 67.87765     1 -1-001     1 #3366FF  0.5        1    NA
#> 5 0.002 1.706061 68.12348     1 -1-001     1 #3366FF  0.5        1    NA
#> 6 0.002 1.712013 68.16162     1 -1-001     1 #3366FF  0.5        1    NA

Created on 2018-10-16 by the reprex package (v0.2.0.9000).

Session info

``` r
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value
#>  version  R version 3.4.4 (2018-03-15)
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  tz       America/New_York
#>  date     2018-10-16
#> Packages -----------------------------------------------------------------
#>  package    * version date       source
#>  assertthat   0.2.0   2017-04-11 cran (@0.2.0)
#>  backports    1.1.2   2017-12-13 cran (@1.1.2)
#>  base       * 3.4.4   2018-03-16 local
#>  bindr        0.1.1   2018-03-13 cran (@0.1.1)
#>  bindrcpp     0.2.2   2018-03-29 cran (@0.2.2)
#>  colorspace   1.3-2   2016-12-14 cran (@1.3-2)
#>  compiler     3.4.4   2018-03-16 local
#>  curl         3.2     2018-03-28 CRAN (R 3.4.4)
#>  datasets   * 3.4.4   2018-03-16 local
#>  devtools     1.13.6  2018-06-27 cran (@1.13.6)
#>  digest       0.6.17  2018-09-12 cran (@0.6.17)
#>  dplyr        0.7.5   2018-05-19 cran (@0.7.5)
#>  evaluate     0.10.1  2017-06-24 cran (@0.10.1)
#>  ggplot2    * 3.0.0   2018-07-03 CRAN (R 3.4.4)
#>  glue         1.2.0   2017-10-29 cran (@1.2.0)
#>  graphics   * 3.4.4   2018-03-16 local
#>  grDevices  * 3.4.4   2018-03-16 local
#>  grid         3.4.4   2018-03-16 local
#>  gtable       0.2.0   2016-02-26 cran (@0.2.0)
#>  htmltools    0.3.6   2017-04-28 cran (@0.3.6)
#>  httr         1.3.1   2017-08-20 CRAN (R 3.4.4)
#>  knitr        1.20    2018-02-20 cran (@1.20)
#>  labeling     0.3     2014-08-23 cran (@0.3)
#>  lazyeval     0.2.1   2017-10-29 cran (@0.2.1)
#>  magrittr     1.5     2014-11-22 cran (@1.5)
#>  MASS         7.3-50  2018-04-30 CRAN (R 3.4.4)
#>  memoise      1.1.0   2017-04-21 CRAN (R 3.4.4)
#>  methods    * 3.4.4   2018-03-16 local
#>  mime         0.5     2016-07-07 CRAN (R 3.3.2)
#>  munsell      0.5.0   2018-06-12 cran (@0.5.0)
#>  pillar       1.2.3   2018-05-25 cran (@1.2.3)
#>  pkgconfig    2.0.1   2017-03-21 cran (@2.0.1)
#>  plyr         1.8.4   2016-06-08 cran (@1.8.4)
#>  purrr        0.2.5   2018-05-29 cran (@0.2.5)
#>  R6           2.2.2   2017-06-17 cran (@2.2.2)
#>  Rcpp         0.12.18 2018-07-23 cran (@0.12.18)
#>  rlang        0.2.2   2018-08-16 cran (@0.2.2)
#>  rmarkdown    1.10    2018-06-11 cran (@1.10)
#>  rprojroot    1.3-2   2018-01-03 cran (@1.3-2)
#>  scales       1.0.0   2018-08-09 cran (@1.0.0)
#>  stats      * 3.4.4   2018-03-16 local
#>  stringi      1.2.4   2018-07-20 cran (@1.2.4)
#>  stringr      1.3.1   2018-05-10 cran (@1.3.1)
#>  tibble       1.4.2   2018-01-22 cran (@1.4.2)
#>  tidyselect   0.2.4   2018-02-26 cran (@0.2.4)
#>  tools        3.4.4   2018-03-16 local
#>  utils      * 3.4.4   2018-03-16 local
#>  withr        2.1.2   2018-03-15 CRAN (R 3.4.4)
#>  xml2         1.2.0   2018-01-24 cran (@1.2.0)
#>  yaml         2.2.0   2018-07-25 cran (@2.2.0)
```

This may not always have been the case for ggplot2 (I'm using ggplot2 v3.0.0), and I assume it wasn't if you were indeed getting marginal plots from ggMarginal() that made sense with your 2-d density plots. Regardless, I don't think it makes sense to support this kind of feature to begin with. I'd suggest installing whatever version of ggExtra was working with 2-d density plots for you in the past and going from there.

jarauh commented 6 years ago

Ah, I understand. Thank you for the explanation.

So the problem seems to be: The original data disappears when using ggplot_build. However, before ggplot_build, it is difficult to anticipate all transformations that are hidden in the ggplot object, and so ggMarginal has to use ggplot_build. Is this more or less correct?

By the way, is there a good place to learn about the ggplot2-internals? Apart from reading the sources...

crew102 commented 6 years ago

> So the problem seems to be: The original data disappears when using ggplot_build.

Kinda... You can still get the raw data using ggplot_build, but it's just not located where it normally is (i.e., when the plot is a scatter plot). The bigger issue really has to do with the fact that supporting marginal plots for plots that aren't scatter plots would be extremely messy and is beyond the scope of the package.
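
A minimal sketch of one place where the raw data can still be found after building. It assumes that the object returned by ggplot_build() keeps the original plot in its `plot` element; that location is an assumption on this sketch's part and is not spelled out in the comment above.

``` r
library(ggplot2)

p_density <- ggplot(faithful, aes(eruptions, waiting)) + geom_density_2d()
pb <- ggplot_build(p_density)

# Layer data: contour coordinates computed by the stat, not the observations.
contour_data <- pb$data[[1]]

# Assumption: the built object carries the original plot in `pb$plot`, and
# its `data` element is the data frame that was passed to ggplot().
raw_data <- pb$plot$data
head(raw_data)
```

Note that this only recovers the plot-level data frame; data supplied to an individual layer, or values changed by scale transformations, would need separate handling.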

> However, before ggplot_build, it is difficult to anticipate all transformations that are hidden in the ggplot object, and so ggMarginal has to use ggplot_build. Is this more or less correct?

Yep.
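
As a small illustration of this point, here is a sketch in which the built layer data differs from the raw column because a scale transformation has been applied. The log scale is an arbitrary choice for the example and is not taken from the thread.

``` r
library(ggplot2)

p_log <- ggplot(faithful, aes(eruptions, waiting)) +
  geom_point() +
  scale_x_log10()

# The plot object still holds the untransformed column ...
raw_x <- p_log$data$eruptions

# ... while the built layer data holds the values after the scale
# transformation has been applied.
built_x <- ggplot_build(p_log)$data[[1]]$x

# Compare the built values with a manual log10 transform of the raw column.
all.equal(built_x, log10(raw_x))
```

Reading the values from the built plot means the marginal plots see the same x and y values that the main panel actually draws, without having to re-implement each scale or transformation.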

> By the way, is there a good place to learn about the ggplot2-internals? Apart from reading the sources...

You can try the ggplot2 book: https://github.com/hadley/ggplot2-book

jarauh commented 6 years ago

Thanks for your answers. I have the book, but it does not go into the details. For example, it mentions neither ggplot_build nor ggproto. The ggplot2 homepage has some more information (on ggproto), but again, not much detail about ggplot_build and other internals. But OK, maybe the documentation has not yet caught up with recent changes to the internals (given the recent jump in major version).

In any case, further pointers to ggplot insight would be appreciated, if available. I will close the issue now.

crew102 commented 6 years ago

I would look to the source if you can't find details in the book.