daattali / ggExtra

📊 Add marginal histograms to ggplot2, and more ggplot2 enhancements
http://daattali.com/shiny/ggExtra-ggMarginal-demo/

ggMarginal does not apply scale transforms from scatter plot to marginal plots #81

Closed crew102 closed 6 years ago

crew102 commented 7 years ago

ggMarginal isn't applying scale transforms (e.g., limits, scale_x_reverse(), scale_x_log10()) to the scales of the marginal histograms.

Example: When points are excluded from the scatter plot due to limits on the plot's range, the marginal plots do not reflect the in-range data. This occurs only for the margin opposite the axis on which the limits are set (e.g., if you set limits on the x-axis, the y marginal plot will be incorrect).

library(devtools)
library(withr)

with_temp_libpaths({

  # Get ggExtra version pre-refactor
  install_github("daattali/ggExtra", ref = "863b870")

  # Get ggplot2 2.2.0 so code runs
  install_version("ggplot2", "2.2.0")

  library(ggplot2)
  library(ggExtra)

  p <- ggplot(data = mtcars) +
    geom_point(aes(wt, mpg)) 

  marg_p_x <- ggMarginal(p = p + xlim(c(0, 2)))

  marg_p_y <- ggMarginal(p = p + ylim(c(25, 35)))

})

marg_p_x gives us a y density plot that appears to reflect the entirety of the data, as opposed to just the four points in range.

[Figure: marg_p_x]

marg_p_y has a similar problem, except it occurs for the x marginal:

[Figure: marg_p_y]
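
A minimal base-R sketch of what each marginal would be expected to summarise, assuming the limits simply drop out-of-range rows (the default behaviour of xlim()/ylim(), which turn out-of-bounds values into NA). The names in_x_limits and in_y_limits are illustrative only and do not come from ggExtra.

```r
# Rows that survive xlim(c(0, 2)); the y marginal should describe only these.
in_x_limits <- subset(mtcars, wt >= 0 & wt <= 2)

# Rows that survive ylim(c(25, 35)); the x marginal should describe only these.
in_y_limits <- subset(mtcars, mpg >= 25 & mpg <= 35)

nrow(in_x_limits)  # 4
nrow(in_y_limits)  # 6

# Ranges the correct marginals should cover, versus the full data:
range(in_x_limits$mpg)
range(mtcars$mpg)
range(in_y_limits$wt)
range(mtcars$wt)
```

If the marginal density spans the full-data range rather than the in-range one, the limits were not carried over to that margin.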

daattali commented 7 years ago

Interesting. I wonder how long this has been the case (always?)

I'm really glad the testing framework works well on Travis now, so that once problems are fixed they won't pop up again.

crew102 commented 7 years ago

Tough to say how long this has been happening. I assume it's hard to tell that anything is wrong in most cases (i.e., when limits do not drastically change the set of rendered points). For sure it will be nice to rely on tests moving forward, esp. given potential changes in ggplot2 internals and my tendency to refactor in a regression or two!

crew102 commented 6 years ago

I'm close to having a solution for this, but I've got one issue that I can't seem to figure out. Hoping you can help. The issue is related to the one discussed at https://github.com/tidyverse/ggplot2/issues/1651 and https://stackoverflow.com/questions/37876096/geom-histogram-wrong-bins, but I'm still not seeing a solution.

In short, the problem is that the right-most bin of the marginal histogram is getting dropped when I add a range limit. For example, let's say I have the following plot:

library(ggplot2)

min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)

p <- ggplot(mtcars, aes(x = wt)) + geom_histogram(boundary = min_wt)
p

[Figure: rplot]

Now when I add xlim(), I lose the bin on the far right even though that bin should still be within the plot limits:

p + xlim(c(min_wt, max_wt))

[Figure: rplot2]

Here's where I think the problem is coming from:

# The value of the right edge of the right-most bin, according to ggplot:
xmax_max <- max(ggplot_build(p)$data[[1]]$xmax) 
xmax_max # 5.4240000

# The max value of the data:
max_wt # 5.424

# The issue:
xmax_max > max_wt # TRUE, so we lose that last bin

I could go with the workaround solution suggested in that SO answer (e.g., multiplying max_wt by something like 1 + 100000000000 * .Machine$double.eps) but I'd prefer not to. Any ideas?
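
To make the comparison above concrete, here is a rough base-R sketch. It does not call ggplot2 and does not reproduce its binning code. It assumes, purely for illustration, 29 equal-width bins starting at min_wt, and the names width, breaks, right_edge and padded_max are hypothetical. It shows that a computed right edge can print the same as max_wt while differing in the last few bits, and that a small padding of the upper limit (the SO-style workaround) is enough to cover it.

```r
min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)

# Assumed binning for illustration: 29 equal-width bins anchored at min_wt.
width <- (max_wt - min_wt) / 29
breaks <- min_wt + width * 0:29
right_edge <- breaks[length(breaks)]

# Any difference here is floating-point noise, not a real gap in the data.
right_edge - max_wt
right_edge > max_wt

# Equal within tolerance, even if not bit-for-bit identical.
isTRUE(all.equal(right_edge, max_wt))  # TRUE

# SO-style workaround: pad the upper limit slightly before passing it to xlim().
padded_max <- max_wt * (1 + 1e11 * .Machine$double.eps)
padded_max > right_edge  # TRUE
```

The padding is on the order of 1e-4 here, far larger than rounding error but far smaller than a bin width, which is why the workaround keeps the last bin without visibly changing the axis.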

daattali commented 6 years ago

I've looked into this a lot this week and haven't gotten anywhere further than you have. Is this still a known issue in ggplot? Because the issue on their repo was closed with a commit in Sept 2016, so do you think they don't know about this issue?

crew102 commented 6 years ago

Hey Dean, I'm not sure whether this is an issue with ggplot or whether it's my lack of understanding of geom_histogram(). I've posted a question on SO to hopefully get to the bottom of it: https://stackoverflow.com/questions/49204576/values-getting-dropped-from-ggplot2-histogram-when-specifying-limits

daattali commented 6 years ago

fixed by @crew102