bgreenwell / pdp

A general framework for constructing partial dependence (i.e., marginal effect) plots from various types of machine learning models in R.
http://bgreenwell.github.io/pdp

🐛 in autoplot with factor followed by numeric #79

Closed: bgreenwell closed this issue 4 years ago

bgreenwell commented 6 years ago
# Load required packages
library(ggplot2)
library(pdp)
library(randomForest)

# Load Boston housing data
data(boston)

# Fit a random forest model
set.seed(101)
boston.rf <- randomForest(cmedv ~ ., data = boston)

# Two predictor PDP (factor/numeric)
boston.rf %>%
  partial(pred.var = c("chas", "lstat"), chull = TRUE) %>%
  autoplot(contour = TRUE, main = "factor/numeric")

[Image: autoplot() output for the "factor/numeric" two-predictor PDP]

bgreenwell commented 6 years ago

FYI numeric followed by factor seems to work fine! 🤔

bfgray3 commented 6 years ago
library(ggplot2)
library(pdp)
suppressPackageStartupMessages(library(randomForest))

data(boston)

set.seed(101)
boston.rf <- randomForest(cmedv ~ ., data = boston)

p1 <- partial(boston.rf, pred.var = c("chas", "lstat"), chull = TRUE)
p2 <- partial(boston.rf, pred.var = c("lstat", "chas"), chull = TRUE)

dplyr::all_equal(p1, p2)
#> [1] TRUE

# Call path: autoplot.R -> autoplot.partial -> ggplot_two_predictor_pdp
# The if block at line 373 is the broken one; the if block at line 402 works

ggplot(p1, aes(x = p1[[2L]], y = p1[["yhat"]])) + geom_line() + facet_wrap(~ p1[[1L]])

ggplot(p2, aes(x = p2[[1L]], y = p2[["yhat"]])) + geom_line() + facet_wrap(~ p2[[2L]])


ggplot(p1, aes(x = lstat, y = yhat)) + geom_line() + facet_wrap(~ chas)

ggplot(p2, aes(x = lstat, y = yhat)) + geom_line() + facet_wrap(~ chas)


ggplot(p1, aes_string(x = "lstat", y = "yhat")) + geom_line() + facet_wrap("chas")

ggplot(p2, aes_string(x = "lstat", y = "yhat")) + geom_line() + facet_wrap("chas")

Created on 2018-08-30 by the reprex package (v0.2.0).

Somehow ggplot is getting confused by the subsetting with [[. I believe this should be fixable with aes_string after picking out the column names instead of using integer indices (I know you are wary of ggplot2 3.0.0 and tidyeval for use in packages). I could make a PR in the next week or so if it would be helpful.
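
A minimal sketch of that idea, using a small made-up data frame in place of the partial() output so that it runs without fitting a model. The helper name plot_two_predictors and the toy data are hypothetical and are not pdp's actual code; the sketch only illustrates resolving the column names first and then passing them as strings. Note that aes_string() is soft-deprecated as of ggplot2 3.0.0, so this is a stopgap rather than a long-term fix.

```r
library(ggplot2)

# Toy stand-in for the output of partial(): factor first, numeric second,
# with the factor levels alternating row by row (as in p1).
toy <- expand.grid(chas = factor(c(0, 1)), lstat = seq(1, 10, length.out = 5))
toy$yhat <- 30 - toy$lstat + as.integer(toy$chas)

# Hypothetical helper (not part of pdp): look up the column names once,
# then map them as strings instead of subsetting the data frame with [[.
plot_two_predictors <- function(object) {
  is_fac <- vapply(object[1:2], is.factor, logical(1))
  stopifnot(sum(is_fac) == 1L)  # exactly one factor among the two predictors
  x_name <- names(object)[1:2][!is_fac]
  facet_name <- names(object)[1:2][is_fac]
  ggplot(object, aes_string(x = x_name, y = "yhat")) +
    geom_line() +
    facet_wrap(facet_name)
}

p <- plot_two_predictors(toy)
```

Because the helper works from names rather than positions, it should give the same plot whether the factor is the first or the second column.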

bgreenwell commented 6 years ago

@bfgray3 thanks for the thorough reprex!! I've been toying with ggplot2 using [[ outside of pdp but cannot reproduce the error, so I'm not sure where the bug truly lies, but I suspect ggplot2 (since plotPartial()/lattice works just fine in this example). If that's the case, I'd submit an issue there. For now, maybe we can use aes_string() only for the factor/numeric case? I'm also not opposed to using tidyeval, just haven't had the time to learn it 😔. Happy to take any PR with a fix, even if temporary!
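
For reference, a rough sketch of what the tidyeval route could look like, assuming ggplot2 >= 3.0.0 so that the .data pronoun is available inside aes(). The toy data frame and the x_name/facet_name variables are made up for illustration and do not come from pdp.

```r
library(ggplot2)

# Toy stand-in for a factor/numeric partial() result (hypothetical data).
toy <- expand.grid(chas = factor(c(0, 1)), lstat = seq(1, 10, length.out = 5))
toy$yhat <- 30 - toy$lstat + as.integer(toy$chas)

# Column names held in variables, as they would be inside a package function.
x_name <- "lstat"
facet_name <- "chas"

# .data[[...]] looks the column up in the layer data by name, so nothing
# refers to the full data frame from outside the plot.
p <- ggplot(toy, aes(x = .data[[x_name]], y = .data[["yhat"]])) +
  geom_line() +
  facet_wrap(facet_name)
```

This keeps aes() and avoids aes_string(), at the cost of requiring a ggplot2 version with tidy evaluation support.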

bfgray3 commented 6 years ago

I agree the issue is likely with ggplot.

library(ggplot2)
library(pdp)
suppressPackageStartupMessages(library(randomForest))

data(boston)

set.seed(101)
boston.rf <- randomForest(cmedv ~ ., data = boston)

p1 <- partial(boston.rf, pred.var = c("chas", "lstat"), chull = TRUE)
class(p1)
#> [1] "data.frame" "partial"
class(p1) <- "data.frame"
str(p1)
#> 'data.frame':    102 obs. of  3 variables:
#>  $ chas : Factor w/ 2 levels "0","1": 1 2 1 2 1 2 1 2 1 2 ...
#>  $ lstat: num  1.73 1.73 2.45 2.45 3.18 ...
#>  $ yhat : num  31.1 31.5 31.1 31.5 31.1 ...
ggplot(p1, aes(x = p1[[2L]], y = p1[["yhat"]])) + geom_line() + facet_wrap(~ p1[[1L]])

Created on 2018-08-31 by the reprex package (v0.2.0).

In the meantime I'll see what I can do with aes_string.

bfgray3 commented 6 years ago

These plots look good though. This is a real noodle scratcher.

library(ggplot2)

data(iris)

dat1 <- iris[c("Species", "Sepal.Length", "Sepal.Width")]
dat2 <- iris[c("Sepal.Length", "Species", "Sepal.Width")]

ggplot(dat1, aes(x = dat1[[2L]], y = dat1[["Sepal.Width"]])) + geom_line() + facet_wrap(~ dat1[[1L]])

ggplot(dat2, aes(x = dat2[[1L]], y = dat2[["Sepal.Width"]])) + geom_line() + facet_wrap(~ dat2[[2L]])

Created on 2018-08-31 by the reprex package (v0.2.0).
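
One difference between these iris frames and p1 that might be worth ruling out: iris comes already grouped by Species, whereas the str(p1) output earlier shows chas alternating from row to row. The sketch below is only a probe of that guess, not a diagnosis. It reorders iris so the factor levels are interleaved and builds the index-based and name-based versions of the plot for comparison; dat3, p_index, and p_names are names made up for this example.

```r
library(ggplot2)

data(iris)

# iris rows come grouped by Species (all setosa, then versicolor, ...).
grouped_before <- !is.unsorted(as.integer(iris$Species))

# Interleave the species by sorting on Sepal.Length.
dat3 <- iris[order(iris$Sepal.Length), c("Species", "Sepal.Length", "Sepal.Width")]
grouped_after <- !is.unsorted(as.integer(dat3$Species))

# Index-based mapping, as in the plots above.
p_index <- ggplot(dat3, aes(x = dat3[[2L]], y = dat3[["Sepal.Width"]])) +
  geom_line() +
  facet_wrap(~ dat3[[1L]])

# Name-based mapping for comparison.
p_names <- ggplot(dat3, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_line() +
  facet_wrap(~ Species)
```

If p_index and p_names differ when printed, row order is a plausible factor; if they match, the cause lies elsewhere.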