bgreenwell closed this issue 4 years ago.
FYI, numeric followed by factor seems to work fine! 🤔
```r
library(ggplot2)
library(pdp)
suppressPackageStartupMessages(library(randomForest))
data(boston)
set.seed(101)
boston.rf <- randomForest(cmedv ~ ., data = boston)
# Same two predictors in opposite order: factor first (p1) vs. numeric first (p2)
p1 <- partial(boston.rf, pred.var = c("chas", "lstat"), chull = TRUE)
p2 <- partial(boston.rf, pred.var = c("lstat", "chas"), chull = TRUE)
# all_equal() ignores column and row order by default, so only the contents are compared
dplyr::all_equal(p1, p2)
#> [1] TRUE
# Call path: autoplot.R -> autoplot.partial() -> ggplot_two_predictor_pdp()
# The if block at line 373 is broken; the one at line 402 works
ggplot(p1, aes(x = p1[[2L]], y = p1[["yhat"]])) + geom_line() + facet_wrap(~ p1[[1L]])
ggplot(p2, aes(x = p2[[1L]], y = p2[["yhat"]])) + geom_line() + facet_wrap(~ p2[[2L]])
ggplot(p1, aes(x = lstat, y = yhat)) + geom_line() + facet_wrap(~ chas)
ggplot(p2, aes(x = lstat, y = yhat)) + geom_line() + facet_wrap(~ chas)
ggplot(p1, aes_string(x = "lstat", y = "yhat")) + geom_line() + facet_wrap("chas")
ggplot(p2, aes_string(x = "lstat", y = "yhat")) + geom_line() + facet_wrap("chas")
```
Created on 2018-08-30 by the reprex package (v0.2.0).
Somehow ggplot is getting confused by the subsetting with `[[`. I believe this should be fixable with `aes_string()` after picking out the column names instead of using integer indices (I know you are wary of ggplot2 3.0.0 and tidyeval for use in packages). I could make a PR in the next week or so if it would be helpful.
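A minimal sketch of that idea, using a small hypothetical data frame laid out like a two-predictor `partial()` result (factor first, then the numeric predictor, then `yhat`); the values are made up for illustration. The column names are looked up once with `names()` and passed as strings, so nothing inside `aes()` is evaluated with `[[`. Note that `aes_string()` is discouraged in favour of tidy evaluation in newer ggplot2 releases, so treat this as a stopgap.

```r
library(ggplot2)

# Hypothetical stand-in for a two-predictor partial() result (values invented):
# factor predictor first, then the numeric predictor, then yhat.
pd <- data.frame(
  chas  = factor(rep(c("0", "1"), times = 3)),
  lstat = rep(c(1.73, 2.45, 3.18), each = 2),
  yhat  = c(31.1, 31.5, 31.1, 31.5, 31.1, 31.5)
)

# Resolve the integer positions to column *names* once, outside of aes().
fac_name <- names(pd)[1L]  # "chas"
num_name <- names(pd)[2L]  # "lstat"

# aes_string() maps by name; facet_wrap() also accepts a character vector of names.
p <- ggplot(pd, aes_string(x = num_name, y = "yhat")) +
  geom_line() +
  facet_wrap(fac_name)
```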
@bfgray3 thanks for the thorough reprex!! I've been toying with ggplot2 using `[[` outside of pdp but cannot reproduce the error, so I'm not sure where the bug truly lies. I suspect ggplot2, though, since `plotPartial()`/lattice works just fine in this example. If that's the case, I'd submit an issue there. For now, maybe we can use `aes_string()` only for the factor/numeric case? I'm also not opposed to using tidyeval; I just haven't had the time to learn it 😔. Happy to take any PR with a fix, even if temporary!
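If tidyeval ends up being the route, one possible sketch of the factor/numeric case picks the columns by type rather than by position and maps them through the `.data` pronoun, assuming ggplot2 >= 3.0.0. `plot_factor_numeric()` is a hypothetical helper for illustration only (not part of pdp), and it assumes the response column is named `yhat` as in the reprex; the data values are invented.

```r
library(ggplot2)

# Hypothetical helper (not part of pdp): plot a factor/numeric two-predictor
# PDP no matter which predictor column comes first.
plot_factor_numeric <- function(pd) {
  preds  <- setdiff(names(pd), "yhat")                  # assumes response is "yhat"
  is_fac <- vapply(pd[preds], is.factor, logical(1))
  stopifnot(sum(is_fac) == 1L)                          # expect exactly one factor predictor
  fac_name <- preds[is_fac]
  num_name <- preds[!is_fac]
  # .data[[name]] looks the column up by name inside the data mask.
  ggplot(pd, aes(x = .data[[num_name]], y = .data[["yhat"]])) +
    geom_line() +
    facet_wrap(vars(.data[[fac_name]]))
}

# Usage with invented data in both column orders.
pd <- data.frame(
  chas  = factor(rep(c("0", "1"), times = 3)),
  lstat = rep(c(1.73, 2.45, 3.18), each = 2),
  yhat  = c(31.1, 31.5, 31.1, 31.5, 31.1, 31.5)
)
p_fac_first <- plot_factor_numeric(pd)
p_num_first <- plot_factor_numeric(pd[c("lstat", "chas", "yhat")])
```

Because the factor column is found with `is.factor()`, both column orders should end up with the same mapping and facets.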
I agree the issue is likely with ggplot.
```r
library(ggplot2)
library(pdp)
suppressPackageStartupMessages(library(randomForest))
data(boston)
set.seed(101)
boston.rf <- randomForest(cmedv ~ ., data = boston)
p1 <- partial(boston.rf, pred.var = c("chas", "lstat"), chull = TRUE)
class(p1)
#> [1] "data.frame" "partial"
# Strip the "partial" class to rule out anything pdp-specific
class(p1) <- "data.frame"
str(p1)
#> 'data.frame': 102 obs. of 3 variables:
#> $ chas : Factor w/ 2 levels "0","1": 1 2 1 2 1 2 1 2 1 2 ...
#> $ lstat: num 1.73 1.73 2.45 2.45 3.18 ...
#> $ yhat : num 31.1 31.5 31.1 31.5 31.1 ...
ggplot(p1, aes(x = p1[[2L]], y = p1[["yhat"]])) + geom_line() + facet_wrap(~ p1[[1L]])
```
Created on 2018-08-31 by the reprex package (v0.2.0).
In the meantime I'll see what I can do with `aes_string()`.
These plots look good though. This is a real noodle scratcher.
```r
library(ggplot2)
data(iris)
# Same pattern as above: factor column first (dat1) vs. numeric column first (dat2)
dat1 <- iris[c("Species", "Sepal.Length", "Sepal.Width")]
dat2 <- iris[c("Sepal.Length", "Species", "Sepal.Width")]
ggplot(dat1, aes(x = dat1[[2L]], y = dat1[["Sepal.Width"]])) + geom_line() + facet_wrap(~ dat1[[1L]])
ggplot(dat2, aes(x = dat2[[1L]], y = dat2[["Sepal.Width"]])) + geom_line() + facet_wrap(~ dat2[[2L]])
```
Created on 2018-08-31 by the reprex package (v0.2.0).