Closed by hbaniecki 3 years ago
I could not find a reproducible example. @hbaniecki, could you check whether this is solved?
I've checked this with
library("DALEX")
library("ingredients")
library("randomForest")
# Fit on complete cases, before any NAs are introduced
model_titanic_glm <- randomForest(survived ~ gender + age + fare,
                                  data = na.omit(titanic_imputed))
# Column 2 is age: set it to NA for rows 2..1000
titanic_imputed[2:1000, 2] <- NA
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[, -8],
                               y = titanic_imputed[, 8],
                               verbose = FALSE)
pdp_glm <- partial_dependence(explain_titanic_glm,
                              N = 25, variables = c("age", "fare", "sibsp"),
                              variable_splits = list(age = seq(0, 100, 0.1), fare = 0:100, sibsp = 0:10))
plot(pdp_glm)
I guess it works after the fix:
library("DALEX")
library("ingredients")
library("randomForest")
model_titanic_glm <- randomForest(survived ~ gender + age + fare,
                                  data = na.omit(titanic_imputed))
titanic_imputed[2:1000, 2] <- NA
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[, -8],
                               y = titanic_imputed[, 8],
                               verbose = FALSE)
pdp_glm <- partial_dependence(explain_titanic_glm,
                              N = 25, variables = c("age", "fare", "sibsp"))
#, variable_splits = list(age = seq(0, 100, 0.1), fare = 0:100, sibsp = 0:10))
plot(pdp_glm)
thanks
Hi there,
I'm wondering if there is some way to make conditional and accumulated dependence plots work with NAs, e.g.:
library("DALEX")
library("ingredients")
library("randomForest")
model_titanic_glm <- randomForest(survived ~ gender + age + fare,
                                  data = na.omit(titanic_imputed))
titanic_imputed[2:1000, 2] <- NA
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[, -8],
                               y = titanic_imputed[, 8],
                               verbose = FALSE)
pdp_glm <- conditional_dependence(explain_titanic_glm,
                                  N = 25, variables = c("age", "fare", "sibsp"))
#, variable_splits = list(age = seq(0, 100, 0.1), fare = 0:100, sibsp = 0:10))
plot(pdp_glm)
Thanks
Hi, what is your goal? PD and ALE rely on estimating expected predictions with respect to the data distribution.
Did you consider removing the observations with missing age (NAs) from the data when estimating the explanation for age?
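A minimal sketch of that suggestion (an illustration, not a confirmed fix): drop the rows where age is missing before building the explainer, then compute the profile for age only. The names titanic_na, titanic_age and explainer_age are made up for this example, and the choice of 200 missing values is arbitrary.

```r
library("DALEX")
library("ingredients")
library("randomForest")

set.seed(1)
# Fit on the complete data, as in the examples in this thread
model_rf <- randomForest(survived ~ gender + age + fare,
                         data = titanic_imputed)

# Introduce NAs in age for 200 randomly chosen rows (arbitrary number)
titanic_na <- titanic_imputed
titanic_na[sample(nrow(titanic_na), 200), "age"] <- NA

# Keep only the rows where age is observed
titanic_age <- titanic_na[!is.na(titanic_na$age), ]

explainer_age <- explain(model_rf,
                         data = titanic_age[, colnames(titanic_age) != "survived"],
                         y = titanic_age$survived,
                         verbose = FALSE)

# Profile for age, estimated on observations with a known age
cd_age <- conditional_dependence(explainer_age, N = 25, variables = "age")
plot(cd_age)
```

Filtering per variable keeps as many rows as possible for each profile, at the cost of building one explainer per variable that has missing values.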
Sorry, this was a bad example. I was piggybacking on the example from this thread. In doing more testing with reasonable numbers of NAs, I see that conditional_dependence() does work with NAs:
library("DALEX")
library("ingredients")
library("randomForest")
model_titanic_glm <- randomForest(survived ~ gender + age + fare,
                                  data = na.omit(titanic_imputed))
# Set 10 randomly chosen rows (all columns) to NA
toNA <- sample(1:1000, 10)
titanic_imputed[toNA, ] <- NA
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[, -8],
                               y = titanic_imputed[, 8],
                               verbose = FALSE)
pdp_glm <- conditional_dependence(explain_titanic_glm,
                                  N = 25, variables = c("age", "fare", "sibsp"))
#, variable_splits = list(age = seq(0, 100, 0.1), fare = 0:100, sibsp = 0:10))
plot(pdp_glm)
Unfortunately, in my significantly larger and more complicated models, I'm running into issues related to missing values where the aggregated profiles aren't being calculated. When I impute the missing values, there are no issues, but I can't seem to reproduce the problem with a simpler dataset/model. Do you know of any situations where aggregating profiles fails because of NAs? There are no cases where an entire column is NA, as in my previous examples.
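One way to narrow this down, sketched in base R on toy data (the data frame, the lm model and all variable names below are hypothetical, not the models from this thread): count the NAs per column, then check whether the model's predictions propagate them. Whether this is what breaks the aggregation in ingredients is an assumption, not something confirmed here.

```r
# Toy data and model, only for illustration
set.seed(42)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- 2 * df$x1 - df$x2 + rnorm(100, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = df)

# Data passed for explanation, with a few scattered NAs in x1
new_data <- df[, c("x1", "x2")]
new_data[sample(100, 5), "x1"] <- NA

# 1. How many missing values does each column have?
na_per_column <- colSums(is.na(new_data))
print(na_per_column)

# 2. Does the model return NA predictions for rows with missing inputs?
preds <- predict(fit, newdata = new_data)
n_na_preds <- sum(is.na(preds))
print(n_na_preds)

# 3. A plain average over predictions that contain NAs is itself NA
print(mean(preds))
print(mean(preds, na.rm = TRUE))
```

If the second check reports NA predictions for a real model, any step that averages predictions without dropping NAs would be a plausible place to look.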
crossref https://github.com/ModelOriented/modelStudio/issues/71