Closed (sofalbre closed this 1 year ago)
I am now trying the DALEX package; it handles my variable names very well, and I was able to create nice partial dependence plots. I haven't been able to compute probability plots yet, though.
Hi @sofalbre, thanks for reaching out. pdp (as well as DALEX and iml) can construct PDPs for ANY model in R, and on any scale (e.g., probabilities). In your case, you likely need to use a suitable prediction wrapper. If you could post a small reproducible example with rpartScore (maybe using some built-in R data), I'd be happy to post a solution for you.
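As a rough illustration of what such a prediction wrapper can look like, here is a minimal sketch. It assumes a model whose predict method can return class probabilities, so it uses an rpart classification tree on the built-in iris data as a stand-in; it is not the rpartScore model from this thread, and the name pred_prob_setosa is made up for the example. The pdp call is guarded so the snippet still runs if pdp is not installed.

```r
library(rpart)  # recommended package, normally shipped with R

# Stand-in model (assumption): a classification tree on built-in data.
fit <- rpart(Species ~ ., data = iris)

# Prediction wrapper: takes the fitted model and a data frame and returns
# ONE number, the average predicted probability of the class "setosa".
pred_prob_setosa <- function(object, newdata) {
  probs <- predict(object, newdata = newdata, type = "prob")
  mean(probs[, "setosa"])
}

# Hand the wrapper to pdp::partial() via pred.fun; train supplies the data
# that partial() modifies and passes back to the wrapper as newdata.
if (requireNamespace("pdp", quietly = TRUE)) {
  pd <- pdp::partial(fit, pred.var = "Petal.Length",
                     pred.fun = pred_prob_setosa, train = iris)
  print(head(pd))
}
```

Because the wrapper averages over the rows itself, the result is a partial dependence curve on the probability scale rather than one curve per observation.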
Dear @bgreenwell thank you very much for getting back to me! I have attached a csv with example data and a simplified rscript. Good luck and thanks again for your help! subsample.csv simplescript.txt
@sofalbre unfortunately, at least as far as I can tell from the docs, you cannot get predicted probabilities from rpartScore, so this is a limitation of the modeling package. Let me know if I'm wrong here though.
Dear @bgreenwell, thanks again for checking! That's fine, then; at least I can save myself the time looking for it. ;)
Hello, I am trying to compute partial dependence plots for my rpartScore model. I have tried different things, but haven't been able to get it working so far.
After splitting the data into training and testing sets, my tree model is as follows. (Unfortunately, backticks mark code in this editor, but I also use them in my code to quote variable names that contain a mathematical sign. To keep them from switching code formatting on and off here, I have doubled them in the code in this post; in my actual code they are single backticks. I hope this doesn't cause confusion.)
tree <- rpartScore(Nutritional.Status.olr ~ VBT+``VBT/L``+``d/r``+SMI+``Residual M/L``+BMI+``M/L``+``G/L``+LMD, data = datatrain)
This works well in predicting the testing dataset (I did a confusion matrix afterwards, etc.). I am now generating partial dependence plots with this line of code from the pdp package:
partial(big.tree, pred.var = "VBT",prob=T, plot = T, type = "regression", smooth=TRUE)
and I get the following image. Unfortunately, I would like the probabilities instead: not the actual predicted value, but how much the variable influences the model at that point, as described e.g. here: "Single variables shows how there value affect the model, on y-axis having a negative value means for that particular value of predictor variable it is less likely to predict the correct class on that observation and having a positive value means it has positive impact on predicting the correct class. Same applies to two variable plots, color represent the intensity of affect on model." https://rpubs.com/vishal1310/QuickIntroductiontoPartialDependencePlots
If I change the line to
partial(big.tree, pred.var = "VBT",prob=F, plot = T, type = "regression", smooth=TRUE)
nothing changes; I get the same plot. I have also tried the following:
pred.prob <- function(object, newdata) {
  pred <- predict(object, newdata, probability = TRUE)
  prob.setosa <- attr(pred, which = "probabilities")[, "1"]
  mean(prob.setosa)
}
vbt <- partial(big.tree, pred.var = "VBT", plot = TRUE, pred.fun = pred.prob, type = "regression")
vbt
which produced a plot, but not one with the VBT values on the x-axis and the probability on the y-axis. Is there any way I can fix this?
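For reference, a plot with the predictor values on the x-axis and the average predicted probability on the y-axis can also be computed by hand. The sketch below is only an illustration under a stated assumption: the model's predict method must be able to return class probabilities. It therefore uses an rpart classification tree on the built-in iris data as a stand-in, not the rpartScore model above.

```r
library(rpart)  # recommended package, normally shipped with R

# Stand-in model (assumption): a classification tree on built-in data.
fit <- rpart(Species ~ ., data = iris)

# Grid of values for the predictor of interest.
grid <- seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 20)

# For each grid value: overwrite the predictor in every row, predict,
# and average the predicted probability of one class over all rows.
pd <- sapply(grid, function(v) {
  newdata <- iris
  newdata$Petal.Length <- v
  mean(predict(fit, newdata = newdata, type = "prob")[, "versicolor"])
})

# x-axis: predictor values; y-axis: average predicted probability.
plot(grid, pd, type = "s",
     xlab = "Petal.Length", ylab = "Mean predicted P(versicolor)")
```

The step-shaped line is expected for a single tree, since its predictions only change at split points.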
Additionally, I still cannot get the other variable names to work. I have tried "" and ``, but the function doesn't accept them. Any ideas? I would rather not go back to the beginning of the analysis and rename everything, as the tree would then no longer have the correct variable names.
result_VBT <- partial(big.tree, pred.var = "VBT", prob = TRUE, plot = TRUE, type = "regression", smooth = TRUE)
result_VBT_L <- partial(big.tree, pred.var = "VBT/L", prob = TRUE, plot = TRUE, type = "regression", smooth = TRUE)
result_d_r <- partial(big.tree, pred.var = ``d/r``, prob = TRUE, plot = TRUE, type = "regression", smooth = TRUE)
result_SMI <- partial(big.tree, pred.var = "SMI", prob = TRUE, plot = TRUE, type = "regression", smooth = TRUE)
result_Residual_M_L <- partial(big.tree, pred.var = "Residual M/L", prob = TRUE, plot = TRUE, type = "regression", smooth = TRUE)
result_BMI <- partial(big.tree, pred.var = "BMI", prob = TRUE, plot = TRUE, type = "regression", smooth = TRUE)
result_M_L <- partial(big.tree, pred.var = "M/L", prob = TRUE, plot = TRUE, type = "regression", smooth = TRUE)
result_G_L <- partial(big.tree, pred.var = "G/L", prob = TRUE, plot = TRUE, type = "regression", smooth = TRUE)
result_LMD <- partial(big.tree, pred.var = "LMD", prob = TRUE, plot = TRUE, type = "regression", smooth = TRUE)
result_VBT_L <- partial(big.tree, pred.var = ``VBT/L``, prob = TRUE, plot = TRUE, type = "regression", smooth = TRUE)
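On the naming problem in general, the following base R sketch may help narrow things down. It uses toy data and a plain lm fit (both assumptions made for illustration, not the data or model from this post) to show where backticks are needed, and how to keep a lookup table if syntactic copies of the names turn out to be necessary. Whether a particular plotting function accepts the original names is a separate matter that this sketch does not settle.

```r
# Toy data (made up) with non-syntactic column names;
# check.names = FALSE stops data.frame() from rewriting them.
df <- data.frame(
  y       = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  "VBT/L" = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0),
  "d/r"   = c(3, 1, 4, 1, 5, 9),
  check.names = FALSE
)

# Inside a formula, such names must be quoted with (single) backticks.
fit <- lm(y ~ `VBT/L` + `d/r`, data = df)

# As character strings, the names carry no backticks at all.
all.vars(formula(fit))
df[["VBT/L"]]

# Fallback: fit on syntactic copies of the names and keep a lookup table,
# so the original labels can be restored on plots and in reports.
lookup   <- setNames(names(df), make.names(names(df)))
df_safe  <- setNames(df, names(lookup))
fit_safe <- lm(y ~ VBT.L + d.r, data = df_safe)
lookup[["VBT.L"]]  # original label for the renamed column
```

Both fits use the same data and differ only in the column labels, so renaming a copy of the data does not change the model itself.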
Here is an example of my data; the columns have been renamed to match the variables used in the model above:
Thank you very much for any input!