AbdollahiAz closed this issue 7 months ago
Hi.
`could not find function "xgboost"` means that you most likely have not installed `xgboost`. Have you installed it? If not, then do so.
I am surprised by the low R2 value for `ranger` compared to `lm`, but what is your justification for using R2 rather than, e.g., MSE? Have you made a mistake in Case 1? You call your model `model`, but when you compute the R2, you use `model_lm`. Is that intentional?
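To illustrate the two points above, here is a minimal sketch that scores a single fitted model with both MSE and R2 on held-out data, using the same object name for fitting and scoring. The helper names `mse_fn` and `r2_fn`, the 30-row test split, the seed, and the formula are assumptions for illustration; they are not taken from the original code.

```r
# Hypothetical helpers: mean squared error and R2 from observed and predicted values.
mse_fn <- function(y, y_hat) mean((y - y_hat)^2)
r2_fn <- function(y, y_hat) 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)

# airquality ships with base R; drop rows with missing values.
data("airquality")
dat <- airquality[complete.cases(airquality), ]

# Assumed split: 30 rows held out for testing.
set.seed(1)
test_idx <- sample(nrow(dat), 30)
train <- dat[-test_idx, ]
test <- dat[test_idx, ]

# Fit and score the SAME object (model_lm), so the metric matches the model.
model_lm <- lm(Ozone ~ Solar.R + Wind + Temp, data = train)
pred_lm <- predict(model_lm, newdata = test)

mse_lm <- mse_fn(test$Ozone, pred_lm)
r2_lm <- r2_fn(test$Ozone, pred_lm)
print(c(MSE = mse_lm, R2 = r2_lm))
```

Scoring every model (`lm`, `ranger`, `xgboost`) with the same helpers on the same test rows keeps the comparison like for like.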
I would argue that the overall trends in your figures resemble each other. However, you have to remember that you are explaining two different models, so obtaining identical explanations would be strange.
Lars
Dear @LHBO,
Thanks for your explanation. I fixed the typo and installed xgboost. It worked.
Sincerely, Az
Dear shapr,
Inspired by issue #385, I used the `airquality` dataset to implement the `vaeac` approach, where the feature “month” is treated as a categorical feature using `as.factor(Month)` (is this correct?). I fitted three machine learning (ML) models: `xgboost`, `lm`, and `ranger`. For example, for `xgboost`, I faced the following error:

I calculated the R2 metric as follows:

Lm: R2 = 0.6481237
Ranger: R2 = 0.1001726

As the attached beeswarm plots show, the calculated SHAP values for the models are completely different.
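For reference, the `as.factor(Month)` step described above can be checked with base R alone. This is only a sketch: it assumes rows with missing values are dropped first and that the explanation method identifies categorical features from factor columns; the object name `dat` is hypothetical.

```r
# airquality ships with base R; Month is stored as an integer (5 to 9).
data("airquality")
dat <- airquality[complete.cases(airquality), ]  # assumed: drop rows with NAs

class(dat$Month)                   # "integer" before conversion
dat$Month <- as.factor(dat$Month)  # treat month as categorical

is.factor(dat$Month)               # TRUE after conversion
levels(dat$Month)                  # "5" "6" "7" "8" "9"
table(dat$Month)                   # number of complete rows per month
```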
My questions for a mixed dataset are as follows:
1) Which machine learning algorithm should be used?
2) Which configuration should be applied? (Case 1: ranger + vaeac + categorical; Case 2: lm + vaeac + categorical; Case 3: xgboost + vaeac + categorical)
Based on the suggestion in the paper Using Shapley Values and Variational Autoencoders to Explain Predictive Models with Dependent Mixed Features, I expected `ranger` to work well. However, I am a little confused about how I can trust the SHAP values when the fitted model is biased. Please help, @LHBO. Please find the code below:

Sincerely, Azam