Closed mula-jpg closed 7 months ago
Hello Mulatu,
The variable importance calculated by the BIOMOD_Modeling and bm_VariablesImportance functions follows the same permutation principle as the one used within the randomForest package:
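For each variable to be evaluated, its values are shuffled, predictions are recomputed with the shuffled variable, and the score is derived from the Pearson correlation between the reference and the shuffled predictions (1 minus the correlation).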
The higher the value, the less the reference and shuffled predictions are correlated, and the more influence the variable has on the model. A value of 0 means the variable has no influence on the model.
Note that this calculation does not account for variables' interactions.
Hence, the score does not represent how much the variable explains the variability contained within the data, but rather the impact it has on predictions. Scores do not sum to 1 across variables, but they are comparable between algorithms.
So it means that in your case, bio9, if shuffled, changes your predictions so much that they correlate with the original predictions at only about 20%, which gives a score of about 0.8 (1 minus 0.2).
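To make the idea concrete, here is a minimal sketch in base R on simulated data. It is not biomod2's internal code: the data, the model and the helper name shuffle_score are made up for illustration, and a single shuffle is used for brevity.

```r
set.seed(42)  # make the shuffles reproducible

# Simulated data (hypothetical): the response depends strongly on x1, not on x2
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(3 * x1))
dat <- data.frame(y = y, x1 = x1, x2 = x2)

fit <- glm(y ~ x1 + x2, family = binomial, data = dat)

# Reference predictions, obtained with the original (unshuffled) data
ref <- predict(fit, newdata = dat, type = "response")

# Hypothetical helper: shuffle one variable, predict again,
# and return 1 minus the Pearson correlation with the reference
shuffle_score <- function(model, data, var, ref) {
  shuffled <- data
  shuffled[[var]] <- sample(shuffled[[var]])  # mix the values between rows
  pred <- predict(model, newdata = shuffled, type = "response")
  1 - cor(ref, pred, method = "pearson")
}

score_x1 <- shuffle_score(fit, dat, "x1", ref)
score_x2 <- shuffle_score(fit, dat, "x2", ref)
print(c(x1 = score_x1, x2 = score_x2))
```

In this sketch the score for x1 should come out high and the score for x2 close to 0, because shuffling a variable the model barely uses hardly changes the predictions.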
Is this clearer ? :eyes:
Maya
Dear Maya,
Thank you for your help, but I still do not understand it well.
1) How do I know whether a variable is shuffled or not? I do not even understand the word "shuffle".
2) Is my work acceptable?
3) Please clarify this statement: "So it means that in your case, bio9, if shuffled, causes a change in your predictions that only relates to original predictions by 20% of correlation."
Best,
Mulatu
Dear Mulatu,
Shuffle means that all the values of one variable are randomly mixed between sites.
For example, say you have a model with 3 variables (temperature T, precipitation P and slope S), run with two algorithms, GLM and RF, and var.import = 3.
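What it is going to do, within the BIOMOD_Modeling function:
- for T:
  - get predictions with T, P and S as usual: REF
  - shuffle T a first time (T_shuffled1) and get predictions with T_shuffled1, P and S: PRED_shuffled1
  - shuffle T a second time (T_shuffled2) and get predictions with T_shuffled2, P and S: PRED_shuffled2
  - compute var.import by comparing REF with each shuffled prediction
- do the same for P and S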
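The walkthrough above can be sketched in base R as follows. This is only an illustration under assumptions, not what BIOMOD_Modeling runs internally: the sites are simulated, only a GLM is fitted, the columns temp, precip and slope stand in for T, P and S, and n_rep plays the role of var.import = 3.

```r
set.seed(1)  # reproducible simulation and shuffles

# Hypothetical sites: temp, precip and slope stand in for T, P and S
n <- 300
sites <- data.frame(
  temp   = rnorm(n, mean = 15, sd = 5),
  precip = rnorm(n, mean = 800, sd = 200),
  slope  = runif(n, min = 0, max = 30)
)
# Simulated presence/absence: driven mostly by temp, weakly by precip, not by slope
sites$occ <- rbinom(n, 1, plogis(0.5 * (sites$temp - 15) + 0.002 * (sites$precip - 800)))

glm_fit <- glm(occ ~ temp + precip + slope, family = binomial, data = sites)

# REF: predictions with all variables as usual
REF <- predict(glm_fit, newdata = sites, type = "response")

n_rep <- 3  # number of shuffles per variable (the role of var.import = 3)
vars  <- c("temp", "precip", "slope")

# For each variable, shuffle it n_rep times and score each shuffle
scores <- sapply(vars, function(v) {
  sapply(seq_len(n_rep), function(i) {
    shuffled <- sites
    shuffled[[v]] <- sample(shuffled[[v]])  # values mixed between sites
    PRED_shuffled <- predict(glm_fit, newdata = shuffled, type = "response")
    1 - cor(REF, PRED_shuffled)
  })
})

# scores has one row per shuffle and one column per variable
mean_scores <- colMeans(scores)
print(round(scores, 3))
print(round(mean_scores, 3))
```

Averaging over the repetitions is a choice made for this sketch; the repeated shuffles mainly show how much a score varies from one random mix to the next.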
Hope it is clearer that way,
Maya
Dear Maya,
Thank you very much, I appreciate your effort. I understand the case now. But how can I cite this explanation in my manuscript? Do you have a reference, or should I cite it as a personal communication?
Best,
Mulatu
Hi there,
This is a common way to extract variable importance and, as Maya explained, it is also what is used in randomForest, and it is related to other approaches such as SHAP.
Just write: "To extract variable importance, original predictions are compared to predictions where a single variable at a time was reshuffled. Comparisons are made with Pearson's correlation coefficient and standardized between variables (Thuiller et al. 2009, Ecography)."
Best,
Wilfried
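Raw scores of this kind need not sum to 1, so if relative contributions are wanted for a manuscript, one possible rescaling is to divide each score by the total. This is a sketch under assumptions, not necessarily what biomod2 does internally; apart from the bio9 value mentioned in this thread, the numbers and the names var_b and var_c are hypothetical.

```r
# Hypothetical raw scores (1 minus correlation); only bio9 echoes this thread
raw <- c(bio9 = 0.80, var_b = 0.25, var_c = 0.05)

# Rescale so that the relative contributions sum to 1
rel <- raw / sum(raw)
print(round(rel, 3))
```

The ranking of the variables is unchanged by this rescaling; only the scale differs, so it should be reported which of the two is shown.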
Dear biomod2 developers and experts, I observed an exaggerated importance score for a single variable (bio9) relative to the others: it accounts for more than 80%. Is this acceptable? If not, please suggest a solution.
Mymodels_var_import <- get_variables_importance(Mymodel)