biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

Exaggerated one variable relative to others #383

Closed mula-jpg closed 7 months ago

mula-jpg commented 10 months ago

Dear Biomod2 developers and experts,

I observed an exaggerated importance score for a single variable (bio9) relative to the others: its score is more than 80%. Is this acceptable? If not, please suggest a solution.

Mymodels_var_import <- get_variables_importance(Mymodel)

Mymodels_var_import
                  full.name  PA  run algo expl.var rand  var.imp
1   myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio3    1 0.206947
2   myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio4    1 0.211810
3   myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio7    1 0.181641
4   myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio9    1 0.830054
5   myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio13    1 0.064827
6   myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio14    1 0.146145
7   myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio15    1 0.069552
8   myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio18    1 0.166369
9   myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio19    1 0.049224
10  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    Veget    1 0.149824
11  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM      lul    1 0.198635
12  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio3    2 0.213585
13  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio4    2 0.223937
14  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio7    2 0.184157
15  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio9    2 0.851860
16  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio13    2 0.058928
17  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio14    2 0.145478
18  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio15    2 0.065082
19  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio18    2 0.170324
20  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio19    2 0.048497
21  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    Veget    2 0.135650
22  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM      lul    2 0.186059
23  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio3    3 0.205876
24  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio4    3 0.216501
25  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio7    3 0.180713
26  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM     bio9    3 0.839437
27  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio13    3 0.058305
28  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio14    3 0.137381
29  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio15    3 0.064011
30  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio18    3 0.176404
31  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    bio19    3 0.053411
32  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM    Veget    3 0.148916
33  myRespName_PA1_RUN1_GLM PA1 RUN1  GLM      lul    3 0.181574
34  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio3    1 0.004477
35  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio4    1 0.002017
36  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio7    1 0.024628
37  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio9    1 0.863867
38  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio13    1 0.027760
39  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio14    1 0.007495
40  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio15    1 0.014673
41  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio18    1 0.001385
42  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio19    1 0.086212
43  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    Veget    1 0.024784
44  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM      lul    1 0.027138
45  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio3    2 0.004570
46  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio4    2 0.002145
47  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio7    2 0.027517
48  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio9    2 0.871010
49  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio13    2 0.024895
50  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio14    2 0.007486
51  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio15    2 0.014468
52  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio18    2 0.001239
53  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio19    2 0.087832
54  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    Veget    2 0.021944
55  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM      lul    2 0.028114
56  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio3    3 0.004542
57  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio4    3 0.002298
58  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio7    3 0.025042
59  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM     bio9    3 0.863170
60  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio13    3 0.027712
61  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio14    3 0.007804
62  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio15    3 0.012962
63  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio18    3 0.001325
64  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    bio19    3 0.079726
65  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM    Veget    3 0.023247
66  myRespName_PA1_RUN1_GBM PA1 RUN1  GBM      lul    3 0.026839
67  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio3    1 0.343686
68  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio4    1 0.324865
69  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio7    1 0.000000
70  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio9    1 0.956755
71  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio13    1 0.105302
72  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio14    1 0.024868
73  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio15    1 0.010209
74  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio18    1 0.084236
75  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio19    1 0.040388
76  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    Veget    1 0.206295
77  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM      lul    1 0.209046
78  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio3    2 0.342289
79  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio4    2 0.338444
80  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio7    2 0.000000
81  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio9    2 0.953093
82  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio13    2 0.109094
83  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio14    2 0.026222
84  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio15    2 0.009977
85  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio18    2 0.087605
86  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio19    2 0.038597
87  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    Veget    2 0.217938
88  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM      lul    2 0.207901
89  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio3    3 0.338956
90  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio4    3 0.340435
91  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio7    3 0.000000
92  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM     bio9    3 0.940795
93  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio13    3 0.107913
94  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio14    3 0.024038
95  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio15    3 0.009937
96  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio18    3 0.086781
97  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    bio19    3 0.037752
98  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM    Veget    3 0.220358
99  myRespName_PA1_RUN1_GAM PA1 RUN1  GAM      lul    3 0.206976
100 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio3    1 0.000000
101 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio4    1 0.000000
102 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio7    1 0.000000
103 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio9    1 0.872929
104 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio13    1 0.000000
105 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio14    1 0.000000
106 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio15    1 0.000000
107 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio18    1 0.000000
108 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio19    1 0.000000
109 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    Veget    1 0.155144
110 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA      lul    1 0.000000
111 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio3    2 0.000000
112 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio4    2 0.000000
113 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio7    2 0.000000
114 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio9    2 0.853400
115 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio13    2 0.000000
116 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio14    2 0.000000
117 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio15    2 0.000000
118 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio18    2 0.000000
119 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio19    2 0.000000
120 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    Veget    2 0.154667
121 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA      lul    2 0.000000
122 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio3    3 0.000000
123 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio4    3 0.000000
124 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio7    3 0.000000
125 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA     bio9    3 0.867397
126 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio13    3 0.000000
127 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio14    3 0.000000
128 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio15    3 0.000000
129 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio18    3 0.000000
130 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    bio19    3 0.000000
131 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA    Veget    3 0.155620
132 myRespName_PA1_RUN1_CTA PA1 RUN1  CTA      lul    3 0.000000
133 myRespName_PA1_RUN1_ANN PA1 RUN1  ANN     bio3    1 0.143829
134 myRespName_PA1_RUN1_ANN PA1 RUN1  ANN     bio4    1 0.490492
135 myRespName_PA1_RUN1_ANN PA1 RUN1  ANN     bio7    1 0.289941
136 myRespName_PA1_RUN1_ANN PA1 RUN1  ANN     bio9    1 0.823774
137 myRespName_PA1_RUN1_ANN PA1 RUN1  ANN    bio13    1 0.604299
138 myRespName_PA1_RUN1_ANN PA1 RUN1  ANN    bio14    1 0.461422
139 myRespName_PA1_RUN1_ANN PA1 RUN1  ANN    bio15    1 0.368784
140 myRespName_PA1_RUN1_ANN PA1 RUN1  ANN    bio18    1 0.462865
141 myRespName_PA1_RUN1_ANN PA1 RUN1  ANN    bio19    1 0.539963
142 myRespName_PA1_RUN1_ANN PA1 RUN1  ANN    Veget    1 0.139934
 [ reached 'max' / getOption("max.print") -- omitted 716 rows ]
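
Since the printed output is truncated and repeats each variable once per `rand` permutation, averaging over `rand` can make it easier to read. The following is a minimal sketch, assuming the data frame has the columns shown in the output above (`algo`, `expl.var`, `rand`, `var.imp`). The small `vi` data frame is a hand-copied subset of those rows (bio3 and bio9 for GLM and GBM), standing in for the full `Mymodels_var_import` object.

```r
# Hand-copied subset of the printed output (bio3 and bio9, GLM and GBM only);
# in practice you would use the full Mymodels_var_import data frame instead.
vi <- data.frame(
  algo     = rep(c("GLM", "GBM"), each = 6),
  expl.var = rep(rep(c("bio3", "bio9"), each = 3), times = 2),
  rand     = rep(1:3, times = 4),
  var.imp  = c(0.206947, 0.213585, 0.205876,   # GLM bio3
               0.830054, 0.851860, 0.839437,   # GLM bio9
               0.004477, 0.004570, 0.004542,   # GBM bio3
               0.863867, 0.871010, 0.863170)   # GBM bio9
)

# Mean importance per algorithm and variable, averaged over the permutation runs
vi_mean <- aggregate(var.imp ~ algo + expl.var, data = vi, FUN = mean)

# Wide layout: one row per variable, one column per algorithm
vi_wide <- xtabs(var.imp ~ expl.var + algo, data = vi_mean)
print(round(vi_wide, 3))
```

With the full data frame, the same two calls would give one row per variable and one column per algorithm, which makes it easy to see whether bio9 dominates in every algorithm or only in some.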

MayaGueguen commented 10 months ago

Hello Mulatu,

The variable importance returned by the BIOMOD_Modeling and bm_VariablesImportance functions is computed with the same permutation approach as the one used in the randomForest package.

For each variable to be evaluated:

  1. it shuffles the original variable values
  2. computes model prediction with this shuffled variable and the other explanatory variables un-modified
  3. calculates Pearson's correlation between reference and shuffled predictions
  4. returns score as 1 - cor

The higher the value, the less the reference and shuffled predictions are correlated, and the more influence the variable has on the model. A value of 0 means the variable has no influence on the model.
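
The four steps above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the biomod2 source code: the data are simulated, the model is a plain `glm`, and `shuffle_importance` is a hypothetical helper name.

```r
set.seed(1)  # only so the illustration is reproducible

# Simulated presence/absence data: x1 drives the response, x2 is pure noise
n   <- 500
x1  <- rnorm(n)
x2  <- rnorm(n)
y   <- rbinom(n, 1, plogis(2 * x1))
dat <- data.frame(y, x1, x2)

fit <- glm(y ~ x1 + x2, family = binomial, data = dat)

# Hypothetical helper (not a biomod2 function) following the four steps
shuffle_importance <- function(model, data, var) {
  # Reference predictions on the unmodified data
  ref <- predict(model, newdata = data, type = "response")
  # Step 1: shuffle the values of one variable
  shuffled <- data
  shuffled[[var]] <- sample(shuffled[[var]])
  # Step 2: predict with the shuffled variable, other variables unmodified
  shuf <- predict(model, newdata = shuffled, type = "response")
  # Steps 3 and 4: Pearson's correlation, returned as 1 - cor
  1 - cor(ref, shuf, method = "pearson")
}

imp <- sapply(c("x1", "x2"), function(v) shuffle_importance(fit, dat, v))
print(round(imp, 3))
```

In this simulated setup the score for `x1` should come out close to 1 and the score for `x2` close to 0, and the two scores have no reason to sum to 1.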

Note that this calculation does not account for variables' interactions.

Hence, the score does not represent how much of the variability in the data the variable explains, but rather the impact it has on the predictions. Scores do not sum to 1 across variables, but they are comparable between algorithms.

So in your case, shuffling bio9 changes the predictions so much that they correlate with the original predictions at only about 0.2, which gives a score of about 0.8 (1 - cor).

Is this clearer? :eyes:

Maya

mula-jpg commented 10 months ago

Dear Maya,

Thank you for your help, but I do not understand it well.

  1. How do I know whether a variable was shuffled or not? What does the word "shuffle" mean here?
  2. Is my work acceptable?
  3. Please clarify what you mean when you say that bio9, if shuffled, causes a change in the predictions that relates to the original predictions by only 20% of correlation.

Best,
Mulatu

MayaGueguen commented 10 months ago

Dear Mulatu,

"Shuffle" means that all the values of one variable are randomly mixed between sites.

For example, suppose you have a model with 3 variables: temperature T, precipitation P and slope S.

Here is what happens within the BIOMOD_Modeling function:
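
A minimal sketch of the shuffling step, using made-up values for T, P and S at five hypothetical sites. It is an illustration of the idea only, not the code that runs inside `BIOMOD_Modeling`.

```r
set.seed(42)  # only so the illustration is reproducible

# Five hypothetical sites with made-up values
sites <- data.frame(
  site = paste0("site", 1:5),
  T = c(12.1, 15.3, 9.8, 20.4, 17.6),   # temperature
  P = c(800, 650, 1200, 400, 550),      # precipitation
  S = c(5, 12, 30, 2, 8)                # slope
)

# Shuffle T only: the same five values, randomly reassigned between sites
shuffled <- sites
shuffled$T <- sample(sites$T)

print(sites)
print(shuffled)

# P and S are untouched; T keeps the same set of values
identical(shuffled$P, sites$P)               # TRUE
identical(shuffled$S, sites$S)               # TRUE
identical(sort(shuffled$T), sort(sites$T))   # TRUE
```

The model then predicts on `shuffled`, and those predictions are compared with the predictions on `sites` using Pearson's correlation. The same is repeated with only P shuffled, and then with only S shuffled, so each variable gets its own score.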

Hope it is clearer that way,

Maya

mula-jpg commented 10 months ago

Dear Maya,

Thank you very much, I appreciate your effort. I understand the case now. But how can I cite this explanation in my manuscript? Do you have a reference, or can I cite it as a personal communication?

Best,
Mulatu

wthuiller commented 9 months ago

Hi there,

This is a common way to extract variable importance and, as Maya explained, it is also what is used in RandomForest and in other algorithms such as SHAP. Just write:

To extract variable importance, original predictions are compared to predictions where a single variable at a time was reshuffled. Comparisons are made with Pearson's correlation coefficient and standardized between variables (Thuiller et al. 2009, Ecography).

Best,
Wilfried