biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

Model weights #403

Open eragrostis opened 10 months ago

eragrostis commented 10 months ago

Aloha BIOMOD2 community. First of all, I want to thank the developers for the work put into BIOMOD2; it has been an incredibly useful tool for my work.

Recently I have been experimenting with model weights and noticed that the documentation says: "If pseudo-absences have been generated (PA.nb.rep > 0 in BIOMOD_FormatingData), weights are by default calculated such that prevalence = 0.5. Automatically created weights will be integer values to prevent some modeling issues."
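To make the quoted rule concrete, here is a minimal sketch of how integer weights giving a weighted prevalence of 0.5 could be computed. It is an illustration only: the function names `prevalence_weights` and `weighted_prevalence` are hypothetical, it is written in Python rather than R, and it is not claimed to match what biomod2 does internally.

```python
def prevalence_weights(is_presence, prevalence=0.5):
    """Return one integer weight per observation so that presences carry
    roughly `prevalence` of the total weight (assumed scheme, not biomod2's code)."""
    n_pres = sum(1 for p in is_presence if p)
    n_abs = len(is_presence) - n_pres
    if n_pres == 0 or n_abs == 0:
        raise ValueError("need at least one presence and one (pseudo-)absence")

    if n_abs > n_pres:
        # Absences are the majority: keep them at 1 and up-weight presences.
        w_pres = (prevalence * n_abs) / ((1 - prevalence) * n_pres)
        w_abs = 1.0
    else:
        # Presences are the majority: keep them at 1 and up-weight absences.
        w_pres = 1.0
        w_abs = ((1 - prevalence) * n_pres) / (prevalence * n_abs)

    # Rounding yields integer weights; the target prevalence is then only
    # approximate when the class ratio is not a whole number.
    return [round(w_pres) if p else round(w_abs) for p in is_presence]


def weighted_prevalence(is_presence, weights):
    """Share of the total weight carried by presences."""
    pres = sum(w for p, w in zip(is_presence, weights) if p)
    return pres / sum(weights)


labels = [True] * 10 + [False] * 40   # 10 presences, 40 pseudo-absences
weights = prevalence_weights(labels)
print(weights[0], weights[-1], weighted_prevalence(labels, weights))
```

With 10 presences and 40 pseudo-absences, each presence is counted four times as heavily as each pseudo-absence, so both classes carry the same total weight.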

Do you therefore recommend that manually generated weights also be integer values? How problematic are decimal weights between 0 and 1? Do decimal weights only behave badly with specific models? If you have any resources beyond what I could find in the documentation, that would be very useful.

While looking at the source code, I also noticed that weights are not passed to the RF and MAXENT models, as these models do not support weights. I highly recommend mentioning in the documentation that these models will ignore any weights passed to them (and perhaps others do too; these were the only ones that interested me enough to check the source code).

I have been experimenting with weighted models in the context of modeling incipient invasive species, where there are relatively few points in Hawai'i (6 to 50 local points, and the species has clearly not saturated its niche). I have been using GLM, GAM, GBM, and MARS models, with the goal of building an ensemble of these. To do this, I first created a model for the whole world using GBIF data, then applied the inverse logistic transformation to the projection of that global model onto Hawai'i to weight the pseudo-absences of a second, local model (following the methodology here: https://onlinelibrary.wiley.com/doi/full/10.1111/j.1466-8238.2012.00768.x ).

Regardless of which weights I use (I experimented with decimal weights, integer weights, and various weighted prevalences between presence and PA points), weighting the PAs of a poor local model does not substantially improve it. I have since been experimenting with outright removing PAs with low weights, and that seems to improve my models more than using weights does. Using model weights is appealing to me, but perhaps it is not the right tool for this job.
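For readers following along, the two strategies described above (down-weighting pseudo-absences by global-model suitability versus removing them outright) can be sketched as follows. This is an assumption-laden illustration: the decreasing logistic curve, its `midpoint` and `steepness` parameters, and the function names are hypothetical choices made here, and the exact transformation in the linked paper may differ.

```python
import math


def pa_weight(global_suitability, midpoint=0.5, steepness=10.0):
    """Weight for one pseudo-absence, given the global model's predicted
    suitability in [0, 1] at that point. High suitability gives a low weight,
    since such a point may be an unoccupied but suitable site rather than a
    true absence. The curve shape is an assumed, illustrative choice."""
    return 1.0 / (1.0 + math.exp(steepness * (global_suitability - midpoint)))


def keep_indices(suitabilities, min_weight):
    """Alternative to weighting: indices of pseudo-absences whose weight is
    at least `min_weight`; the rest would be removed from the local model."""
    return [i for i, s in enumerate(suitabilities) if pa_weight(s) >= min_weight]


# Hypothetical global-model suitabilities at three pseudo-absence points.
suitabilities = [0.1, 0.9, 0.5]
weights = [pa_weight(s) for s in suitabilities]   # strategy 1: weight the PAs
kept = keep_indices(suitabilities, 0.5)           # strategy 2: drop low-weight PAs
print(weights, kept)
```

The first strategy keeps every pseudo-absence but lets doubtful ones count for less; the second discards them entirely, which is the variant reported above as working better.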

I would greatly appreciate any suggestions on whether model weights are even appropriate in this case and, if so, what the advantages of integer versus decimal weights are.

Kevin Faccenda

MayaGueguen commented 4 months ago

Hello Kevin,

I apologize for the lack of response. :pray:

I must admit that I delayed answering your question because I was not confident in my answer. I'm not familiar with the use of weights within the models, and I'm not sure what impact decimal weights would have in general or on specific models... But that's no excuse: I should have told you right away that I could not give you a satisfying answer. So once again, please accept my apologies. :cherry_blossom:

Following your suggestion, I added a note in BIOMOD_Modeling documentation mentioning that MAXENT, MAXNET, RF, RFd and SRE do not take weights into account.

Note also that we try to document as much as possible on our website, with tutorials, function documentation and examples. We receive many issues per month, and although we close them once they are answered or solved, you can still search them to find questions and problems similar to yours. Finally, you might find it more appropriate to use the Discussions section for general questions or topics such as this one, and to get answers from other community members.

Maya