jacksonwr5798 opened 2 days ago
Hello Jackson 👋
Thank you for posting with so many details and so much information 🙏
Here is a short recap of what happens when building ensemble models and evaluating them:
When using the EMmean strategy, the ensemble takes the predictions of all the single models (let's say 5) that fulfill the threshold condition and computes, in each pixel, the average of these 5 predictions. You therefore get a new map of predictions. This map is then used to compute your evaluation metrics again and to find a new threshold that converts predictions into 0/1 values while optimizing those metrics. What you merge are the predictions, not the evaluation values, which are recalculated from the merged predictions. However, the range of predictions of the single models and their optimized cutoffs can affect the ensemble cutoff, as discussed below.

When evaluating the ensemble model, because it merges all single models together, it can no longer keep the split between PA (pseudo-absence) and CV (cross-validation) datasets: it takes all occurrences, and all points that are included in at least one PA dataset. In your case, the optimization found that by converting all predictions > 105 into 1, you manage to predict all your occurrences and absences (PA). As some of your kept single models have quite low cutoffs (around 300), it means that, to reach their good evaluation scores, they counted fairly low predictions as presences. When averaging, these low predictions pull the ensemble values down, forcing the ensemble to "move" its optimal cutoff downwards.

Hope it helps, and please do not hesitate if some things still need to be clarified 👀
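To make the mechanism above concrete, here is a minimal Python sketch. It rests on several assumptions that are not from this thread: predictions are on a 0 to 1000 scale, TSS is the evaluation metric, the cutoff is found by scanning every integer threshold and keeping the one that maximizes TSS (ties go to the lowest value), and the ensemble is evaluated on the pooled points (every occurrence plus any point drawn in at least one PA dataset). The `Point` record, the function names, the 0.7 selection threshold and all numbers are hypothetical toy data, and the actual biomod2 search may differ in its details.

```python
from dataclasses import dataclass


@dataclass
class Point:
    point_id: str
    obs: int        # 1 = occurrence, 0 = pseudo-absence
    in_pa: tuple    # hypothetical membership flags, one per PA dataset


def pooled_indices(points):
    """Indices of every occurrence plus every point drawn in at least one PA dataset."""
    return [i for i, p in enumerate(points) if p.obs == 1 or any(p.in_pa)]


def ensemble_mean(predictions):
    """Pixel-wise mean of the kept single-model predictions (EMmean-style)."""
    return [sum(values) / len(values) for values in zip(*predictions)]


def tss(obs, pred, cutoff):
    """TSS = sensitivity + specificity - 1, calling presence when pred > cutoff."""
    tp = sum(1 for o, p in zip(obs, pred) if o == 1 and p > cutoff)
    fn = sum(1 for o, p in zip(obs, pred) if o == 1 and p <= cutoff)
    tn = sum(1 for o, p in zip(obs, pred) if o == 0 and p <= cutoff)
    fp = sum(1 for o, p in zip(obs, pred) if o == 0 and p > cutoff)
    return tp / (tp + fn) + tn / (tn + fp) - 1


def best_cutoff(obs, pred):
    """Scan cutoffs 0..1000; max() keeps the first (lowest) cutoff among ties."""
    cutoff = max(range(1001), key=lambda c: tss(obs, pred, c))
    return cutoff, tss(obs, pred, cutoff)


# Hypothetical evaluation points.
points = [
    Point("occ1", 1, (True, True)),
    Point("occ2", 1, (True, True)),
    Point("occ3", 1, (True, True)),
    Point("pa1", 0, (True, False)),
    Point("pa2", 0, (False, True)),
    Point("pa3", 0, (True, True)),
    Point("bg", 0, (False, False)),  # drawn in no PA dataset: excluded from pooling
]

# Hypothetical single-model predictions, one value per point above.
preds = {
    "model_A": [800, 700, 650, 500, 300, 200, 900],  # high predictions
    "model_B": [400, 350, 320, 250, 150, 100, 900],  # low predictions, low cutoff
    "model_C": [300, 600, 200, 700, 400, 500, 900],  # poor model
}
SELECT_THRESH = 0.7  # hypothetical TSS a single model must reach to be kept

idx = pooled_indices(points)
obs = [points[i].obs for i in idx]


def subset(values):
    """Restrict a full prediction vector to the pooled evaluation points."""
    return [values[i] for i in idx]


# Single models are normally scored within their own PA/CV split; here they are
# scored on the pooled points only to keep the sketch short.
kept = {name: pred for name, pred in preds.items()
        if best_cutoff(obs, subset(pred))[1] >= SELECT_THRESH}

ensemble = ensemble_mean(list(kept.values()))  # averaged map over all points
ens_cutoff, ens_tss = best_cutoff(obs, subset(ensemble))

for name, pred in kept.items():
    print(name, best_cutoff(obs, subset(pred)))
print("ensemble", ens_cutoff, ens_tss)
```

In this toy run, model_C is dropped, model_A gets a cutoff of 500 and model_B a cutoff of 250, because model_B only separates presences from absences by treating values around 300 as presences. Averaging the two lowers the ensemble's presence scores, so the re-optimized ensemble cutoff (375) lands below model_A's cutoff, which is the same direction of effect described above for a real ensemble cutoff such as 105.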
Maya
Thank you for getting back so quickly! Should I be concerned that the cutoffs are so low? With a cutoff that low, my initial thought is that future projections are predicting a widespread species distribution when that may not be the case. Does a lower cutoff indicate greater uncertainty in the models? I used this code/method with a few other species and got varying cutoff values for the ensemble (100 to 700), so I was just curious whether that is what was driving some of this.
I am going to rerun a few things based on some of the parameters you discussed above. I may have additional follow-up questions, but this was very helpful!
Jackson
I was wondering if you could provide further explanation of how the cutoff values are calculated for the ensemble models. I saw the explanation in a previous issue of how the cutoff value is calculated for each individual model, but it does not explain how the value is calculated for the ensemble model. I have attached CSV files with the output for each individual model as well as the cutoff values for the ensemble. Based on the kept models, I would have expected the ensemble cutoff values to be higher. Any information or guidance on this topic would be greatly appreciated.
I am happy to provide further data, code, and output if that would be helpful to answering my question.
Thanks,
Jackson

Attached: cutoff_20240919.csv, cutoff_ensemble_20240916.csv, ensemble_kept_20240919.csv