Help with BIOMOD_modeling

rpatin commented 1 year ago

Discussed in https://github.com/biomodhub/biomod2/discussions/280

Originally posted by **LindeNozomiLeo** June 20, 2023

Hi, I am trying to run some ensemble models and have a question about the BIOMOD_Modeling output. I set do.full.models to FALSE because I want to make sure my models get split and evaluated. However, when I look at the saved files on my hard drive, I only see full models for the different PA datasets and algorithms, and the function output also lists only these as computed models. How do I make sure the models actually get evaluated the way I want? Additionally, I have seen examples online where CV.strategy, CV.nb.rep and CV.perc are added as arguments. Do these cross-validation arguments provide additional output on the models? Thanks in advance

rpatin commented 1 year ago

Hello @LindeNozomiLeo, If you want your models to get split and evaluated, I suspect you are referring to cross-validation, which is defined by the arguments CV.strategy, CV.nb.rep and CV.perc that you mentioned. Without those arguments, you will likely only get full models for the different PA datasets and algorithms. If you share the code you used, I can point out what to change in your function calls, although you should be able to find this information on the website. Best regards, Rémi
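As an illustration of what CV.strategy = 'random' means, here is a base-R sketch, not biomod2's internal code: for each of CV.nb.rep repetitions, a fraction CV.perc of the rows is drawn for calibration and the remaining rows are kept for validation. Sampling presences and (pseudo-)absences separately, so that every run keeps the same prevalence, is an assumption here, and `random_cv_split()` is a hypothetical helper.

```r
# Hypothetical sketch of repeated random cross-validation (not biomod2's own code).
# resp: 1 = presence, NA or 0 = (pseudo-)absence
# nb.rep plays the role of CV.nb.rep, perc the role of CV.perc
random_cv_split <- function(resp, nb.rep = 3, perc = 0.75, seed = 42) {
  set.seed(seed)
  pres <- which(!is.na(resp) & resp == 1)
  absc <- setdiff(seq_along(resp), pres)
  calib <- matrix(FALSE, nrow = length(resp), ncol = nb.rep,
                  dimnames = list(NULL, paste0("RUN", seq_len(nb.rep))))
  for (r in seq_len(nb.rep)) {
    # draw presences and absences separately so every run keeps the prevalence
    calib[pres[sample.int(length(pres), round(perc * length(pres)))], r] <- TRUE
    calib[absc[sample.int(length(absc), round(perc * length(absc)))], r] <- TRUE
  }
  calib  # TRUE = row used for calibration, FALSE = row kept for validation
}

# 30 presences and 1000 pseudo-absences, like one PA dataset in this thread
resp <- c(rep(1, 30), rep(NA, 1000))
cv_split <- random_cv_split(resp, nb.rep = 3, perc = 0.75)
colSums(cv_split[resp %in% 1, ])  # calibration presences per run
colSums(!cv_split)                # validation rows per run
```

Each column corresponds to one RUN; the models fitted on it are then evaluated on the FALSE rows, which is what fills the validation column of the evaluation table.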

LindeNozomiLeo commented 1 year ago

Hi @rpatin

I was running the following, and I thought I understood from the documentation that the DataSplit argument should already provide some kind of model evaluation:

myBiomodModelOut <- BIOMOD_Modeling(data = myBiomodData,
                                    models = c('GLM','GAM','RF','MAXENT.Phillips'),
                                    models.options = myBiomodOptions,
                                    NbRunEval = 4,
                                    DataSplit = 80,
                                    VarImport = 3,
                                    models.eval.meth = c('TSS','ROC'),
                                    do.full.models = FALSE,
                                    SaveObj = TRUE,
                                    #CV.strategy = 'random',
                                    #CV.nb.rep = 2,
                                    #CV.perc = 0.8,
                                    Prevalence = 0.5,
                                    modeling.id = modeling.id
                                    )

I also ran it with the three CV arguments, but I do not see any change in the files on my hard drive. I had a hard time finding recent documentation on cross-validation, as some of the code on the website still seems to refer to older argument names, etc.

rpatin commented 1 year ago

Hi @LindeNozomiLeo, Which version of biomod2 are you running? You can check with sessionInfo() in your current R session. The command you used should give you an error with the current biomod2 version (> 4.0.0). Please make sure to update biomod2 (install.packages('biomod2')) if you want to use the website material or get support. Best, Rémi
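A quick way to tell which interface applies, as a minimal sketch: compare the version string against 4.0.0 with base R's package_version(). `is_biomod2_v4()` is a hypothetical helper name; in a live session you would pass it as.character(packageVersion("biomod2")).

```r
# Hypothetical helper: TRUE if a biomod2 version string belongs to the 4.x series,
# whose BIOMOD_Modeling() uses bm.format, bm.options, metric.eval and CV.* arguments
is_biomod2_v4 <- function(version) {
  package_version(version) >= "4.0.0"
}

# In a live session: is_biomod2_v4(as.character(packageVersion("biomod2")))
is_biomod2_v4("3.3-15")  # the version found in this thread
is_biomod2_v4("4.2-4")   # the version installed afterwards
```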

LindeNozomiLeo commented 1 year ago

Hi,

Yes, I see now that I'm using 3.3-15. Thank you! I'll check out 4.2-4.

LindeNozomiLeo commented 1 year ago

Hi, I'm now running the following with the latest version of biomod2. However, I'm getting an error, even though I think I have all the necessary arguments.

myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    models = c('GLM','GAM','RF','MAXENT'),
                                    bm.options = myBiomodOptions,
                                    var.import = 3,
                                    metric.eval = c('TSS','ROC'),
                                    SaveObj = TRUE,
                                    CV.strategy = 'random',
                                    CV.nb.rep = 3,
                                    CV.perc = 0.75,
                                    Prevalence = 0.5)

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function '%in%': argument "data" is missing, with no default

rpatin commented 1 year ago

Hi, Could you share a bit more information?

what is the output of sessionInfo() ?
what is the output of show(myBiomodData) ?

There was likely an issue when updating biomod2. The arguments SaveObj and Prevalence are outdated and should throw an error: Prevalence is now called prevalence, and SaveObj is no longer needed, since saving is enabled by default and cannot be turned off.
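To spot such leftovers in older scripts, here is a hypothetical base-R lookup of 3.x argument names and their 4.x replacements. Only SaveObj and Prevalence are confirmed in this comment; the other pairs are inferred from the two calls earlier in this thread (including DataSplit = 80 next to CV.perc = 0.8, a proportion instead of a percentage), so treat the table as an assumption and check ?BIOMOD_Modeling for your version.

```r
# Hypothetical lookup table: old (3.x) argument -> current (4.x) argument.
# SaveObj and Prevalence are confirmed in this thread; the other pairs are
# inferred from the two calls above and should be checked in ?BIOMOD_Modeling.
renamed_args <- c(
  data             = "bm.format",
  models.options   = "bm.options",
  VarImport        = "var.import",
  models.eval.meth = "metric.eval",
  NbRunEval        = "CV.nb.rep",
  DataSplit        = "CV.perc",    # a proportion (0.8) rather than a percentage (80)
  Prevalence       = "prevalence",
  SaveObj          = NA            # removed: results are always saved
)

# Return the outdated names found among a call's argument names,
# each mapped to its replacement (NA = simply drop the argument)
check_outdated_args <- function(arg_names) {
  hits <- arg_names[arg_names %in% names(renamed_args)]
  renamed_args[hits]
}

# Argument names used in the second BIOMOD_Modeling call of this thread
used <- c("bm.format", "models", "bm.options", "var.import", "metric.eval",
          "SaveObj", "CV.strategy", "CV.nb.rep", "CV.perc", "Prevalence")
check_outdated_args(used)
```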

Thanks in advance, Best, Rémi

LindeNozomiLeo commented 1 year ago

Hi,

I changed/removed the outdated arguments, but the problem persists.

These are the outputs:

R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rgeos_0.6-3              speciesgeocodeR_2.0-10   factoextra_1.0.7         ade4_1.7-22             
 [5] lubridate_1.9.2          forcats_1.0.0            stringr_1.5.0            dplyr_1.1.2             
 [9] purrr_1.0.1              readr_2.1.4              tidyr_1.3.0              tibble_3.2.1            
[13] ggplot2_3.4.2            tidyverse_2.0.0          sf_1.0-13                CoordinateCleaner_2.0-20
[17] spThin_0.2.0             knitr_1.42               fields_14.1              viridis_0.6.3           
[21] viridisLite_0.4.2        spam_2.9-1               rnaturalearth_0.3.2      rnaturalearthdata_0.1.0 
[25] terra_1.7-29             biomod2_4.2-4            raster_3.6-20            sp_1.6-0                

loaded via a namespace (and not attached):
  [1] splines_4.3.0          later_1.3.1            pROC_1.18.2            rpart_4.1.19          
  [5] lifecycle_1.0.3        doParallel_1.0.17      processx_3.8.1         lattice_0.21-8        
  [9] MASS_7.3-58.4          backports_1.4.1        magrittr_2.0.3         vcd_1.4-11            
 [13] Hmisc_5.0-1            rmarkdown_2.21         remotes_2.4.2          plotrix_3.8-2         
 [17] httpuv_1.6.11          rgdal_1.6-6            sessioninfo_1.2.2      pkgbuild_1.4.0        
 [21] plotmo_3.6.2           DBI_1.1.3              RColorBrewer_1.1-3     multcomp_1.4-23       
 [25] maps_3.4.1             abind_1.4-5            pkgload_1.3.2          nnet_7.3-18           
 [29] TH.data_1.1-2          sandwich_3.0-2         gbm_2.1.8.1            ggrepel_0.9.3         
 [33] vegan_2.6-4            units_0.8-2            permute_0.9-7          PresenceAbsence_1.1.11
 [37] codetools_0.2-19       dismo_1.3-9            xml2_1.3.3             tidyselect_1.2.0      
 [41] gmp_0.7-1              base64enc_0.1-3        jsonlite_1.8.4         e1071_1.7-13          
 [45] ellipsis_0.3.2         Formula_1.2-5          survival_3.5-5         iterators_1.0.14      
 [49] foreach_1.5.2          tools_4.3.0            Rcpp_1.0.10            glue_1.6.2            
 [53] gridExtra_2.3          mgcv_1.8-42            xfun_0.39              usethis_2.1.6         
 [57] withr_2.5.0            fastmap_1.1.1          latticeExtra_0.6-30    fansi_1.0.4           
 [61] callr_3.7.3            digest_0.6.31          timechange_0.2.0       rasterVis_0.51.5      
 [65] R6_2.5.1               mime_0.12              colorspace_2.1-0       jpeg_0.1-10           
 [69] utf8_1.2.3             generics_0.1.3         hexbin_1.28.3          data.table_1.14.8     
 [73] class_7.3-21           prettyunits_1.1.1      httr_1.4.5             htmlwidgets_1.6.2     
 [77] whisker_0.4.1          pkgconfig_2.0.3        gtable_0.3.3           picante_1.8.2         
 [81] Rmpfr_0.9-2            lmtest_0.9-40          HH_3.1-49              htmltools_0.5.5       
 [85] profvis_0.3.7          dotCall64_1.0-2        scales_1.2.1           leaps_3.1             
 [89] png_0.1-8              rstudioapi_0.14        geosphere_1.5-18       tzdb_0.3.0            
 [93] reshape2_1.4.4         rgbif_3.7.7            nlme_3.1-162           curl_5.0.0            
 [97] checkmate_2.2.0        proxy_0.4-27           cachem_1.0.8           zoo_1.8-12            
[101] KernSmooth_2.23-20     parallel_4.3.0         miniUI_0.1.1.1         foreign_0.8-84        
[105] pillar_1.9.0           reshape_0.8.9          vctrs_0.6.2            urlchecker_1.0.1      
[109] promises_1.2.0.1       randomForest_4.7-1.1   xtable_1.8-4           cluster_2.1.4         
[113] htmlTable_2.4.1        evaluate_0.21          oai_0.4.0              mvtnorm_1.1-3         
[117] cli_3.6.1              compiler_4.3.0         rlang_1.1.1            crayon_1.5.2          
[121] interp_1.1-4           classInt_0.4-9         ps_1.7.5               plyr_1.8.8            
[125] fs_1.6.2               mda_0.5-3              stringi_1.7.12         earth_5.3.2           
[129] deldir_1.0-6           munsell_0.5.0          lazyeval_0.2.2         devtools_2.4.5        
[133] Matrix_1.5-4           hms_1.1.3              shiny_1.7.4            memoise_2.0.1         
[137] TeachingDemos_2.12     ape_5.7-1

-----------------------------------
sp.name =  Mikania.lundiana

     30 presences,  0 true absences and  4615 undifined points in dataset

     4 explanatory variables

 CHELSA_bio17_1981.2010_V.2.1 CHELSA_bio13_1981.2010_V.2.1 CHELSA_bio4_1981.2010_V.2.1
 Min.   :  28.62              Min.   : 886                 Min.   : 714               
 1st Qu.: 245.46              1st Qu.:2185                 1st Qu.:1356               
 Median : 558.61              Median :2639                 Median :1755               
 Mean   :1170.19              Mean   :2591                 Mean   :1873               
 3rd Qu.:1724.32              3rd Qu.:3020                 3rd Qu.:2281               
 Max.   :5560.74              Max.   :4805                 Max.   :3633               
 CHELSA_bio6_1981.2010_V.2.1
 Min.   :2784               
 1st Qu.:2857               
 Median :2881               
 Mean   :2877               
 3rd Qu.:2897               
 Max.   :2959               

 5 Pseudo Absences dataset available ( PA1 PA2 PA3 PA4 PA5 ) with  1000 
absences in each (true abs + pseudo abs)

rpatin commented 1 year ago

Thanks for the additional information :pray: I am investigating the issue.

Can you confirm that you get the following error when using the arguments SaveObj and Prevalence?

Error in BIOMOD_Modeling(bm.format = myBiomodData, models = c("GLM", "GAM",  : 
  unused arguments (SaveObj = TRUE, Prevalence = 0.5)

If not, you should restart the R session to make sure the correct biomod2 version is the one being used.
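One way to check whether an update has really taken effect, sketched in base R under the assumption that the new version was installed into the same library while the old one was still loaded: compare the version of the loaded namespace (getNamespaceVersion()) with the version recorded on disk (packageVersion()). This may also explain why sessionInfo() can already report the new version. `loaded_matches_installed()` is a hypothetical helper, not part of biomod2.

```r
# Hypothetical helper: compare the version of a package's *loaded* namespace
# with the version installed on disk. FALSE suggests the package was updated
# without restarting R, so the old code is still the one being run.
loaded_matches_installed <- function(pkg) {
  if (!isNamespaceLoaded(pkg)) return(NA)  # nothing loaded yet, nothing to compare
  loaded <- package_version(unname(getNamespaceVersion(pkg)))
  installed <- utils::packageVersion(pkg)
  isTRUE(loaded == installed)
}

# In the affected session: loaded_matches_installed("biomod2")
# If it returns FALSE, restart R before calling library(biomod2).
loaded_matches_installed("utils")
```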

rpatin commented 1 year ago

Also, do you only have the error

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function '%in%': argument "data" is missing, with no default

or does BIOMOD_Modeling print a lot of intermediate output before the error? If so, can you share the console output? It will help the investigation.

LindeNozomiLeo commented 1 year ago

Hi, no, that was never the error: before and after removing those arguments, I only got the error mentioned earlier, and as intermediate output it printed something about checking the models (I don't remember exactly, as I was already restarting R when I saw your message). But now, after restarting R, the function runs, and it produces more than just the full models, which was the problem before!

Thanks a lot!

LindeNozomiLeo commented 1 year ago

It does give me this warning:

Warning message: executing %dopar% sequentially: no parallel backend registered

rpatin commented 1 year ago

Happy that the problem is solved :+1:. It was indeed caused by the older biomod2 version. The warning is harmless and just indicates that you are running without parallelization. Best, Rémi
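If you do want the runs parallelized: the warning comes from the foreach package, whose %dopar% falls back to sequential execution when no backend is registered. A minimal sketch, assuming the doParallel package is installed (the code skips registration otherwise) and that your biomod2 version accepts an nb.cpu argument, which may handle the backend itself (check ?BIOMOD_Modeling):

```r
# Pick a number of workers, leaving one core free; detectCores() may return NA.
n_cores <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)

# Register a parallel backend for foreach's %dopar%, if doParallel is installed.
if (requireNamespace("doParallel", quietly = TRUE)) {
  doParallel::registerDoParallel(cores = n_cores)
}

# Then, assuming nb.cpu is available in your biomod2 version:
# myBiomodModelOut <- BIOMOD_Modeling(..., nb.cpu = n_cores)
n_cores
```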

LindeNozomiLeo commented 1 year ago

Hi,

Sorry to bother you again, but I'm wondering why there are no values for validation and evaluation in this output:

 get_evaluations(myBiomodEM)
                                                     full.name merged.by.PA merged.by.run
1 Mikania.lundiana_EMmeanByTSS_mergedData_mergedRun_mergedAlgo   mergedData     mergedRun
2 Mikania.lundiana_EMmeanByTSS_mergedData_mergedRun_mergedAlgo   mergedData     mergedRun
3 Mikania.lundiana_EMmeanByROC_mergedData_mergedRun_mergedAlgo   mergedData     mergedRun
4 Mikania.lundiana_EMmeanByROC_mergedData_mergedRun_mergedAlgo   mergedData     mergedRun
  merged.by.algo filtered.by   algo metric.eval cutoff sensitivity specificity calibration
1     mergedAlgo         TSS EMmean         TSS  485.0      80.000      85.702       0.658
2     mergedAlgo         TSS EMmean         ROC  444.5      83.333      82.768       0.885
3     mergedAlgo         ROC EMmean         TSS  380.0      90.000      88.505       0.785
4     mergedAlgo         ROC EMmean         ROC  380.5      90.000      88.548       0.947
  validation evaluation
1         NA         NA
2         NA         NA
3         NA         NA
4         NA         NA

rpatin commented 1 year ago

Hi, Here it is expected that your output contains neither evaluation nor validation values:

evaluation dataset (independent data used to evaluate the model): it was not provided, so evaluation is NA for both the individual models (BIOMOD_Modeling) and the ensemble models (BIOMOD_EnsembleModeling).
validation dataset (obtained through cross-validation with the CV.xx arguments): it was provided, and the column was filled for your individual models (output of BIOMOD_Modeling). However, this cross-validation could not be used to evaluate the ensemble models because you merged all models together (em.by = 'all'), so the ensemble model is calibrated on the full range of the data. If you want ensemble models with cross-validation, you need to merge your models using em.by = 'PA+run'. Each of your ensemble models will then be evaluated with cross-validation.
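To make the em.by difference concrete, here is a base-R sketch that only groups model names; it does not call biomod2. It assumes individual models are named <species>_<PA>_<RUN>_<algo> (an assumption modeled on the ensemble names in the output above), uses the 5 PA datasets and CV.nb.rep = 3 from this thread, and `group_models()` is a hypothetical helper. With em.by = 'all' every run lands in one ensemble, so each run's validation rows have been used to calibrate some other run; with 'PA+run' each ensemble combines only the algorithms fitted on one calibration/validation split, which is what allows cross-validated evaluation.

```r
# Hypothetical individual model names: <species>_<PA>_<RUN>_<algo>
grid <- expand.grid(algo = c("GLM", "GAM", "RF", "MAXENT"),
                    run  = paste0("RUN", 1:3),
                    pa   = paste0("PA", 1:5),
                    stringsAsFactors = FALSE)
models <- paste("Mikania.lundiana", grid$pa, grid$run, grid$algo, sep = "_")

# Group model names the way two em.by options would combine them
group_models <- function(models, em.by = c("all", "PA+run")) {
  em.by <- match.arg(em.by)
  parts <- do.call(rbind, strsplit(models, "_"))  # columns: species, PA, RUN, algo
  key <- switch(em.by,
                "all"    = rep("mergedData_mergedRun", length(models)),
                "PA+run" = paste(parts[, 2], parts[, 3], sep = "_"))
  split(models, key)
}

length(group_models(models, "all"))     # 1 ensemble mixing every run
length(group_models(models, "PA+run"))  # 15 ensembles, one per PA x RUN split
```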

Additionally, it would be great if you could open a new issue for new questions, to keep issues short and specific :pray:

I hope this is clear; if not, feel free to ask additional questions. Best, Rémi

biomodhub / biomod2

Help with BIOMOD_modeling #281
