biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

Parallel processing on Maxent tuning: using all cores #423

Closed haglad closed 6 months ago

haglad commented 8 months ago

Hello Biomod team! I have been trying to tune my Maxent models, following guidance from a previous issue here. However, each time I try to tune my model, it runs in parallel on all 16 of my computer's cores. Is there any way to reduce the number of cores ENMevaluate uses when tuning, or to turn off parallel processing entirely?

If I continue to run on all 16 cores, it causes my computer to crash. :/

Thanks in advance for any help/advice!

myBiomodOptions_base <- bm_ModelingOptions(
                          data.type = "binary",
                          models = c("RF", "MAXENT"),
                          strategy = "default",
                          bm.format = myBiomodData_disk_pal,
                          calib.lines = cv.k)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Modeling Options -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

    >  RF options (datatype: binary , package: randomForest , function: randomForest )...
    >  MAXENT options (datatype: binary , package: MAXENT , function: MAXENT )...

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
### tune parameters for MAXENT
tuned.maxent <- bm_Tuning(
  model = "MAXENT", 
  tuning.fun = "ENMevaluate", 
  bm.options = myBiomodOptions_base@options$MAXENT.binary.MAXENT.MAXENT,
  bm.format = myBiomodData_disk_pal,
  calib.lines = cv.k,
  metric.eval = "TSS"
)
> Dataset _PA1_RUN1
            > Tuning parameters...*** Running initial checks... ***

* Variable values were input along with coordinates and not as raster data, so no raster predictions can be generated and AICc is calculated with background data for Maxent models.
* Model evaluations with random 10-fold cross validation...

*** Running ENMeval v2.0.4 with maxnet from maxnet package v0.1.4 ***

  |                                                                                                        |   0%
Of 16 total cores using 16...
Running in parallel using doSNOW...
> packageDescription("biomod2")
Package: biomod2
Type: Package
Title: Ensemble Platform for Species Distribution Modeling
Version: 4.2-5
Date: 2023-09-12
> packageDescription("ENMeval")
Package: ENMeval
Type: Package
Title: Automated Tuning and Evaluations of Ecological Niche Models
Version: 2.0.4
Date: 2023-01-06
> R.version
               _                                
platform       x86_64-w64-mingw32               
arch           x86_64                           
os             mingw32                          
crt            ucrt                             
system         x86_64, mingw32                  
status                                          
major          4                                
minor          3.2                              
year           2023                             
month          10                               
day            31                               
svn rev        85441                            
language       R                                
version.string R version 4.3.2 (2023-10-31 ucrt)
nickname       Eye Holes  
MayaGueguen commented 8 months ago

Hello Haley,

Actually, we recently had some issues and questions about MAXENT tuning, which led us to add a parameter to the bm_Tuning function that disables parallelisation.

You can see it here: https://github.com/biomodhub/biomod2/issues/415#issuecomment-1951952447 You will need to update biomod2 from the GitHub version first.

The call will then look like this:

### tune parameters for MAXENT
tuned.maxent <- bm_Tuning(
  model = "MAXENT", 
  tuning.fun = "ENMevaluate", 
  bm.options = opt.d@options$MAXENT.binary.MAXENT.MAXENT,
  bm.format = myBiomodData_disk_pal,
  calib.lines = cv.k,
  metric.eval = "TSS",
  params.train = list(MAXENT.parallel = FALSE)
 )
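
Before rerunning a call like the one above, it can help to confirm which biomod2 build is installed and how many cores R detects. The following is a minimal sketch using only base R. The names `installed_version`, `n_cores` and `tuning_params` are illustrative, and the install command in the comment is the one used later in this thread.

```r
# Installed biomod2 version, or NA if the package is not installed.
# system.file() returns "" when the package cannot be found.
installed_version <- if (nzchar(system.file(package = "biomod2"))) {
  as.character(utils::packageVersion("biomod2"))
} else {
  NA_character_
}

# To install the development version (needs network access):
# devtools::install_github("biomodhub/biomod2", dependencies = TRUE)

# Number of cores R can detect (may be NA on some platforms)
n_cores <- parallel::detectCores()

# Settings to pass to bm_Tuning() through params.train, as in the call above
tuning_params <- list(MAXENT.parallel = FALSE)

cat("biomod2 version:", installed_version, "\n")
cat("Detected cores:", n_cores, "\n")
```

Note that the version string may not change between GitHub commits (the reinstall log later in this thread still builds `biomod2_4.2-5`), so reinstalling is the safer way to be sure a fix is present.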

:warning: The tuning does not seem to be completely stable yet, so do not hesitate to ask if you have further questions or problems.
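
On the original question of reducing the number of cores rather than disabling parallelism: ENMeval 2.x documents `parallel`, `numCores` and `parallelType` arguments on `ENMevaluate()` itself. Whether `bm_Tuning` forwards anything other than `MAXENT.parallel` is not stated in this thread, so the sketch below only inspects the installed ENMeval and makes no assumption about biomod2. The names `parallel_args` and `available` are illustrative.

```r
# Arguments that control parallelism in ENMeval::ENMevaluate() (ENMeval 2.x)
parallel_args <- c("parallel", "numCores", "parallelType")

if (requireNamespace("ENMeval", quietly = TRUE)) {
  # TRUE for each argument the installed ENMevaluate() actually exposes
  available <- parallel_args %in% names(formals(ENMeval::ENMevaluate))
  names(available) <- parallel_args
  print(available)
} else {
  available <- NULL
  message("ENMeval is not installed; nothing to inspect.")
}
```

If `numCores` is listed, calling `ENMevaluate()` directly with a smaller `numCores` is one possible way to cap core usage outside of `bm_Tuning`.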

Maya

haglad commented 8 months ago

Thank you so much Maya!!!!! I'll give that a go now :-)

haglad commented 8 months ago

Hi Maya,

While it seemed to help turn off parallelisation, it looks like I'm getting a similar issue to the previous poster, which I'm guessing is what you were referring to when you said the tuning is not entirely stable yet. Posting the result here just to let you know. Thanks so much again!

> tuned.maxent <- bm_Tuning(
+   model = "MAXENT", 
+   tuning.fun = "ENMevaluate", 
+   bm.options = opt.d@options$MAXENT.binary.MAXENT.MAXENT,
+   bm.format = myBiomodData_disk_pal,
+   calib.lines = cv.k,
+   metric.eval = "auc.val.avg",
+   params.train = list(MAXENT.parallel = FALSE)
+ )

        > Dataset _PA1_RUN1
            > Tuning parameters...
        > Dataset _PA1_RUN2
            > Tuning parameters...
        > Dataset _PA1_RUN3
            > Tuning parameters...
        > Dataset _PA1_RUN4
            > Tuning parameters...
        > Dataset _PA1_RUN5
            > Tuning parameters...
        > Dataset _PA1_RUN6
            > Tuning parameters...
        > Dataset _PA1_RUN7
            > Tuning parameters...
        > Dataset _PA1_RUN8
            > Tuning parameters...
        > Dataset _PA1_RUN9
            > Tuning parameters...
        > Dataset _PA1_RUN10
            > Tuning parameters...
        > Dataset _PA2_RUN1
            > Tuning parameters...
        > Dataset _PA2_RUN2
            > Tuning parameters...
        > Dataset _PA2_RUN3
            > Tuning parameters...
        > Dataset _PA2_RUN4
            > Tuning parameters...
        > Dataset _PA2_RUN5
            > Tuning parameters...
        > Dataset _PA2_RUN6
            > Tuning parameters...
        > Dataset _PA2_RUN7
            > Tuning parameters...
        > Dataset _PA2_RUN8
            > Tuning parameters...
        > Dataset _PA2_RUN9
            > Tuning parameters...
        > Dataset _PA2_RUN10
            > Tuning parameters...
        > Dataset _PA3_RUN1
            > Tuning parameters...
        > Dataset _PA3_RUN2
            > Tuning parameters...
        > Dataset _PA3_RUN3
            > Tuning parameters...
        > Dataset _PA3_RUN4
            > Tuning parameters...
        > Dataset _PA3_RUN5
            > Tuning parameters...
        > Dataset _PA3_RUN6
            > Tuning parameters...
        > Dataset _PA3_RUN7
            > Tuning parameters...
        > Dataset _PA3_RUN8
            > Tuning parameters...
        > Dataset _PA3_RUN9
            > Tuning parameters...
        > Dataset _PA3_RUN10
            > Tuning parameters...
        > Dataset _PA4_RUN1
            > Tuning parameters...
        > Dataset _PA4_RUN2
            > Tuning parameters...
        > Dataset _PA4_RUN3
            > Tuning parameters...
        > Dataset _PA4_RUN4
            > Tuning parameters...
        > Dataset _PA4_RUN5
            > Tuning parameters...
        > Dataset _PA4_RUN6
            > Tuning parameters...
        > Dataset _PA4_RUN7
            > Tuning parameters...
        > Dataset _PA4_RUN8
            > Tuning parameters...
        > Dataset _PA4_RUN9
            > Tuning parameters...
        > Dataset _PA4_RUN10
            > Tuning parameters...
        > Dataset _PA5_RUN1
            > Tuning parameters...
        > Dataset _PA5_RUN2
            > Tuning parameters...
        > Dataset _PA5_RUN3
            > Tuning parameters...
        > Dataset _PA5_RUN4
            > Tuning parameters...
        > Dataset _PA5_RUN5
            > Tuning parameters...
        > Dataset _PA5_RUN6
            > Tuning parameters...
        > Dataset _PA5_RUN7
            > Tuning parameters...
        > Dataset _PA5_RUN8
            > Tuning parameters...
        > Dataset _PA5_RUN9
            > Tuning parameters...
        > Dataset _PA5_RUN10
            > Tuning parameters...
        > Dataset _PA6_RUN1
            > Tuning parameters...
        > Dataset _PA6_RUN2
            > Tuning parameters...
        > Dataset _PA6_RUN3
            > Tuning parameters...
        > Dataset _PA6_RUN4
            > Tuning parameters...
        > Dataset _PA6_RUN5
            > Tuning parameters...
        > Dataset _PA6_RUN6
            > Tuning parameters...
        > Dataset _PA6_RUN7
            > Tuning parameters...
        > Dataset _PA6_RUN8
            > Tuning parameters...
        > Dataset _PA6_RUN9
            > Tuning parameters...
        > Dataset _PA6_RUN10
            > Tuning parameters...
        > Dataset _PA7_RUN1
            > Tuning parameters...
        > Dataset _PA7_RUN2
            > Tuning parameters...
        > Dataset _PA7_RUN3
            > Tuning parameters...
        > Dataset _PA7_RUN4
            > Tuning parameters...
        > Dataset _PA7_RUN5
            > Tuning parameters...
        > Dataset _PA7_RUN6
            > Tuning parameters...
        > Dataset _PA7_RUN7
            > Tuning parameters...
        > Dataset _PA7_RUN8
            > Tuning parameters...
        > Dataset _PA7_RUN9
            > Tuning parameters...
        > Dataset _PA7_RUN10
            > Tuning parameters...
        > Dataset _PA8_RUN1
            > Tuning parameters...
        > Dataset _PA8_RUN2
            > Tuning parameters...
        > Dataset _PA8_RUN3
            > Tuning parameters...
        > Dataset _PA8_RUN4
            > Tuning parameters...
        > Dataset _PA8_RUN5
            > Tuning parameters...
        > Dataset _PA8_RUN6
            > Tuning parameters...
        > Dataset _PA8_RUN7
            > Tuning parameters...
        > Dataset _PA8_RUN8
            > Tuning parameters...
        > Dataset _PA8_RUN9
            > Tuning parameters...
        > Dataset _PA8_RUN10
            > Tuning parameters...
        > Dataset _PA9_RUN1
            > Tuning parameters...
        > Dataset _PA9_RUN2
            > Tuning parameters...
        > Dataset _PA9_RUN3
            > Tuning parameters...
        > Dataset _PA9_RUN4
            > Tuning parameters...
        > Dataset _PA9_RUN5
            > Tuning parameters...
        > Dataset _PA9_RUN6
            > Tuning parameters...
        > Dataset _PA9_RUN7
            > Tuning parameters...
        > Dataset _PA9_RUN8
            > Tuning parameters...
        > Dataset _PA9_RUN9
            > Tuning parameters...
        > Dataset _PA9_RUN10
            > Tuning parameters...
        > Dataset _PA10_RUN1
            > Tuning parameters...
        > Dataset _PA10_RUN2
            > Tuning parameters...
        > Dataset _PA10_RUN3
            > Tuning parameters...
        > Dataset _PA10_RUN4
            > Tuning parameters...
        > Dataset _PA10_RUN5
            > Tuning parameters...
        > Dataset _PA10_RUN6
            > Tuning parameters...
        > Dataset _PA10_RUN7
            > Tuning parameters...
        > Dataset _PA10_RUN8
            > Tuning parameters...
        > Dataset _PA10_RUN9
            > Tuning parameters...
        > Dataset _PA10_RUN10
            > Tuning parameters...Error in { : task 1 failed - "argument is of length zero"
MayaGueguen commented 8 months ago

Indeed, I made a mistake in how the code retrieves the params.train values. I just pushed a correction to the GitHub version. Could you try reinstalling biomod2 to test? :pray:
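
For readers who hit the same message: in R, "argument is of length zero" typically appears when `if ()` receives `NULL`, for example after looking up a list element that does not exist. The sketch below reproduces that generic pattern in base R. It is not the biomod2 source, and `some.other.setting` is a hypothetical name used only for illustration.

```r
params.train <- list(MAXENT.parallel = FALSE)

# Looking up a name that is not in the list returns NULL ...
value <- params.train$some.other.setting

# ... and if (NULL) fails with "argument is of length zero"
msg <- tryCatch(
  if (value) "parallel" else "sequential",
  error = function(e) conditionMessage(e)
)
print(msg)

# A defensive alternative: isTRUE() returns FALSE for NULL instead of failing
use_parallel <- isTRUE(params.train$MAXENT.parallel)
print(use_parallel)
```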

haglad commented 8 months ago

Hi Maya, Thanks so much! I just reinstalled biomod2 and ran my code again. It now continues, but with a new error message that repeats for each dataset.

R version 4.3.2 (2023-10-31 ucrt) -- "Eye Holes"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Workspace loaded from ~/.RData]

> devtools::install_github("biomodhub/biomod2", dependencies = TRUE)
Downloading GitHub repo biomodhub/biomod2@HEAD
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

 1: All                                 
 2: CRAN packages only                  
 3: None                                
 4: rlang      (1.1.1  -> 1.1.3 ) [CRAN]
 5: glue       (1.6.2  -> 1.7.0 ) [CRAN]
 6: cli        (3.6.1  -> 3.6.2 ) [CRAN]
 7: vctrs      (0.6.2  -> 0.6.5 ) [CRAN]
 8: stringi    (1.7.12 -> 1.8.3 ) [CRAN]
 9: Rcpp       (1.0.10 -> 1.0.12) [CRAN]
10: curl       (5.1.0  -> 5.2.0 ) [CRAN]
11: utf8       (1.2.3  -> 1.2.4 ) [CRAN]
12: fansi      (1.0.5  -> 1.0.6 ) [CRAN]
13: purrr      (1.0.1  -> 1.0.2 ) [CRAN]
14: dplyr      (1.1.3  -> 1.1.4 ) [CRAN]
15: callr      (3.7.3  -> 3.7.5 ) [CRAN]
16: digest     (0.6.33 -> 0.6.34) [CRAN]
17: sp         (1.6-0  -> 2.1-3 ) [CRAN]
18: timechange (0.2.0  -> 0.3.0 ) [CRAN]
19: later      (1.3.1  -> 1.3.2 ) [CRAN]
20: httpuv     (1.6.11 -> 1.6.14) [CRAN]
21: terra      (1.7-55 -> 1.7-71) [CRAN]

Enter one or more numbers, or an empty line to skip updates: 3
── R CMD build ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔  checking for file 'C:\Users\haley\AppData\Local\Temp\RtmpEfBnHR\remotes4988bf924b6\biomodhub-biomod2-2de3c87/DESCRIPTION' ...
─  preparing 'biomod2': (1.7s)
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building 'biomod2_4.2-5.tar.gz'

Installing package into ‘C:/Users/haley/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
* installing *source* package 'biomod2' ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** byte-compile and prepare package for lazy loading

Checking Models arguments...
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (biomod2)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#
> # BIOMOD FORMATTING ----
> #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#
> palembanica$occ <- (1)
> sp <- terra::vect(palembanica, geom =c("long", "lat"), crs = '+proj=longlat +datum=WGS84')
> ## Format biomod data ---
> ?BIOMOD_FormatingData
> myBiomodData_disk_pal <- biomod2::BIOMOD_FormatingData(resp.var=sp, 
+                                                    expl.var=myExpl.pal, 
+                                                    resp.xy=sp, 
+                                                    resp.name= "palembanica_vars_jk",
+                                                    PA.strategy = 'disk',
+                                                    PA.dist.min = 100000,
+                                                    PA.dist.max = NULL,
+                                                    PA.nb.rep = 10,
+                                                    PA.nb.absences = 53,
+                                                    na.rm = T)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= palembanica_vars_jk Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

      ! Response variable name was converted into palembanica.vars.jk
      ! XY coordinates of response variable will be ignored because spatial response object is given.
      ! No data has been set aside for modeling evaluation
      ! No data has been set aside for modeling evaluation

Checking Pseudo-absence selection arguments...

      ! No data has been set aside for modeling evaluation
   > Disk pseudo absences selection
   > random pseudo absences selection
   > Pseudo absences are selected in explanatory variables

      ! No data has been set aside for modeling evaluation
      ! No data has been set aside for modeling evaluation
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> ## CV method ---
> # k-fold selection
> cv.k <- bm_CrossValidation(bm.format = myBiomodData_disk_pal,
+                            strategy = "kfold",
+                            nb.rep = 2,
+                            k = 5)

Checking Cross-Validation arguments...

   > k-fold cross-validation selection
> # default parameters
> opt.d <- bm_ModelingOptions(data.type = 'binary',
+                             models = c("RF", "MAXENT"),
+                             strategy = 'default')

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Modeling Options -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

    >  RF options (datatype: binary , package: randomForest , function: randomForest )...
    >  MAXENT options (datatype: binary , package: MAXENT , function: MAXENT )...

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> ### tune parameters for MAXENT
> tuned.maxent <- bm_Tuning(
+   model = "MAXENT", 
+   tuning.fun = "ENMevaluate", 
+   bm.options = opt.d@options$MAXENT.binary.MAXENT.MAXENT,
+   bm.format = myBiomodData_disk_pal,
+   calib.lines = cv.k,
+   metric.eval = "or.mtp.avg",
+   params.train = list(MAXENT.algorithm = "maxent.jar", MAXENT.parallel=F)
+ )

        > Dataset _PA1_RUN1
            > Tuning parameters...*** Running initial checks... ***

Loading required namespace: rJava
* Variable values were input along with coordinates and not as raster data, so no raster predictions can be generated and AICc is calculated with background data for Maxent models.
* Removed 42 background points with NA predictor variable values.
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

        > Dataset _PA1_RUN2
            > Tuning parameters...*** Running initial checks... ***

* Variable values were input along with coordinates and not as raster data, so no raster predictions can be generated and AICc is calculated with background data for Maxent models.
* Removed 43 background points with NA predictor variable values.
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

        > Dataset _PA1_RUN3
            > Tuning parameters...*** Running initial checks... ***

* Variable values were input along with coordinates and not as raster data, so no raster predictions can be generated and AICc is calculated with background data for Maxent models.
* Removed 42 background points with NA predictor variable values.
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

        > Dataset _PA1_RUN4
            > Tuning parameters...*** Running initial checks... ***

* Variable values were input along with coordinates and not as raster data, so no raster predictions can be generated and AICc is calculated with background data for Maxent models.
* Removed 43 background points with NA predictor variable values.
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

I then tried params.train = list(MAXENT.algorithm = "maxnet", MAXENT_parallel = F), which resulted in a different error for each dataset: ....

                > Dataset _PA10_RUN8
            > Tuning parameters...*** Running initial checks... ***

* Variable values were input along with coordinates and not as raster data, so no raster predictions can be generated and AICc is calculated with background data for Maxent models.
* Model evaluations with random 10-fold cross validation...

*** Running ENMeval v2.0.4 with maxnet from maxnet package v0.1.4 ***

  |============================================================                                                           |  50%Error in eval(predvars, data, env) : object 'PA1FALSE' not found

        > Dataset _PA10_RUN9
            > Tuning parameters...*** Running initial checks... ***

* Variable values were input along with coordinates and not as raster data, so no raster predictions can be generated and AICc is calculated with background data for Maxent models.
* Model evaluations with random 10-fold cross validation...

*** Running ENMeval v2.0.4 with maxnet from maxnet package v0.1.4 ***

  |============================================================                                                           |  50%Error in eval(predvars, data, env) : object 'PA1FALSE' not found

        > Dataset _PA10_RUN10
            > Tuning parameters...*** Running initial checks... ***

* Variable values were input along with coordinates and not as raster data, so no raster predictions can be generated and AICc is calculated with background data for Maxent models.
* Model evaluations with random 10-fold cross validation...

*** Running ENMeval v2.0.4 with maxnet from maxnet package v0.1.4 ***

  |============================================================                                                           |  50%Error in eval(predvars, data, env) : object 'PA1FALSE' not found
Error in { : task 1 failed - "object 'tune.MAXENT' not found"

Thanks for all your hard and quick work to resolve this!

MayaGueguen commented 8 months ago

Sorry Haley, I did not check properly and missed something... I pushed a correction. Would you mind trying again? :pray:

Be careful to write the parameter name exactly as MAXENT.parallel = FALSE (with a dot, not an underscore) :eyes:
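
A small check can catch a misspelled name such as the `MAXENT_parallel` used earlier in this thread before the tuning starts. This is a minimal sketch: `check_params_train` is a hypothetical helper, not part of biomod2, and its `known` list contains only the two names that appear in this thread, not the full set biomod2 may accept.

```r
# Hypothetical helper: warn about params.train names that are not recognised.
# `known` holds only the names seen in this thread (an assumption).
check_params_train <- function(params,
                               known = c("MAXENT.algorithm", "MAXENT.parallel")) {
  unknown <- setdiff(names(params), known)
  if (length(unknown) > 0) {
    warning("Unrecognised params.train name(s): ",
            paste(unknown, collapse = ", "))
  }
  invisible(unknown)
}

# The misspelled version is flagged with a warning ...
bad <- suppressWarnings(
  check_params_train(list(MAXENT.algorithm = "maxnet", MAXENT_parallel = FALSE))
)
print(bad)

# ... while the correct spelling passes silently
ok <- check_params_train(list(MAXENT.algorithm = "maxnet", MAXENT.parallel = FALSE))
print(length(ok))
```

Writing `FALSE` in full rather than `F` is also the safer habit, since `F` is an ordinary variable in R and can be reassigned.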

haglad commented 8 months ago

Looks like it worked! 🙌🙌🙌🙌

### tune parameters for MAXENT
tuned.maxent <- bm_Tuning(
  model = "MAXENT", 
  tuning.fun = "ENMevaluate", 
  bm.options = opt.d@options$MAXENT.binary.MAXENT.MAXENT,
  bm.format = myBiomodData_disk_pal,
  calib.lines = cv.k,
  metric.eval = "or.mtp.avg",
  params.train = list(MAXENT.algorithm = "maxnet", MAXENT.parallel=FALSE)
)

> Dataset _PA10_RUN10
            > Tuning parameters...*** Running initial checks... ***

* Variable values were input along with coordinates and not as raster data, so no raster predictions can be generated and AICc is calculated with background data for Maxent models.
* Model evaluations with random 10-fold cross validation...

*** Running ENMeval v2.0.4 with maxnet from maxnet package v0.1.4 ***

  |============================================================================================================| 100%
ENMevaluate completed in 0 minutes 1.2 seconds.

I'm going to tune my other algorithm, and run my model :-)

Thank you so much!