biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

Error in BIOMOD_ModelingOptions - fail to change the parameter of the biomod2 inner model (RF) #311

Closed. Jinyu8579 closed this issue 1 year ago.

Jinyu8579 commented 1 year ago


Error and context

We wanted to use the random forest (RF) model built into biomod2 to model the distribution of an insect species, based on 300 presence points and 100 background points. We treated the binary data (1 = presence, 0 = background) as a continuous response variable in order to obtain a continuous measure of suitability. We therefore wanted to fit a regression model and set do.classif = FALSE in the RF options passed to BIOMOD_ModelingOptions. However, the change was not applied: the printed options still show the default setting (do.classif = TRUE).
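For readers unfamiliar with the distinction, here is a minimal sketch of what regression versus classification means at the level of the randomForest package itself. It uses synthetic data and hypothetical variable names (x, y01), not the data from this issue, and it does not show how biomod2 calls randomForest internally. A numeric response gives a regression forest and a factor response gives a classification forest.

```r
library(randomForest)

set.seed(42)
n <- 200
# Synthetic predictors and a 0/1 response (assumption: stands in for presence/background)
x <- data.frame(a = rnorm(n), b = rnorm(n))
y01 <- as.numeric(x$a + rnorm(n, sd = 0.5) > 0)

# Numeric response: regression forest (randomForest warns about few unique values)
rf_reg <- suppressWarnings(randomForest(x = x, y = y01, ntree = 100))
# Factor response: classification forest
rf_cls <- randomForest(x = x, y = factor(y01), ntree = 100)

print(rf_reg$type)
print(rf_cls$type)

# Regression: mean of the tree predictions
p_reg <- predict(rf_reg, newdata = x)
# Classification: fraction of trees voting for class "1"
p_cls <- predict(rf_cls, newdata = x, type = "prob")[, "1"]

print(range(p_reg))
print(range(p_cls))
```

In this sketch both forests return values between 0 and 1, so a classification forest queried with type = "prob" also yields a continuous score.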

Code used to get the error

> # Load species occupancy data
> dataSpecies <-read.csv("E:\\Rconversion\\suitability_connectivity\\en_pre_bg_points\\en_pre_bg1_points.csv")
> # Look at structure of dataSpecies dataframe
> str(dataSpecies)
'data.frame':   401 obs. of  4 variables:
 $ longitude : num  114 113 113 113 113 ...
 $ latitude  : num  24.1 24.3 24.2 24.1 23.2 ...
 $ pre_bg_num: int  1 1 1 1 1 1 1 1 1 1 ...
 $ pre_bg    : chr  "presence" "presence" "presence" "presence" ...
> # Tell biomod2 which parts of the database refer to which biomod2 object
> myRespName <- "Empoasca_onukii"
> myResp <-as.numeric(dataSpecies[,"pre_bg_num"])
> myRespXY <-dataSpecies[,c("longitude","latitude")]
> # Create stack of all environmental covariate layers
> myExpl <- envPlus
> names(myExpl)  
 [1] "bio1"        "bio2"        "bio3"        "bio4"        "bio5"        "bio6"        "bio7"       
 [8] "bio8"        "bio9"        "bio10"       "bio11"       "bio12"       "bio13"       "bio14"      
[15] "bio15"       "bio16"       "bio17"       "bio18"       "bio19"       "elev"        "soil_pH"    
[22] "EvergDecid"  "EvergBroad"  "DecidBroad"  "MixedTree"   "Shrubs"      "HerbVeg"     "CultVeg"    
[29] "FloodVeg"    "Urbab"       "Snow"        "Barren"      "OpenWater"   "Mou_300dens" "riverdens"  
[36] "slope"      
> # Create object (myBiomodData) to contain all the previous objects within it, 
> # formatted correctly
> myBiomodData <-BIOMOD_FormatingData(resp.var = myResp,
+                                     expl.var = myExpl,
+                                     resp.xy = myRespXY,
+                                     resp.name = myRespName,
+                                     PA.nb.rep =0)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Empoasca_onukii Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

      ! Response variable name was converted into Empoasca.onukii
      ! No data has been set aside for modeling evaluation
 ! Some NAs have been automatically removed from your data
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> # Set the options for the different model algorithms
> # - here overriding the RF settings (do.classif = FALSE); all other models keep their defaults
> myBiomodOptions <- BIOMOD_ModelingOptions(RF = list(do.classif = FALSE,
+                                                    ntree = 500,
+                                                    mtry = 'default',
+                                                    nodesize = 5,
+                                                    maxnodes = NULL))
> myBiomodOptions

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.models.options -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

GLM = list( type = 'quadratic',
            interaction.level = 0,
            myFormula = NULL,
            test = 'AIC',
            family = binomial(link = 'logit'),
            mustart = 0.5,
            control = glm.control(epsilon = 1e-08, maxit = 50, trace = FALSE) ),

GBM = list( distribution = 'bernoulli',
            n.trees = 2500,
            interaction.depth = 7,
            n.minobsinnode = 5,
            shrinkage = 0.001,
            bag.fraction = 0.5,
            train.fraction = 1,
            cv.folds = 3,
            keep.data = FALSE,
            verbose = FALSE,
            perf.method = 'cv',
            n.cores = 1),

GAM = list( algo = 'GAM_mgcv',
            type = 's_smoother',
            k = -1,
            interaction.level = 0,
            myFormula = NULL,
            family = binomial(link = 'logit'),
            method = 'GCV.Cp', 
            optimizer = c('outer','newton'),
            select = FALSE,
            knots = NULL,
            paraPen = NULL,
            control = list(nthreads = 1, irls.reg = 0, epsilon = 1e-07, maxit = 200, trace = FALSE
, mgcv.tol = 1e-07, mgcv.half = 15, rank.tol = 1.49011611938477e-08
, nlm = list(ndigit=7, gradtol=1e-06, stepmax=2, steptol=1e-04, iterlim=200, check.analyticals=0)
, optim = list(factr=1e+07)
, newton = list(conv.tol=1e-06, maxNstep=5, maxSstep=2, maxHalf=30, use.svd=0), outerPIsteps = 0
, idLinksBases = TRUE, scalePenalty = TRUE, efs.lspmax = 15, efs.tol = 0.1, keepData = FALSE
, scale.est = fletcher, edge.correct = FALSE) ),

CTA = list( method = 'class',
            parms = 'default',
            cost = NULL,
            control = list(xval = 5, minbucket = 5, minsplit = 5, cp = 0.001, maxdepth = 25) ),

ANN = list( NbCV = 5,
            size = NULL,
            decay = NULL,
            rang = 0.1,
            maxit = 200),

SRE = list( quant = 0.025),

FDA = list( method = 'mars',
            add_args = NULL),

MARS = list( type = 'simple',
             interaction.level = 0,
             myFormula = NULL,
             nk = NULL,
             penalty = 2,
             thresh = 0.001,
             nprune = NULL,
             pmethod = 'backward'),

RF = list( do.classif = TRUE,
           ntree = 500,
           mtry = 'default',
           sampsize = NULL,
           nodesize = 5,
           maxnodes = NULL),

MAXENT = list( path_to_maxent.jar = 'E:/Rconversion/suitability_connectivity', 
               memory_allocated = 512,
               initial heap size = NULL,
               maximum heap size = NULL,
               background_data_dir = 'default',
               maximumbackground = 'default',
               maximumiterations = 200,
               visible = FALSE,
               linear = TRUE,
               quadratic = TRUE,
               product = TRUE,
               threshold = TRUE,
               hinge = TRUE,
               lq2lqptthreshold = 80,
               l2lqthreshold = 10,
               hingethreshold = 15,
               beta_threshold = -1,
               beta_categorical = -1,
               beta_lqp = -1,
               beta_hinge = -1,
               betamultiplier = 1,
               defaultprevalence = 0.5),

 MAXNET = list( myFormula = NULL,
     regmult = 1,
     regfun = <function> ),

 XGBOOST = list( max.depth = 5,
                 eta = 0.1,
                 nrounds = 512,
                 objective = binary:logistic,
                 nthread = 1 )
)

> myBiomodData

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.formated.data -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

dir.name =  .

sp.name =  Empoasca.onukii

     296 presences,  100 true absences and  0 undefined points in dataset

     36 explanatory variables

      bio1             bio2             bio3            bio4             bio5      
 Min.   :-10.27   Min.   : 5.333   Min.   :19.46   Min.   : 128.5   Min.   : 7.60  
 1st Qu.: 14.54   1st Qu.: 7.598   1st Qu.:26.17   1st Qu.: 639.4   1st Qu.:29.10  
 Median : 16.30   Median : 8.529   Median :28.63   Median : 771.5   Median :30.80  
 Mean   : 15.65   Mean   : 8.838   Mean   :31.13   Mean   : 743.5   Mean   :29.90  
 3rd Qu.: 18.04   3rd Qu.: 9.300   3rd Qu.:33.40   3rd Qu.: 882.8   3rd Qu.:32.73  
 Max.   : 27.62   Max.   :16.208   Max.   :63.58   Max.   :1190.0   Max.   :36.60  
      bio6               bio7            bio8              bio9             bio10      
 Min.   :-26.8000   Min.   :12.70   Min.   :-0.2833   Min.   :-19.033   Min.   :-0.15  
 1st Qu.: -1.6000   1st Qu.:25.80   1st Qu.:21.0000   1st Qu.:  4.129   1st Qu.:23.93  
 Median :  1.0000   Median :29.65   Median :23.2667   Median :  7.217   Median :25.88  
 Mean   :  0.6124   Mean   :29.29   Mean   :22.4620   Mean   :  7.148   Mean   :24.40  
 3rd Qu.:  4.1250   3rd Qu.:33.05   3rd Qu.:25.3750   3rd Qu.: 11.004   3rd Qu.:27.04  
 Max.   : 19.6000   Max.   :48.50   Max.   :28.6667   Max.   : 25.150   Max.   :30.93  
     bio11             bio12          bio13            bio14            bio15            bio16       
 Min.   :-20.417   Min.   :  41   Min.   :  11.0   Min.   :  0.00   Min.   : 33.32   Min.   :  28.0  
 1st Qu.:  4.062   1st Qu.: 977   1st Qu.: 189.0   1st Qu.: 10.00   1st Qu.: 56.61   1st Qu.: 475.8  
 Median :  6.200   Median :1278   Median : 225.5   Median : 22.00   Median : 66.48   Median : 584.5  
 Mean   :  6.135   Mean   :1280   Mean   : 239.4   Mean   : 24.21   Mean   : 72.13   Mean   : 619.3  
 3rd Qu.:  8.892   3rd Qu.:1554   3rd Qu.: 273.2   3rd Qu.: 36.00   3rd Qu.: 87.87   3rd Qu.: 700.0  
 Max.   : 24.833   Max.   :3845   Max.   :1100.0   Max.   :176.00   Max.   :137.37   Max.   :2797.0  
     bio17            bio18            bio19             elev           soil_pH        EvergDecid    
 Min.   :  2.00   Min.   :  28.0   Min.   :  3.00   Min.   :   1.0   Min.   :46.75   Min.   : 0.000  
 1st Qu.: 37.00   1st Qu.: 442.0   1st Qu.: 38.75   1st Qu.: 121.5   1st Qu.:56.69   1st Qu.: 0.000  
 Median : 81.00   Median : 512.0   Median : 83.50   Median : 335.0   Median :60.78   Median : 0.000  
 Mean   : 91.01   Mean   : 540.2   Mean   :103.71   Mean   : 711.2   Mean   :62.50   Mean   : 6.338  
 3rd Qu.:139.25   3rd Qu.: 595.0   3rd Qu.:167.00   3rd Qu.: 800.2   3rd Qu.:66.78   3rd Qu.: 9.000  
 Max.   :557.00   Max.   :2196.0   Max.   :557.00   Max.   :5414.0   Max.   :84.62   Max.   :59.000  
   EvergBroad        DecidBroad       MixedTree         Shrubs          HerbVeg         CultVeg      
 Min.   :  0.000   Min.   : 0.000   Min.   : 0.00   Min.   : 0.000   Min.   : 0.00   Min.   :  0.00  
 1st Qu.:  0.000   1st Qu.: 0.000   1st Qu.: 0.00   1st Qu.: 0.000   1st Qu.: 0.00   1st Qu.: 20.00  
 Median :  0.000   Median : 0.000   Median :10.00   Median : 0.000   Median : 0.00   Median : 43.00  
 Mean   :  5.861   Mean   : 2.015   Mean   :17.16   Mean   : 5.758   Mean   : 5.04   Mean   : 47.42  
 3rd Qu.:  0.000   3rd Qu.: 0.000   3rd Qu.:33.00   3rd Qu.:10.000   3rd Qu.: 0.00   3rd Qu.: 78.00  
 Max.   :100.000   Max.   :60.000   Max.   :80.00   Max.   :39.000   Max.   :92.00   Max.   :100.00  
    FloodVeg          Urbab              Snow              Barren         OpenWater     
 Min.   : 0.000   Min.   :  0.000   Min.   :  0.0000   Min.   :  0.00   Min.   : 0.000  
 1st Qu.: 0.000   1st Qu.:  0.000   1st Qu.:  0.0000   1st Qu.:  0.00   1st Qu.: 0.000  
 Median : 0.000   Median :  0.000   Median :  0.0000   Median :  0.00   Median : 0.000  
 Mean   : 0.404   Mean   :  6.283   Mean   :  0.4546   Mean   :  1.72   Mean   : 1.596  
 3rd Qu.: 0.000   3rd Qu.:  0.000   3rd Qu.:  0.0000   3rd Qu.:  0.00   3rd Qu.: 0.000  
 Max.   :27.000   Max.   :100.000   Max.   :100.0000   Max.   :100.00   Max.   :52.000  
  Mou_300dens          riverdens             slope         
 Min.   :0.0002867   Min.   :0.0002866   Min.   :   7.562  
 1st Qu.:0.0002899   1st Qu.:0.0002897   1st Qu.: 173.672  
 Median :0.0002912   Median :0.0002911   Median : 569.375  
 Mean   :0.0002910   Mean   :0.0002910   Mean   : 802.631  
 3rd Qu.:0.0002925   3rd Qu.:0.0002923   3rd Qu.:1317.672  
 Max.   :0.0002930   Max.   :0.0002930   Max.   :3816.188  

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

> myExpl
class      : RasterStack 
dimensions : 3000, 4200, 12600000, 36  (nrow, ncol, ncell, nlayers)
resolution : 0.008333333, 0.008333333  (x, y)
extent     : 90, 125, 15, 40  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
names      :          bio1,          bio2,          bio3,          bio4,          bio5,          bio6,          bio7,          bio8,          bio9,         bio10,         bio11,         bio12,         bio13,         bio14,         bio15, ... 
min values : -2.351667e+01,  0.000000e+00,  1.343996e+01,  0.000000e+00, -3.600000e+00, -3.900000e+01,  0.000000e+00, -1.221667e+01, -3.166667e+01, -1.213333e+01, -3.273333e+01,  9.000000e+00,  4.000000e+00,  0.000000e+00,  1.751306e+01, ... 
max values :  2.835833e+01,  2.190000e+01,  8.830645e+01,  1.324749e+03,  3.760000e+01,  2.280000e+01,  5.300000e+01,  2.968333e+01,  2.801667e+01,  3.153333e+01,  2.653333e+01,  9.312000e+03,  2.825000e+03,  2.060000e+02,  1.533400e+02, ... 

BIOMOD_Modeling, BIOMOD_EnsembleModeling, BIOMOD_Projection, show(myExplFuture) and BIOMOD_EnsembleForecasting: not run.

Environment Information

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-90        lattice_0.20-45     ggplot2_3.3.5       raster_3.5-11      
[5] sp_1.4-6            randomForest_4.6-14 biomod2_4.2-4      

loaded via a namespace (and not attached):
 [1] jsonlite_1.7.3        splines_4.1.0         foreach_1.5.1         prodlim_2019.11.13   
 [5] Formula_1.2-4         assertthat_0.2.1      stats4_4.1.0          globals_0.14.0       
 [9] ipred_0.9-12          pillar_1.6.4          glue_1.6.0            pROC_1.18.0          
[13] digest_0.6.29         colorspace_2.0-2      recipes_0.1.17        gbm_2.1.8            
[17] Matrix_1.4-0          plyr_1.8.6            timeDate_3043.102     pkgconfig_2.0.3      
[21] maxnet_0.1.4          PresenceAbsence_1.1.9 earth_5.3.1           listenv_0.8.0        
[25] purrr_0.3.4           scales_1.1.1          terra_1.7-23          gower_0.2.2          
[29] lava_1.6.10           TeachingDemos_2.12    tibble_3.1.6          mgcv_1.8-38          
[33] mda_0.5-2             generics_0.1.1        xgboost_1.7.5.1       ellipsis_0.3.2       
[37] withr_2.4.3           nnet_7.3-16           survival_3.2-13       magrittr_2.0.1       
[41] crayon_1.4.2          fansi_1.0.2           future_1.23.0         parallelly_1.30.0    
[45] nlme_3.1-153          MASS_7.3-54           class_7.3-19          tools_4.1.0          
[49] dismo_1.3-5           data.table_1.14.2     lifecycle_1.0.1       stringr_1.4.0        
[53] munsell_0.5.0         plotrix_3.8-2         compiler_4.1.0        rlang_0.4.12         
[57] plotmo_3.6.1          grid_4.1.0            iterators_1.0.13      rstudioapi_0.13      
[61] gtable_0.3.0          ModelMetrics_1.2.2.2  codetools_0.2-18      abind_1.4-5          
[65] DBI_1.1.2             reshape_0.8.8         reshape2_1.4.4        R6_2.5.1             
[69] lubridate_1.8.0       dplyr_1.0.7           future.apply_1.8.1    utf8_1.2.2           
[73] stringi_1.7.6         parallel_4.1.0        Rcpp_1.0.11           vctrs_0.3.8          
[77] rpart_4.1-15          tidyselect_1.1.1  


rpatin commented 1 year ago

Dear @Jinyu8579,

Thank you for reporting this and for posting a nicely formatted issue. Indeed, the argument was ignored in the code of BIOMOD_ModelingOptions. I just pushed a commit to correct that. If you update to the current GitHub version with devtools::install_github('biomodhub/biomod2'), it should work. If not, please let us know.

Best, Rémi

Jinyu8579 commented 1 year ago

Dear Rémi,

Thanks for your feedback. We have updated to the current GitHub version with devtools::install_github('biomodhub/biomod2'), and the option is now applied correctly. However, we ran into another roadblock with the remaining code:

RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows,  : 
  'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

We found a similar error reported on Stack Exchange (https://stats.stackexchange.com/questions/519102/random-forest-error-type-of-predictors-in-new-data-do-not-match-training-set). The answer given there reads as follows:

"You fitted a regression forest so you get out of it predictions of the response variable, not a probability. You need type = "response" (which is the default). If you meant to fit a classification forest then you need to look again at how you fitted the model, but reading your code, it looks like you are modelling the count of bike rentals, so the probability prediction doesn't make sense; you just want the predicted count (predicted response)."

So our problem might also be solved by setting type = "response", but we have not found a corresponding argument in either BIOMOD_ModelingOptions or BIOMOD_Modeling.
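The error can be reproduced outside biomod2 with randomForest alone. The following is a minimal sketch on synthetic data (the names x, y and rf_reg are hypothetical): a regression forest rejects type = "prob" and accepts type = "response".

```r
library(randomForest)

set.seed(1)
# Synthetic 0/1 response stored as numeric, so randomForest fits a regression forest
x <- data.frame(a = rnorm(100), b = rnorm(100))
y <- as.numeric(x$a > 0)
rf_reg <- suppressWarnings(randomForest(x = x, y = y, ntree = 50))

# Asking a regression forest for class probabilities raises the error from this thread
err <- tryCatch(
  predict(rf_reg, newdata = x, type = "prob"),
  error = function(e) e
)
print(conditionMessage(err))

# type = "response" (the default) returns the numeric predictions
ok <- predict(rf_reg, newdata = x, type = "response")
print(head(ok))
```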

P.S. Code used to get the error:

> # Load species occupancy data
> dataSpecies <-read.csv("E:\\Rconversion\\suitability_connectivity\\en_pre_bg_points\\en_pre_bg11_points.csv")
> # Look at structure of dataSpecies dataframe
> str(dataSpecies)
'data.frame':   401 obs. of  4 variables:
 $ longitude : num  114 113 113 113 113 ...
 $ latitude  : num  24.1 24.3 24.2 24.1 23.2 ...
 $ pre_bg_num: int  1 2 3 4 5 6 7 8 9 10 ...
 $ pre_bg    : chr  "presence" "presence" "presence" "presence" ...
> # Tell biomod2 which parts of the database refer to which biomod2 object
> myRespName <- "Empoasca_onukii"
> myResp <-as.numeric(dataSpecies[,"pre_bg_num"])
> myRespXY <-dataSpecies[,c("longitude","latitude")]
> # Load species occupancy data
> dataSpecies <-read.csv("E:\\Rconversion\\suitability_connectivity\\en_pre_bg_points\\en_pre_bg12_points.csv")
> # Look at structure of dataSpecies dataframe
> str(dataSpecies)
'data.frame':   401 obs. of  4 variables:
 $ longitude : num  114 113 113 113 113 ...
 $ latitude  : num  24.1 24.3 24.2 24.1 23.2 ...
 $ pre_bg_num: int  1 1 1 1 1 1 1 1 1 1 ...
 $ pre_bg    : chr  "presence" "presence" "presence" "presence" ...
> # Tell biomod2 which parts of the database refer to which biomod2 object
> myRespName <- "Empoasca_onukii"
> myResp <-as.numeric(dataSpecies[,"pre_bg_num"])
> myRespXY <-dataSpecies[,c("longitude","latitude")]
> # Create stack of all environmental covariate layers
> myExpl <- envPlus
> names(myExpl)  
 [1] "bio1"        "bio2"        "bio3"        "bio4"        "bio5"        "bio6"        "bio7"       
 [8] "bio8"        "bio9"        "bio10"       "bio11"       "bio12"       "bio13"       "bio14"      
[15] "bio15"       "bio16"       "bio17"       "bio18"       "bio19"       "elev"        "soil_pH"    
[22] "EvergDecid"  "EvergBroad"  "DecidBroad"  "MixedTree"   "Shrubs"      "HerbVeg"     "CultVeg"    
[29] "FloodVeg"    "Urbab"       "Snow"        "Barren"      "OpenWater"   "Mou_300dens" "riverdens"  
[36] "slope"      
> # Create object (myBiomodData) to contain all the previous objects within it, 
> # formatted correctly
> myBiomodData <-BIOMOD_FormatingData(resp.var = myResp,
+                                     expl.var = myExpl,
+                                     resp.xy = myRespXY,
+                                     resp.name = myRespName,
+                                     PA.nb.rep =0)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Empoasca_onukii Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

      ! Response variable name was converted into Empoasca.onukii
      !   Response variable have non-binary values that will be converted into 0 (resp <=0) or 1 (resp > 0).
      ! No data has been set aside for modeling evaluation
 ! Some NAs have been automatically removed from your data
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> # Set the options for the different model algorithms
> # - here overriding the RF settings (do.classif = FALSE); all other models keep their defaults
> myBiomodOptions <- BIOMOD_ModelingOptions(RF = list(do.classif = FALSE,
+                                                    ntree = 500,
+                                                    mtry = 'default',
+                                                    nodesize = 5,
+                                                    maxnodes = NULL))
> myBiomodOptions

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.models.options -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

GLM = list( type = 'quadratic',
            interaction.level = 0,
            myFormula = NULL,
            test = 'AIC',
            family = binomial(link = 'logit'),
            mustart = 0.5,
            control = glm.control(epsilon = 1e-08, maxit = 50, trace = FALSE) ),

GBM = list( distribution = 'bernoulli',
            n.trees = 2500,
            interaction.depth = 7,
            n.minobsinnode = 5,
            shrinkage = 0.001,
            bag.fraction = 0.5,
            train.fraction = 1,
            cv.folds = 3,
            keep.data = FALSE,
            verbose = FALSE,
            perf.method = 'cv',
            n.cores = 1),

GAM = list( algo = 'GAM_mgcv',
            type = 's_smoother',
            k = -1,
            interaction.level = 0,
            myFormula = NULL,
            family = binomial(link = 'logit'),
            method = 'GCV.Cp', 
            optimizer = c('outer','newton'),
            select = FALSE,
            knots = NULL,
            paraPen = NULL,
            control = list(nthreads = 1, irls.reg = 0, epsilon = 1e-07, maxit = 200, trace = FALSE
, mgcv.tol = 1e-07, mgcv.half = 15, rank.tol = 1.49011611938477e-08
, nlm = list(ndigit=7, gradtol=1e-06, stepmax=2, steptol=1e-04, iterlim=200, check.analyticals=0)
, optim = list(factr=1e+07)
, newton = list(conv.tol=1e-06, maxNstep=5, maxSstep=2, maxHalf=30, use.svd=0), outerPIsteps = 0
, idLinksBases = TRUE, scalePenalty = TRUE, efs.lspmax = 15, efs.tol = 0.1, keepData = FALSE
, scale.est = fletcher, edge.correct = FALSE) ),

CTA = list( method = 'class',
            parms = 'default',
            cost = NULL,
            control = list(xval = 5, minbucket = 5, minsplit = 5, cp = 0.001, maxdepth = 25) ),

ANN = list( NbCV = 5,
            size = NULL,
            decay = NULL,
            rang = 0.1,
            maxit = 200),

SRE = list( quant = 0.025),

FDA = list( method = 'mars',
            add_args = NULL),

MARS = list( type = 'simple',
             interaction.level = 0,
             myFormula = NULL,
             nk = NULL,
             penalty = 2,
             thresh = 0.001,
             nprune = NULL,
             pmethod = 'backward'),

RF = list( do.classif = FALSE,
           ntree = 500,
           mtry = 'default',
           sampsize = NULL,
           nodesize = 5,
           maxnodes = NULL),

MAXENT = list( path_to_maxent.jar = 'E:/Rconversion/suitability_connectivity', 
               memory_allocated = 512,
               initial heap size = NULL,
               maximum heap size = NULL,
               background_data_dir = 'default',
               maximumbackground = 'default',
               maximumiterations = 200,
               visible = FALSE,
               linear = TRUE,
               quadratic = TRUE,
               product = TRUE,
               threshold = TRUE,
               hinge = TRUE,
               lq2lqptthreshold = 80,
               l2lqthreshold = 10,
               hingethreshold = 15,
               beta_threshold = -1,
               beta_categorical = -1,
               beta_lqp = -1,
               beta_hinge = -1,
               betamultiplier = 1,
               defaultprevalence = 0.5),

 MAXNET = list( myFormula = NULL,
     regmult = 1,
     regfun = <function> ),

 XGBOOST = list( max.depth = 5,
                 eta = 0.1,
                 nrounds = 512,
                 objective = binary:logistic,
                 nthread = 1 )
)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> # Run the biomod2 models that you have chosen on the data provided.
> myBiomodModelOut <-BIOMOD_Modeling(
+   bm.format = myBiomodData,
+   modeling.id =paste(myRespName, "Species1", sep=""),
+   models =c("RF"),
+   bm.options = myBiomodOptions,
+   CV.strategy = "kfold",
+   CV.nb.rep = 1,
+   CV.k = 10,
+   CV.perc = 70,
+   CV.do.full.models =FALSE,
+   prevalence = 0.5,
+   metric.eval = c('TSS', 'ROC'),
+   var.import = 10,
+   seed.val = 42,
+   nb.cpu = 1,
+   do.progress = TRUE)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Single Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Checking Models arguments...

    > Automatic weights creation to rise a 0.5 prevalence
Creating suitable Workdir...

Checking Cross-Validation arguments...

   > k-fold cross-validation selection

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Empoasca.onukii Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

 36  environmental variables ( bio1 bio2 bio3 bio4 bio5 bio6 bio7 bio8 bio9 bio10 bio11 bio12 bio13 bio14 bio15 bio16 bio17 bio18 bio19 elev soil_pH EvergDecid EvergBroad DecidBroad MixedTree Shrubs HerbVeg CultVeg FloodVeg Urbab Snow Barren OpenWater Mou_300dens riverdens slope )
Number of evaluation repetitions : 10
Models selected : RF 

Total number of model runs: 10 

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-=-=-=--=-=-=- Empoasca.onukii_allData_RUN1_RF 

Model=Breiman and Cutler's random forests for classification and regression
    > RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows,  : 
  'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

*** inherits(g.pred,'try-error')
   ! Note :  Empoasca.onukii_allData_RUN1_RF failed!

-=-=-=--=-=-=- Empoasca.onukii_allData_RUN2_RF 

Model=Breiman and Cutler's random forests for classification and regression
    > RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows,  : 
  'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

*** inherits(g.pred,'try-error')
   ! Note :  Empoasca.onukii_allData_RUN2_RF failed!

-=-=-=--=-=-=- Empoasca.onukii_allData_RUN3_RF 

Model=Breiman and Cutler's random forests for classification and regression
    > RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows,  : 
  'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

*** inherits(g.pred,'try-error')
   ! Note :  Empoasca.onukii_allData_RUN3_RF failed!

-=-=-=--=-=-=- Empoasca.onukii_allData_RUN4_RF 

Model=Breiman and Cutler's random forests for classification and regression
    > RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows,  : 
  'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

*** inherits(g.pred,'try-error')
   ! Note :  Empoasca.onukii_allData_RUN4_RF failed!

-=-=-=--=-=-=- Empoasca.onukii_allData_RUN5_RF 

Model=Breiman and Cutler's random forests for classification and regression
    > RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows,  : 
  'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

*** inherits(g.pred,'try-error')
   ! Note :  Empoasca.onukii_allData_RUN5_RF failed!

-=-=-=--=-=-=- Empoasca.onukii_allData_RUN6_RF 

Model=Breiman and Cutler's random forests for classification and regression
    > RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows,  : 
  'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

*** inherits(g.pred,'try-error')
   ! Note :  Empoasca.onukii_allData_RUN6_RF failed!

-=-=-=--=-=-=- Empoasca.onukii_allData_RUN7_RF 

Model=Breiman and Cutler's random forests for classification and regression
    > RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows,  : 
  'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

*** inherits(g.pred,'try-error')
   ! Note :  Empoasca.onukii_allData_RUN7_RF failed!

-=-=-=--=-=-=- Empoasca.onukii_allData_RUN8_RF 

Model=Breiman and Cutler's random forests for classification and regression
    > RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows,  : 
  'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

*** inherits(g.pred,'try-error')
   ! Note :  Empoasca.onukii_allData_RUN8_RF failed!

-=-=-=--=-=-=- Empoasca.onukii_allData_RUN9_RF 

Model=Breiman and Cutler's random forests for classification and regression
    > RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows,  : 
  'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

*** inherits(g.pred,'try-error')
   ! Note :  Empoasca.onukii_allData_RUN9_RF failed!

-=-=-=--=-=-=- Empoasca.onukii_allData_RUN10_RF 

Model=Breiman and Cutler's random forests for classification and regression
    > RF modeling...Error in predict.randomForest(get_formal_model(object), as.data.frame(newdata[not_na_rows,  : 
  'prob' or 'vote' not meaningful for regression
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

*** inherits(g.pred,'try-error')
   ! Note :  Empoasca.onukii_allData_RUN10_RF failed!

! All models failed
Error in fetch(key) : 
  lazy-load database 'C:/Program Files/R/R-4.1.0/library/biomod2/help/biomod2.rdb' is corrupt
Error in fetch(key) : 
  lazy-load database 'C:/Program Files/R/R-4.1.0/library/biomod2/help/biomod2.rdb' is corrupt
Error in fetch(key) : 
  lazy-load database 'C:/Program Files/R/R-4.1.0/library/biomod2/help/biomod2.rdb' is corrupt
Error in fetch(key) : 
  lazy-load database 'C:/Program Files/R/R-4.1.0/library/biomod2/help/biomod2.rdb' is corrupt
> 

Best,

Jinyu

rpatin commented 1 year ago

Dear Jinyu,

Thank you for the update and the referenced information. I updated the code of the predict function so that it can handle RF with either do.classif = TRUE or do.classif = FALSE. Please note, however, that there is likely little benefit in using do.classif = FALSE, but that is up to you. If you update to the current GitHub version with devtools::install_github('biomodhub/biomod2'), this should work. Please let us know if you encounter further problems.
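The actual change made in biomod2 is not shown in this thread. As an illustration only, a predict wrapper that handles both kinds of forest could look like the sketch below. The function predict_suitability is a hypothetical helper, not part of biomod2 or randomForest, and it assumes that the presence class is labelled "1".

```r
library(randomForest)

# Hypothetical helper (not biomod2 code): pick the predict type from the forest type
predict_suitability <- function(model, newdata) {
  if (model$type == "classification") {
    # Probability of the class labelled "1" (assumed to be presence)
    out <- predict(model, newdata = newdata, type = "prob")[, "1"]
  } else {
    # Regression forest: numeric prediction
    out <- predict(model, newdata = newdata, type = "response")
  }
  as.numeric(out)
}

set.seed(7)
x <- data.frame(a = rnorm(150), b = rnorm(150))
y <- as.numeric(x$a + rnorm(150, sd = 0.5) > 0)

rf_cls <- randomForest(x = x, y = factor(y), ntree = 50)
rf_reg <- suppressWarnings(randomForest(x = x, y = y, ntree = 50))

s_cls <- predict_suitability(rf_cls, x)
s_reg <- predict_suitability(rf_reg, x)
print(summary(s_cls))
print(summary(s_reg))
```

Dispatching on model$type keeps the calling code identical for both settings of do.classif.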

Additionally, I see that you had the following error: lazy-load database 'C:/Program Files/R/R-4.1.0/library/biomod2/help/biomod2.rdb' is corrupt. This should be solved by restarting R; see, e.g., https://github.com/biomodhub/biomod2/issues/319#issuecomment-1683991433

Best, Rémi

Jinyu8579 commented 1 year ago

Dear Rémi,

Thanks for your quick feedback. We have updated to the current GitHub version with devtools::install_github('biomodhub/biomod2'), and it has been working well since.

We initially wanted to treat the binary data (1 = presence, 0 = background) as a continuous response variable in order to obtain a continuous measure of suitability. We therefore planned to fit a regression model and set do.classif = FALSE in the RF options passed to BIOMOD_ModelingOptions.

We are now facing overfitting and are trying to address it by adjusting the nodesize and maxnodes parameters of randomForest, as you suggested in #304, #261 and #247.
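As a rough illustration of what these two parameters do, the sketch below fits randomForest directly on synthetic data (all names and values are hypothetical and not tuned for any real dataset). A larger nodesize and a small maxnodes produce shallower trees, which usually narrows the gap between training accuracy and out-of-bag accuracy. In biomod2 the same two values would go into the RF list passed to BIOMOD_ModelingOptions, as in the calls shown earlier in this thread.

```r
library(randomForest)

set.seed(42)
n <- 400
# Synthetic data: one informative predictor, one pure-noise predictor
x <- data.frame(a = rnorm(n), b = rnorm(n), noise = rnorm(n))
y <- factor(as.numeric(x$a + rnorm(n) > 0))

# Default settings: trees are grown almost fully
rf_default <- randomForest(x = x, y = y, ntree = 200)
# Constrained trees: at least 20 observations per leaf, at most 8 leaves per tree
rf_limited <- randomForest(x = x, y = y, ntree = 200, nodesize = 20, maxnodes = 8)

# Number of terminal nodes per tree
print(summary(treesize(rf_default)))
print(summary(treesize(rf_limited)))

# Accuracy on the training data versus out-of-bag (OOB) accuracy
train_acc <- function(m) mean(predict(m, newdata = x) == y)
oob_acc <- function(m) mean(m$predicted == y, na.rm = TRUE)

print(c(train = train_acc(rf_default), oob = oob_acc(rf_default)))
print(c(train = train_acc(rf_limited), oob = oob_acc(rf_limited)))
```

A large difference between the training and out-of-bag values in the first line, and a smaller one in the second, is the pattern to look for when judging overfitting this way.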

Thanks again for your help.

Best,

Jinyu