biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

Error in BIOMOD_Modeling: Error in { : task 27 failed - "cannot open the connection" #448

Closed LorenzoBernicchi closed 3 weeks ago

LorenzoBernicchi commented 2 months ago

Error and context

Hello everyone. I am modeling the distribution of Cervus elaphus on a global scale, using the bioclimatic variables I downloaded from WorldClim. I prepared all my data: checking for multicollinearity among variables, thinning my occurrences and so on. When I start the biomod2 modeling steps, I hit the same error every time:

Error in { : task 27 failed - "cannot open the connection"

In addition: there are more than 50 warnings
1: In .bm_ModelingOptions.check.args(data.type = data.type, ... :
  Only one GAM model can be activated. 'GAM.mgcv.gam' has been set (other available : 'GAM.gam.gam' or 'GAM.mgcv.bam')
2: In bm_RunModelsLoop(bm.format = bm.format, weights = weights, ... :
  Parallelisation with foreach is not available for Windows. Sorry.
3: executing %dopar% sequentially: no parallel backend registered
36: In file(con, "r") :
  cannot open file './Cervus.global.nuovo.CA/models/Single.models/Cervus.global.nuovo.CA_PA1_RUN4_MAXENT_outputs/maxent.stderr': Permission denied
37: In predict.gbm(get_formal_model(object), as.data.frame(newdata[not_na_rows, ... :
  Number of trees not specified or exceeded number fit so far. Using 100.

(I get many warnings of this last type.)

The number after the word "task" changes every time, but I always get the same warning messages: the ones about GAM and MAXENT, plus a series of identical warnings for GBM.
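One way to narrow this down is to list every `maxent.stderr` file under the working directory and test whether R can open it. The following is a minimal sketch in base R; the helper name `check_maxent_stderr` is hypothetical (it is not part of biomod2), and the file name is simply taken from the warning above.

```r
# Hypothetical helper (not a biomod2 function): find all maxent.stderr files
# below `root` and report whether each one can be opened for reading.
check_maxent_stderr <- function(root = ".") {
  files <- list.files(root, pattern = "^maxent\\.stderr$",
                      recursive = TRUE, full.names = TRUE)
  readable <- vapply(files, function(f) {
    tryCatch({
      con <- suppressWarnings(file(f, open = "r"))  # same call that fails in the warning
      close(con)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  data.frame(file = files, readable = unname(readable),
             stringsAsFactors = FALSE)
}

# Usage: run from the working directory right after a failed BIOMOD_Modeling call
check_maxent_stderr(".")
```

Rows with `readable = FALSE` would point to the runs whose output file is locked or inaccessible at that moment.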

Code used to get the error

Selected_algos <- c("CTA", "FDA", "GLM", "GBM", "GAM", "MAXENT", "MAXNET")
user.MAXENT <- list('_allData_allRun' = list(
  path_to_maxent.jar = "."
))
user.val <- list(MAXENT.binary.MAXENT.MAXENT = user.MAXENT)
myOptions <- bm_ModelingOptions(data.type = 'binary',
                                models = Selected_algos,
                                strategy = 'user.defined',
                                user.val = user.val,
                                user.base = 'bigboss')
myOptions
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.models.options -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

    >  CTA options (datatype: binary , package: rpart , function: rpart ) :
       ( dataset _allData_allRun )
        -  formula = 
        -  data = 
        -  weights = 
        -  subset = 
        -  na.action = na.rpart
        -  method = "class"
        -  model = FALSE
        -  x = FALSE
        -  y = TRUE
        -  parms = 
        -  control = $xval 5  $minbucket 5  $minsplit 5  $cp 0.001  $maxdepth 25    (default:  )
        -  cost = 

    >  FDA options (datatype: binary , package: mda , function: fda ) :
       ( dataset _allData_allRun )
        -  formula = formula(data)
        -  data = sys.frame(sys.parent())
        -  weights = 
        -  theta = 
        -  eps = .Machine$double.eps
        -  method = "mars"   (default: polyreg )

    >  GLM options (datatype: binary , package: stats , function: glm ) :
       ( dataset _allData_allRun )
        -  formula = 
        -  family =  Family: binomial  Link function: logit  
        -  data = 
        -  weights = 
        -  subset = 
        -  na.action = 
        -  etastart = 
        -  mustart = 0.5   (default:  )
        -  offset = 
        -  control = $epsilon 1e-08  $maxit 50  $trace FALSE    (default: list() )
        -  model = TRUE
        -  method = "glm.fit"
        -  x = FALSE
        -  y = TRUE
        -  singular.ok = TRUE

    >  GBM options (datatype: binary , package: gbm , function: gbm ) :
       ( dataset _allData_allRun )
        -  formula = formula(data)
        -  distribution = "bernoulli"
        -  data = list()
        -  weights = 
        -  n.trees = 2500   (default: 100 )
        -  interaction.depth = 7   (default: 1 )
        -  n.minobsinnode = 5   (default: 10 )
        -  shrinkage = 0.001   (default: 0.1 )
        -  bag.fraction = 0.5
        -  train.fraction = 1
        -  cv.folds = 3   (default: 0 )
        -  keep.data = FALSE   (default: TRUE )
        -  verbose = FALSE
        -  n.cores = 1   (default: NULL )

    >  GAM options (datatype: binary , package: mgcv , function: gam ) :
       ( dataset _allData_allRun )
        -  formula = 
        -  family =  Family: binomial  Link function: logit  
        -  data = list()
        -  na.action = 
        -  method = "GCV.Cp"
        -  optimizer = c("outer", "newton")
        -  control = $epsilon 1e-06  $trace FALSE  $maxit 100    (default: $nthreads 1  $ncv.threads 1  $irls.reg 0  $epsilon 1e-07  $maxit 200  $trace FALSE  $mgcv.tol 1e-07  $mgcv.half 15  $rank.tol 1.490116e-08  $nlm $nlm$ndigit 7  $nlm$gradtol 1e-06  $nlm$stepmax 2  $nlm$steptol 1e-04  $nlm$iterlim 200  $nlm$check.analyticals FALSE   $optim $optim$factr 1e+07   $newton $newton$conv.tol 1e-06  $newton$maxNstep 5  $newton$maxSstep 2  $newton$maxHalf 30  $newton$use.svd FALSE   $idLinksBases TRUE  $scalePenalty TRUE  $efs.lspmax 15  $efs.tol 0.1  $keepData FALSE  $scale.est "fletcher"  $edge.correct FALSE  )
        -  scale = 0
        -  select = FALSE
        -  gamma = 1
        -  fit = TRUE
        -  drop.unused.levels = TRUE
        -  discrete = FALSE

    >  MAXENT options (datatype: binary , package: MAXENT , function: MAXENT ) :
       ( dataset _allData_allRun )
        -  path_to_maxent.jar = "."   (default: "C:/Cervo_EcoRegioni_CA" )
        -  memory_allocated = 512
        -  background_data_dir = "default"
        -  visible = FALSE
        -  linear = TRUE
        -  quadratic = TRUE
        -  product = TRUE
        -  threshold = TRUE
        -  hinge = TRUE
        -  lq2lqptthreshold = 80
        -  l2lqthreshold = 10
        -  hingethreshold = 15
        -  beta_threshold = -1
        -  beta_categorical = -1
        -  beta_lqp = -1
        -  beta_hinge = -1
        -  betamultiplier = 1
        -  defaultprevalence = 0.5

    >  MAXNET options (datatype: binary , package: maxnet , function: maxnet ) :
       ( dataset _allData_allRun )
        -  p = 
        -  data = 
        -  regmult = 1
        -  regfun = maxnet.default.regularization
        -  addsamplestobackground = T

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
BIOMOD_data <- BIOMOD_FormatingData(
  resp.name = "Cervus_global_nuovo_CA",
  resp.var = Cervus_points,
  expl.var = Bioclim_Cervus,
  PA.nb.rep = 1, 
  PA.nb.absences = 20000,
  PA.strategy = 'random', 
  na.rm = T, filter.raster = T
)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.formated.data -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

dir.name =  .

sp.name =  Cervus.global.nuovo.CA

     1916 presences,  0 true absences and  20000 undefined points in dataset

     6 explanatory variables

     bio_7             bio_10          bio_11           bio_12           bio_14      
 Min.   :  5.533   Min.   :  0.0   Min.   :  14.0   Min.   : 2.925   Min.   : 297.3  
 1st Qu.: 25.023   1st Qu.:138.0   1st Qu.:  99.0   1st Qu.: 7.925   1st Qu.: 674.4  
 Median : 31.535   Median :195.0   Median : 118.0   Median : 8.658   Median : 827.7  
 Mean   : 33.607   Mean   :176.8   Mean   : 153.9   Mean   : 9.038   Mean   : 844.6  
 3rd Qu.: 36.660   3rd Qu.:222.0   3rd Qu.: 177.0   3rd Qu.: 9.950   3rd Qu.:1012.1  
 Max.   :118.236   Max.   :656.0   Max.   :1131.0   Max.   :15.867   Max.   :1407.4  
     bio_18      
 Min.   :-11.47  
 1st Qu.:  8.75  
 Median : 13.32  
 Mean   : 12.42  
 3rd Qu.: 16.60  
 Max.   : 23.35  

 1 Pseudo Absences dataset available ( PA1 ) with  20000 (PA1) pseudo absences

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
class       : SpatRaster 
dimensions  : 4958, 9451, 6  (nrow, ncol, nlyr)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -10.59167, 68.16667, 29.89167, 71.20833  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : spat_577058ad36f1_22384.tif 
names       :      bio_7, bio_10, bio_11,   bio_12,   bio_14,    bio_18 
min values  :   4.120157,      0,     13,  1.00000,    0.000, -16.91667 
max values  : 120.114082,    791,   1169, 16.10833, 1424.456,  24.41667 
Cervus_single_models_global <- BIOMOD_Modeling(
  bm.format = BIOMOD_data,
  modeling.id = "Single.models",
  models = Selected_algos,
  CV.strategy = "random",
  CV.nb.rep = 20,
  CV.perc = 0.7,
  CV.do.full.models = F,
  bm.options = myOptions,
  metric.eval = c("ROC", "TSS"),
  var.import = 1,
  nb.cpu = 4,
  do.progress = T
)

While the BIOMOD_Modeling function is running, I can see the progress bars on screen as well as the different steps of each algorithm. For the MAXENT algorithm, this is what I generally read on the console:

-=-=-=--=-=-=- Cervus.global.nuovo.CA_PA1_RUN18_MAXENT 

        Creating Maxent Temp Proj Data...
 Getting predictions...
 Getting predictor contributions...
  |=====================================================================================| 100%
               Evaluating Model stuff...
           Evaluating Predictor Contributions...

On the other hand, for some CV runs I only see this:

-=-=-=--=-=-=- Cervus.global.nuovo.CA_PA1_RUN4_MAXENT 

        Creating Maxent Temp Proj Data...

Note that this CV run (RUN4) is the one named in the warning message at the beginning, but it was not the only CV run that behaved like this.

Environment Information

sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=Italian_Italy.utf8  LC_CTYPE=Italian_Italy.utf8   
[3] LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C                  
[5] LC_TIME=Italian_Italy.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] splines   grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] xgboost_1.7.7.1         randomForest_4.7-1.1    maxnet_0.1.4           
 [4] earth_5.3.3             plotmo_3.6.3            plotrix_3.8-4          
 [7] Formula_1.2-5           gbm_2.1.9               mgcv_1.9-1             
[10] nlme_3.1-164            gam_1.22-3              foreach_1.5.2          
[13] mda_0.5-4               class_7.3-22            rpart_4.1.23           
[16] nnet_7.3-19             biomod2_4.2-5           usdm_2.1-7             
[19] terra_1.7-71            biosurvey_0.1.2         spThin_0.2.0           
[22] knitr_1.46              fields_15.2             viridisLite_0.4.2      
[25] spam_2.10-0             CoordinateCleaner_3.0.1 data.table_1.15.4      
[28] rgbif_3.7.9            

loaded via a namespace (and not attached):
 [1] DBI_1.2.2              rgeos_0.6-4            deldir_2.0-4          
 [4] pROC_1.18.5            permute_0.9-7          rlang_1.1.3           
 [7] magrittr_2.0.3         e1071_1.7-14           compiler_4.3.3        
[10] spatstat.geom_3.2-9    vctrs_0.6.5            maps_3.4.2            
[13] reshape2_1.4.4         stringr_1.5.1          shape_1.4.6.1         
[16] pkgconfig_2.0.3        utf8_1.2.4             pracma_2.4.4          
[19] bit_4.0.5              glmnet_4.1-8           xfun_0.43             
[22] jsonlite_1.8.8         PresenceAbsence_1.1.11 reshape_0.8.9         
[25] spatstat.utils_3.0-4   parallel_4.3.3         cluster_2.1.6         
[28] R6_2.5.1               stringi_1.8.3          spatstat.data_3.0-4   
[31] diptest_0.77-1         Rcpp_1.0.12            iterators_1.0.14      
[34] picante_1.8.2          Matrix_1.6-5           tidyselect_1.2.1      
[37] rnaturalearth_1.0.1    rstudioapi_0.16.0      abind_1.4-5           
[40] vegan_2.6-4            doParallel_1.0.17      codetools_0.2-19      
[43] lattice_0.22-5         tibble_3.2.1           plyr_1.8.9            
[46] ks_1.14.2              geosphere_1.5-18       survival_3.5-8        
[49] sf_1.0-16              units_0.8-5            proxy_0.4-27          
[52] polyclip_1.10-6        xml2_1.3.6             mclust_6.1            
[55] pillar_1.9.0           whisker_0.4.1          KernSmooth_2.23-22    
[58] generics_0.1.3         sp_2.1-3               ggplot2_3.5.0         
[61] munsell_0.5.1          scales_1.3.0           rgdal_1.6-7           
[64] glue_1.7.0             lazyeval_0.2.2         tools_4.3.3           
[67] mvtnorm_1.2-4          dotCall64_1.1-1        ape_5.8               
[70] colorspace_2.1-0       raster_3.6-26          cli_3.6.2             
[73] fansi_1.0.6            dplyr_1.1.4            gtable_0.3.4          
[76] oai_0.4.0              digest_0.6.35          classInt_0.4-10       
[79] lifecycle_1.0.4        httr_1.4.7             bit64_4.0.5           
[82] MASS_7.3-60.0.1

I also attach the list of my R packages and their versions: list_of_package_version.csv

I have already read the issues describing the same problem and tried the solutions proposed there. The directory I am working in is as short as possible: C:/Cervo_EcoRegioni_CA/. Moreover, I am using a fairly powerful computer: an ASUS laptop with an AMD Ryzen 7 6800 HS processor, 16 GB of RAM, and an NVIDIA GeForce RTX 3050 Laptop GPU with 4 GB of GDDR6 memory.

Could this problem come from package versions that interfere with one another? I really don't know why this issue keeps happening, but it's getting really frustrating, unfortunately.

I thank you so much for all your effort you dedicate to biomod2: I really love to model with this package!! I wish you a good day, best regards!

HeleneBlt commented 2 months ago

Hi Lorenzo,

Thanks for all the details 🙏 I don't think it is a path length issue but rather a problem with Java. If you haven't done so for a long time, I'd advise you to update Java and Maxent. You'll also need to check that you have all the permissions on maxent.jar.

I admit it's strange that this doesn't happen for all runs with MAXENT. If the update isn't enough, I'll probably need your data to reproduce the error.

Have a good day, Hélène
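The two checks suggested above (is Java reachable, can maxent.jar be read) can be sketched in base R. This is only an illustration under assumptions: the helper name `check_maxent_setup` is hypothetical, and the jar is assumed to sit in the directory given as `path_to_maxent.jar` (here `"."`). Opening the file is used as the read test because `file.access()` can be misleading on Windows.

```r
# Hypothetical helper (not part of biomod2): basic sanity checks for a
# MAXENT setup. `jar_dir` is assumed to be the folder holding maxent.jar.
check_maxent_setup <- function(jar_dir = ".") {
  jar <- file.path(jar_dir, "maxent.jar")
  can_open <- tryCatch({
    con <- suppressWarnings(file(jar, open = "rb"))  # try to actually open the jar
    close(con)
    TRUE
  }, error = function(e) FALSE)
  list(
    jar_path     = jar,
    jar_exists   = file.exists(jar),
    jar_readable = can_open,
    java_found   = unname(nzchar(Sys.which("java")))  # is `java` on the PATH?
  )
}

str(check_maxent_setup("."))

# If Java is on the PATH, print the version that R will actually call
if (nzchar(Sys.which("java"))) system2("java", "-version")
```

If `java_found` is `FALSE` inside R but Java works from a terminal, the PATH seen by the R session probably differs from the system one.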

LorenzoBernicchi commented 2 months ago

Hello @HeleneBlt, I just checked, and I did have old versions of both Java and Maxent. I updated them and I will try again as soon as I can. I will let you know, thanks for now!

EDIT I tried again (with Java 8 Update 411 and Maxent 3.4.4) and I still get the error; this time the failed task is number 55. I will share my data with you in the following comment.

Have a nice day, Lorenzo Bernicchi

LorenzoBernicchi commented 2 months ago

Dear @HeleneBlt, here I attach a compressed folder with all the data you might need to reproduce the error: the Excel file with presence data, the R script I am running and two .shp files of the modeling background areas. The folder does not contain the environmental variables I use, since they would be too heavy (about 10 GB). However, I use the climatic variables that I downloaded from the WorldClim website (https://www.worldclim.org/data/worldclim21.html).

Here you can download the folder: Cervo_nuovo_CA.zip

Let me know if you can reproduce the error. In the coming days I will try with another, much more powerful computer. Maybe it's a memory issue, I really don't know.

Thanks again, have a nice day! Lorenzo Bernicchi

HeleneBlt commented 2 months ago

Hi Lorenzo,

I'm sorry but I couldn't reproduce the error. It runs smoothly on my computer (with the data("bioclim_current") dataset from biomod2). In my case the data are much lighter, so we cannot rule out a memory issue. I can only advise you to try with another computer as you suggest, check the different updates and check the permissions. Sorry 😬

Hope you will find a solution! Hélène

LorenzoBernicchi commented 1 month ago

Hello @HeleneBlt

Thanks for letting me know; that's not what I was hoping for, unfortunately.

I will try on another PC with more RAM, I really hope that it will work.

Thanks for now, I really appreciated your support! Have a nice day!

Lorenzo Bernicchi

LorenzoBernicchi commented 1 month ago

Hello @HeleneBlt ,

I just tried removing the GBM from the algorithms I was using. And guess what? Now I don't get the error anymore! Maybe it was a problem related to package version used to run the GBM? Do you have any idea of the reason behind this strange error?

MayaGueguen commented 1 month ago

Hello Lorenzo 👋

I just heard from Gafarou, who sent me his data and scripts, as he was encountering the same error as you 👀

Error in { : task 3 failed - "cannot open the connection"
In addition: Warning messages:
1: In .bm_ModelingOptions.check.args(data.type = data.type, models = models,  :
  Only one GAM model can be activated. 'GAM.mgcv.gam' has been set (other available : 'GAM.gam.gam' or 'GAM.mgcv.bam')
2: In bm_RunModelsLoop(bm.format = bm.format, weights = weights, calib.lines = calib.lines,  :
  Parallelisation with `foreach` is not available for Windows. Sorry.
3: In file(con, "r") :
  cannot open file './Adansonia.digitata/models/Single.models/Adansonia.digitata_PA1_RUN1_MAXENT_outputs/maxent.stderr': Permission denied
4: In file(con, "r") :
  cannot open file './Adansonia.digitata/models/Single.models/Adansonia.digitata_PA1_RUN2_MAXENT_outputs/maxent.stderr': Permission denied
5: In file(con, "r") :
  cannot open file './Adansonia.digitata/models/Single.models/Adansonia.digitata_PA2_RUN1_MAXENT_outputs/maxent.stderr': Permission denied
6: In file(con, "r") :
  cannot open file './Adansonia.digitata/models/Single.models/Adansonia.digitata_PA2_RUN2_MAXENT_outputs/maxent.stderr': Permission denied

He was using biomod2 4.2-5 and MAXENT 3.4.4, and I was actually able to run his script without getting any error on my computer ✔️
He was running 4 single models : GLM, GAM, MAXENT and RF (so no GBM).

➡️ so unless I have the same configuration as Hélène (but she's working on Windows, and I'm on Ubuntu), maybe it really comes from a computer resource limit?

➡️ or, as Hélène mentioned earlier, it might come from your Java configuration and rights (https://github.com/biomodhub/biomod2/issues/291#issuecomment-1628578092)?

Maya

LorenzoBernicchi commented 1 month ago

Hello @MayaGueguen ,

These days I am running a different model on a much smaller area, which therefore takes up much less space. So I am pretty sure this is not an issue of limited computer resources.

About Java configuration and rights: I am able to open and execute maxent.jar, so I was pretty sure I had set all the permissions correctly, but apparently not, since I still get the error: Cannot open the file './Capreolus.CA/models/Single.models/Capreolus.CA_PA1_RUN3_MAXENT_outputs/maxent.stderr': Permission denied. In the Security tab of the file's Properties dialog I checked all the options, so that should be fine, right? May I ask how I should configure Java? Maybe I am missing something important. It's getting really frustrating.

A LITTLE UPDATE: Out of frustration I decided to uninstall Java and delete the maxent.jar file. The very first time I ran the model everything worked properly, so I ran several tests as I originally intended. Sometimes everything worked fine, while other times (without changing any option except CV.nb.rep) I got the usual MAXENT error. Furthermore, the first time I open R and run the code I get the error almost every time. However, if I clear the R environment, delete the data created in my working directory, and run the model again without closing R, everything works every time (I have done about 50 trials, so a good number I guess). This situation is getting rather mysterious!!

Thanks for your help, I really appreciate it. Lorenzo Bernicchi
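For reference, the manual workaround described above (delete the outputs created in the working directory, clear the workspace, rerun in the same session) could be scripted roughly as follows. This is a sketch only: `reset_run_outputs` is a hypothetical helper, and the folder name passed to it is assumed to be the species folder that biomod2 created in the working directory (for example `Capreolus.CA`, as seen in the error path).

```r
# Hypothetical helper (not part of biomod2): delete the output folder of a
# previous run. Returns TRUE if the folder is gone afterwards.
reset_run_outputs <- function(sp_dir) {
  if (dir.exists(sp_dir)) {
    unlink(sp_dir, recursive = TRUE)  # removes models/, proj_*/ etc. under sp_dir
  }
  !dir.exists(sp_dir)
}

# Usage (left commented out because it permanently deletes files):
# reset_run_outputs("Capreolus.CA")
# rm(list = ls())   # clear the workspace, as done manually in the comment above
```

A `FALSE` return value would itself be informative, since it means something is still holding a file inside that folder.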

HeleneBlt commented 1 month ago

Hi Lorenzo !

Indeed it's a mystery 🤔 We face another issue with some hidden connections blocking MAXENT and other models under Windows specifically. By deleting the R environment, you close these hidden connections, so maybe it is related. Could you try, directly when you open R, to run:

 env <- foreach:::.foreachGlobals
 rm(list = ls(name = env), pos = env)

and then run the model?

Hélène
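To see whether any connections are still open in the session before rerunning, base R provides `showConnections()`. The sketch below only counts and lists them; whether leftover connections are actually what blocks MAXENT here is an assumption, and `count_open_connections` is a hypothetical helper name.

```r
# Hypothetical helper: number of user-opened connections in this R session
count_open_connections <- function() {
  nrow(showConnections(all = FALSE))
}

# Inspect what is currently open (description, class, mode, ...)
print(showConnections(all = FALSE))
cat("Open connections:", count_open_connections(), "\n")

# If stale connections are listed, base R can close them all at once:
# closeAllConnections()   # note: this also closes any active sink()
```

Comparing the count right after starting R with the count after a failed run would show whether connections are being left behind.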

LorenzoBernicchi commented 1 month ago

Hello @MayaGueguen ,

I just added the lines of code you gave me in the previous comment. I pasted them at the beginning of my script and run them as soon as I open R. Unfortunately, nothing changed: most of the time I run the script I get the error, and I have to run it 10 to 15 times just to get a single model output. Since I would like to test several model parameters and other settings, this is getting really time consuming.

I really don't know what to think about the potential issue, but many thanks anyway for your help! Have a nice day, Lorenzo Bernicchi

LorenzoBernicchi commented 3 weeks ago

Dear @MayaGueguen ,

It's me again, unfortunately. I installed the development version of the package, 4.2-5-2, but the error still occurs. Did you find any potential bug causing this issue, by any chance?

Thanks for all your help, I really appreciate it. Have a nice day, Lorenzo Bernicchi

MayaGueguen commented 3 weeks ago

Hello Lorenzo,

Could you check this issue and try the fix, please? https://github.com/biomodhub/biomod2/issues/455#issuecomment-2127563951 Colin seems to have found a solution that works for both him and Chenyong, but I'm not 100% sure that they were in the same situation as you 👀

Maya

LorenzoBernicchi commented 3 weeks ago

Hello @MayaGueguen, I was just about to write to you again! Yesterday I found the issue you just mentioned, and it works!!! I am really happy, but I just opened a new issue for some other advice I need!

Thank you anyway, I really appreciate it!