biomodhub / biomod2

BIOMOD is a computer platform for ensemble forecasting of species distributions, enabling the treatment of a range of methodological uncertainties in models and the examination of species-environment relationships.

Error in BIOMOD_Projecting - [Error "[writeStart] file exists] #271

Closed MarianMirea closed 1 year ago

MarianMirea commented 1 year ago

I am using biomod2 with 100 repetitions and 10 pseudo-absence (PA) sets. However, when the script reaches the projection step, I encounter this error:

"> Projecting species_allData_allRun_MAXNET ...Error in { :
task 2397 failed - "[writeStart] file exists. You can use 'overwrite=TRUE' to overwrite it" myBiomodProj <- BIOMOD_Projection( "

Within the directory "species -> models -> species FirstModeling," I have all the models. However, the "species -> proj" folder contains only the ClampingMask.tif file.

The code for projection is structured like this:

myBiomodProj <- BIOMOD_Projection(
  bm.mod = myBiomodModelOut,
  new.env = env_current,
  proj.name = 'current',
  models.chosen = 'all',
  metric.binary = 'TSS',
  compress = TRUE,
  clamping.mask = TRUE,
  output.format = '.tif',
  seed.val = 13,
  overwrite = TRUE)
myCurrentProj <- get_predictions(myBiomodProj, overwrite = TRUE)

If I run the code with only 1 repetition, it works without a problem (the pseudo-absence sets are not the problem in this context). I found on a forum that output.format could cause this problem when set to .grd, so I changed it to .tif, but that didn't solve the problem.

rpatin commented 1 year ago

Hi @MarianMirea, Thank you for reporting :pray: Could you share a bit more information?

Best, Rémi

MarianMirea commented 1 year ago

Thank you @rpatin for answering and sorry for not providing all the information needed. I am using biomod2 to build a species distribution model (SDM) for a species using 6 WorldClim bioclimatic variables, two IPCC emission scenarios, two GCMs and three time horizons. I also use 'GLM', 'GAM', 'GBM', 'RF' and 'MAXNET', not only MAXNET.

The problem I encounter is when I set the repetition from “BIOMOD_Modeling” to 100. The script runs till “BIOMOD_Projection”, where I encounter the error.

Error in { :  
  task 2901 failed - "[*] file exists. You can use 'overwrite=TRUE' to overwrite it

I tested several numbers of repetitions: the script runs well with only 10 repetitions, for example, but when I change it to 100 it gives me this error. The script runs for several hours, so I didn't have time to test more. It runs through all the projections, and only when they are done does it give me the error, with no projection files written.

p_abs <- nrow(myRespXY)
p_abs <- p_abs * 3
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = env_current,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName,
                                     PA.strategy = "random",
                                     PA.nb.rep = 10,             
                                     PA.nb.absences = p_abs)
myBiomodOption <- BIOMOD_ModelingOptions()

myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    models = c('GLM','GAM','GBM','RF','MAXNET'),
                                    bm.options = myBiomodOption,
                                    nb.rep = 100,                                               
                                    data.split.perc = 80,
                                    prevalence= 0.5,
                                    metric.eval = c('TSS', 'ROC', 'KAPPA'),
                                    var.import = 10,
                                    do.full.models = FALSE,
                                    # save.output = TRUE,
                                    nb.cpu = 1,
                                    seed.val = 13,                                             
                                    modeling.id = paste(myRespName,"FirstModeling",sep=""))

myBiomodEM <- BIOMOD_EnsembleModeling(
  bm.mod = myBiomodModelOut,
  models.chosen = 'all',
  em.by='all',
  metric.select = c('TSS'),
  metric.select.thresh = c(0.4),
  metric.eval = c('TSS', 'ROC','KAPPA'),
  var.import = 10,
  nb.cpu = 1,
  seed.val = 13,
  do.progress = TRUE,
  em.algo = c('EMmean', 'EMwmean')
)
myBiomodProj <- BIOMOD_Projection(
  bm.mod = myBiomodModelOut,
  new.env = env_current,
  proj.name = 'current',
  models.chosen = 'all',
  metric.binary = 'TSS',
  compress = TRUE,
  clamping.mask = TRUE,
  output.format = '.tif',
  seed.val = 13)

After this part (BIOMOD_Projection), the script stops with the Error:

> Projecting species_PA10_RUN100_MAXNET ... 
> Projecting species_PA10_allRun_GLM ... 
> Projecting species_PA10_allRun_GAM ... 
> Projecting species_PA10_allRun_GBM ... 
> Projecting species_PA10_allRun_RF ... 
> Projecting species_PA10_allRun_MAXNET ... 
> Projecting species_allData_allRun_GLM ... 
> Projecting species_allData_allRun_GAM ... 
> Projecting species_allData_allRun_GBM ... 
> Projecting species_allData_allRun_RF ... 
> Projecting species_allData_allRun_MAXNET ...Error in { :  
  task 2901 failed - "[*] file exists. You can use 'overwrite=TRUE' to overwrite it" 

Using show(myBiomodModelOut), all the models are present, and they are also in the models folder. But in the proj folder, where I usually find the projections, only the ClampingMask file is present.

Modeling folder : . 
Species modeled : species 
Modeling id : speciesFirstModeling 
Considered variables : bio_15 bio_18 bio_19 bio_3 bio_7 bio_9 

Computed Models :  species_PA1_RUN1_GLM species_PA1_RUN1_GAM species_PA1_RUN1_GBM species_PA1_RUN1_RF species_PA1_RUN1_MAXNET  
.................................
species_PA10_allRun_MAXNET species_allData_allRun_GLM species_allData_allRun_GAM species_allData_allRun_GBM species_allData_allRun_RF  species_allData_allRun_MAXNET 

This is my session info.

> sessionInfo() 

R version 4.3.0 (2023-04-21 ucrt) 
Platform: x86_64-w64-mingw32/x64 (64-bit) 
Running under: Windows 11 x64 (build 22621) 
Matrix products: default 
locale: 
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8 LC_NUMERIC=C    LC_TIME=English_United States.utf8     
time zone: Europe 
tzcode source: internal 
attached base packages: 
[1] stats     graphics  grDevices utils     datasets  methods   base      
other attached packages: 
[1] gbm_2.1.8.1      beepr_1.3        rasterVis_0.51.5 lattice_0.21-8   rgdal_1.6-6      dplyr_1.1.2      raster_3.6-20    sp_1.6-1         biomod2_4.2-3    
loaded via a namespace (and not attached): 

 [1] shape_1.4.6            gtable_0.3.3           ggplot2_3.4.2          maxnet_0.1.4           latticeExtra_0.6-30    vctrs_0.6.2            tools_4.3.0            generics_0.1.3         parallel_4.3.0         
[10] tibble_3.2.1           PresenceAbsence_1.1.11 fansi_1.0.4            pkgconfig_2.0.3        Matrix_1.5-4           RColorBrewer_1.1-3     lifecycle_1.0.3        compiler_4.3.0         stringr_1.5.0          
[19] deldir_1.0-6           munsell_0.5.0          terra_1.7-29           codetools_0.2-19       class_7.3-21           glmnet_4.1-7           Formula_1.2-5          pillar_1.9.0           hexbin_1.28.3          
[28] MASS_7.3-58.4          iterators_1.0.14       plotmo_3.6.2           rpart_4.1.19           earth_5.3.2            abind_1.4-5            foreach_1.5.2          nlme_3.1-162           mda_0.5-3              
[37] TeachingDemos_2.12     tidyselect_1.2.0       stringi_1.7.12         reshape2_1.4.4         splines_4.3.0          grid_4.3.0             colorspace_2.1-0       cli_3.6.1              magrittr_2.0.3         
[46] randomForest_4.7-1.1   survival_3.5-5         utf8_1.2.3             withr_2.5.0            scales_1.2.1           plotrix_3.8-2          audio_0.1-10           jpeg_0.1-10            interp_1.1-4           
[55] nnet_7.3-18            zoo_1.8-12             png_0.1-8              viridisLite_0.4.2      mgcv_1.8-42            rlang_1.1.1            Rcpp_1.0.10            glue_1.6.2             pROC_1.18.2            
[64] reshape_0.8.9          R6_2.5.1               plyr_1.8.8     

Thank you!

rpatin commented 1 year ago

Hi @MarianMirea, Thank you for the additional information :pray: It is not an obvious error, and it is quite puzzling that it occurs only with 100 repetitions and not with a lower number. The message itself might also be misleading and caused by another problem (e.g. a memory issue). I tried to fit numerous models for a simplified Gulo gulo example, but I have not managed to reproduce the error yet. Out of curiosity, did you censor part of the path in the error message?

Error in { :  
  task 2901 failed - "[*] file exists. You can use 'overwrite=TRUE' to overwrite it

Provided it is not private, sharing the part of the path ([*]) that is linked to the project may help us identify the issue.

A few hints and things to try:

If none of that helps, you can also send your data so that we may try to reproduce the issue. Here is my email: remi.patin@univ-grenoble-alpes.fr (note that we are quite busy next week, so we may be slower to respond).

Best regards, Rémi

MarianMirea commented 1 year ago

Hi @rpatin, Thank you again for your time! As you can see in the first post of this thread, there is "writeStart" between the []. The second post was made after running the script again overnight, and I didn't even notice that there is now an * in place of writeStart.

> Projecting species_allData_allRun_MAXNET ...Error in { :  

  task 2397 failed - "[writeStart] file exists. You can use 'overwrite=TRUE' to overwrite it" 

To answer the question: I didn't censor the error. That is everything in the console. Before the error there is a long list of projections, as follows (this is the error from the second run).

> Projecting species_allData_allRun_RF ... 
> Projecting species_allData_allRun_MAXNET ...Error in { :  
  task 2901 failed - "[*] file exists. You can use 'overwrite=TRUE' to overwrite it" 

The error only occurs after all the projections are done, which takes several hours (10 PA sets, 100 repetitions, 5 algorithms). I also thought there might be a memory problem, so I lowered the number of repetitions to 30, and I will keep you updated. Thank you again for the explanations and quick response! Best regards, Marian.
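
To give a sense of the scale involved, here is a minimal base R sketch that counts the single models implied by these settings. The breakdown into cross-validated, "allRun" and "allData_allRun" models is an assumption inferred from the model names printed in the projection log above, not something confirmed by the biomod2 maintainers.

```r
# Settings taken from the thread
n_pa   <- 10   # PA.nb.rep: number of pseudo-absence sets
n_runs <- 100  # nb.rep: number of repetitions
algos  <- c("GLM", "GAM", "GBM", "RF", "MAXNET")

# Cross-validated models: one per PA set x repetition x algorithm
n_cv <- n_pa * n_runs * length(algos)

# "PAx_allRun" models seen in the log: one per PA set x algorithm (assumed)
n_allrun <- n_pa * length(algos)

# "allData_allRun" models seen in the log: one per algorithm (assumed)
n_alldata <- length(algos)

n_total <- n_cv + n_allrun + n_alldata

cat("Cross-validated models:", n_cv, "\n")
cat("All single models     :", n_total, "\n")
```

Under these assumptions there are 5000 cross-validated models and 5055 single models in total, each of which becomes one layer (or one file) at projection time.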

MarianMirea commented 1 year ago

Hi everyone,

Thank you once again for the information and support. I'm back to share the results. I tried two changes together, and fortunately the error is resolved, but I don't know which of the two fixed it. First, I manually loaded the terra package, because I noticed that it was not attached initially, even though biomod2 usually imports terra automatically. Second, I implemented the suggestion made by @rpatin, which was to set do.stack = FALSE. As a result, each projection is now written as a separate TIFF file. I believe the issue was due to insufficient memory, likely because of the large number of projection files I have, approximately 5000 in total.
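
The two changes described above could be combined roughly as follows. This is only a sketch: project_unstacked is a hypothetical wrapper name introduced here for illustration, the argument values are copied from the projection call earlier in the thread, and it has not been verified against the original data. It is also not known which of the two changes actually removed the error.

```r
# Hypothetical helper (not part of biomod2): wraps the projection call from
# the thread, with do.stack = FALSE so that each model's projection is
# written to its own file instead of one large stacked raster.
project_unstacked <- function(bm.mod, new.env, proj.name = "current") {
  biomod2::BIOMOD_Projection(
    bm.mod        = bm.mod,
    new.env       = new.env,
    proj.name     = proj.name,
    models.chosen = "all",
    metric.binary = "TSS",
    output.format = ".tif",
    do.stack      = FALSE   # the change suggested by @rpatin
  )
}

# Usage, assuming the objects from the thread exist in the session:
# library(terra)    # attached explicitly, as described above
# library(biomod2)
# myBiomodProj <- project_unstacked(myBiomodModelOut, env_current)
```

Wrapping the call in a function keeps the sketch loadable on its own; nothing is projected until the function is called with a real modeling object.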

I have one final question. Currently, parallelization does not work on Windows, as indicated by the error message: "Parallelization with foreach is not available for Windows. Sorry." However, it does work on Linux, and I wanted to know whether the current version of BIOMOD_Projection can take advantage of it.

Thank you! Best regards, Marian.

MayaGueguen commented 1 year ago

Hello Marian,

Yes, sorry, parallelization does not work on Windows. But you can indeed use it on macOS / Linux. You just have to set the nb.cpu parameter of the BIOMOD_Projection function to the number of cores across which you would like to divide your calculations (e.g. nb.cpu = 4 means that 4 projections will be run at the same time). Note that the nb.cpu parameter is also available for the BIOMOD_Modeling, BIOMOD_EnsembleModeling and BIOMOD_EnsembleForecasting functions.
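
As an illustration, a script shared between Windows and Linux machines could pick the value of nb.cpu at run time. The helper choose_nb_cpu below is a hypothetical name introduced for this sketch, not a biomod2 function; it only uses base R and the parallel package that ships with R.

```r
# Hypothetical helper (not part of biomod2): returns 1 on Windows, where
# parallelization is not available, and otherwise the requested number of
# cores, capped at what the machine reports.
choose_nb_cpu <- function(requested = 4L) {
  if (.Platform$OS.type == "windows") {
    return(1L)
  }
  available <- parallel::detectCores()
  if (is.na(available)) {
    return(1L)  # core count unknown: fall back to sequential
  }
  as.integer(min(requested, available))
}

nb_cpu <- choose_nb_cpu(4L)
cat("nb.cpu to use:", nb_cpu, "\n")

# Usage, assuming the objects from the thread exist in the session:
# myBiomodProj <- BIOMOD_Projection(bm.mod = myBiomodModelOut,
#                                   new.env = env_current,
#                                   proj.name = 'current',
#                                   models.chosen = 'all',
#                                   nb.cpu = nb_cpu)
```

Keep in mind that each parallel worker holds its own data in memory, so a higher nb.cpu may increase memory pressure on large projections.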

Maya