PredictiveEcology / Biomass_speciesParameters


possible race condition writing disk.frame objects #49

Closed achubaty closed 2 months ago

achubaty commented 2 months ago

While running concurrent replicates, I occasionally see the following error:

[ENOTEMPTY] Failed to remove '<projectPath>/inputs/cohortDataFactorial55440/.metadata': directory not empty

Since useDiskFrame() names each output directory using only the number of rows in cohortDataFactorial/speciesTableFactorial, concurrent runs with identically sized tables write to the same directory. This may be leading to a race condition where one run tries to clean up the directory while it is still in use by another.

useDiskFrame <- function(sim) {
  setup_disk.frame(workers = 2) ## TODO: is there a better default? should this be user-specified?

  cdRows <- nrow(sim$cohortDataFactorial)
  ## the rows of a factorial object will determine whether it is unique in 99.9% of cases
  sim$cohortDataFactorial <- as.disk.frame(sim$cohortDataFactorial, overwrite = TRUE,
                                           outdir = file.path(inputPath(sim),
                                                              paste0("cohortDataFactorial", cdRows)))
  stRows <- nrow(sim$speciesTableFactorial)
  sim$speciesTableFactorial <- as.disk.frame(sim$speciesTableFactorial, overwrite = TRUE,
                                             outdir = file.path(inputPath(sim),
                                                                paste0("speciesTableFactorial", stRows)))
  ## NOTE: disk.frame objects can be converted to data.table with as.data.table
  gc(reset = TRUE)
  return(sim)
}
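
The suspected collision can be illustrated without disk.frame at all. The following is only a sketch: `outdirFor()` is a hypothetical helper that mirrors the `file.path()`/`paste0()` naming used above, and the two data frames stand in for the tables of two concurrent replicates.

```r
## Hypothetical helper mirroring the naming scheme in useDiskFrame():
## the directory name depends only on the row count of the table.
outdirFor <- function(inputDir, tbl) {
  file.path(inputDir, paste0("cohortDataFactorial", nrow(tbl)))
}

## Two replicates with different contents but the same number of rows
rep1 <- data.frame(x = 1:3)
rep2 <- data.frame(x = 4:6)

pathRep1 <- outdirFor("inputs", rep1)
pathRep2 <- outdirFor("inputs", rep2)

## Both replicates resolve to the same directory
samePath <- identical(pathRep1, pathRep2)
print(samePath)
```

With `overwrite = TRUE`, whichever replicate starts second would presumably try to remove a directory the first is still writing to, which is consistent with the ENOTEMPTY error.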

@ianmseddy any thoughts on how to mitigate this?

achubaty commented 2 months ago

If these directories are intended to be transient (and they are being cleaned up by something), then can we simply use a random string instead of the row-count?
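
A minimal sketch of the random-string idea, assuming base R's `tempfile()` is an acceptable source of unique names. `uniqueOutdir()` is a hypothetical helper, not part of the module, and `tempdir()` stands in for `inputPath(sim)`.

```r
## Hypothetical helper: build a directory path with a random suffix
## under basePath, instead of a row-count suffix.
uniqueOutdir <- function(basePath, prefix) {
  ## tempfile() returns a path that is not in use at the time of the call
  tempfile(pattern = paste0(prefix, "_"), tmpdir = basePath)
}

base <- tempdir()  ## stand-in for inputPath(sim)

dirA <- uniqueOutdir(base, "cohortDataFactorial")
dir.create(dirA)   ## simulate the first run claiming its directory
dirB <- uniqueOutdir(base, "cohortDataFactorial")

## The second run gets a different directory
print(dirA != dirB)
```

The resulting path could then be passed as `outdir` to `as.disk.frame()`. The trade-off is that the directory name no longer identifies the object it holds, so the path would need to be recorded if the files are meant to be inspected later.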

(But why are they considered transient at all? Surely we are saving them so they can be inspected later?)

achubaty commented 2 months ago

These are now written to outputPath(sim), which avoids the race condition. However, if the module is replicated, this will create multiple similar objects on disk. Generally, these should not be replicated.