HenrikBengtsson / future.callr

:rocket: R package future.callr: A Future API for Parallel Processing using 'callr'
https://future.callr.futureverse.org

Issues writing raster files to disk #6

Closed pat-s closed 4 years ago

pat-s commented 5 years ago

In my analysis I write raster files in parallel using future_iwalk(writeRaster()) in combination with plan(future.callr::callr, workers = X).

When it comes to the writing step, I face the "error reading from connection" error. Using plan(future::multisession) however works.

Is this a known shortcoming? I can try to prepare a reprex if the case is worth investigating.

HenrikBengtsson commented 5 years ago

Hi, you're the first to report this. So, yes, please provide a minimal reproducible example and we'll take it from there.

pat-s commented 5 years ago

Hm, my local reprex works

library(raster)
library(future.callr)
library(purrr)
library(glue)
remotes::install_dev("furrr")
library(furrr)

rasters = list(ras1 = raster(system.file("external/test.grd", package="raster")),
  ras2 = raster(system.file("external/test.grd", package="raster"))
)

plan(future.callr::callr, workers = 2)

future_iwalk(rasters, ~ writeRaster(.x, glue("~/test-{.y}")))

Trying to see what's different on the specific machine.

pat-s commented 5 years ago

Works now also on the other machine. False positive issue.

HenrikBengtsson commented 5 years ago

Thanks for your followup

pat-s commented 5 years ago

Now I hit the error again. Didn't change the code. Tried it afterwards with multisession and it worked again. Strange.

HenrikBengtsson commented 5 years ago

It might be that those raster objects are non-exportable. See Section 'Non-exportable objects' in vignette 'A Future for R: Common Issues with Solutions' (https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html). If you run with:

options(future.globals.onReference = "error")

you should get an informative error message if they're (possibly) non-exportable.
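As an illustration of what that option does, here is a minimal sketch using a file connection, which is a well-known non-exportable object. It assumes the future package is installed; the temporary log file and the names con, res, and old_opts are introduced here purely for illustration and do not come from the original analysis. Whether raster objects trigger the same check is not established in this thread.

```r
library(future)

# Use background R sessions so that globals must be exported to a worker.
plan(multisession, workers = 2L)

# Ask future to fail early when a global contains a reference object.
old_opts <- options(future.globals.onReference = "error")

# A connection is only valid in the R session that opened it.
con <- file(tempfile(fileext = ".log"), open = "w")

res <- tryCatch({
  f <- future({
    cat("hello\n", file = con)  # 'con' is picked up as a global
    TRUE
  })
  value(f)
}, error = function(e) e)

if (inherits(res, "error")) {
  message("Caught: ", conditionMessage(res))
}

# Clean up.
close(con)
options(old_opts)
plan(sequential)
```

If the objects passed to the workers are plain data, the same pattern should run through without an error, which helps to rule non-exportable globals in or out.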

HenrikBengtsson commented 5 years ago

OTOH, if it works with multisession and objects are non-exportable then it should fail there as well. Maybe you're hitting race condition issues where multiple processes try to write to the same file?

HenrikBengtsson commented 5 years ago

Also, see if you can reproduce the problem with future_lapply or similar to rule out a mistake in furrr.

But, sure a bit odd if it only happens occasionally.

HenrikBengtsson commented 5 years ago

I'll see if I can reproduce this later; I see you're installing the GitHub version of furrr (remotes::install_dev("furrr")). Is that necessary for reproducing the error, or do you see it also with the CRAN version?

Also, what's the sessionInfo() on the machine(s) where it fails and where it doesn't fail? Those are useful clues.

pat-s commented 5 years ago

> I see you're installing the GitHub version of furrr (remotes::install_dev("furrr")). Is that necessary for reproducing the error, or do you see it also with the CRAN version?

Required, as I use future_iwalk() for writing and this is only available in the dev version so far.

> Maybe you're hitting race condition issues where multiple processes try to write to the same file?

Different files are written so there should be no race condition. As I walk() over the names of the rasters, I see if one is missing in the end / is written twice.

> Also, see if you can reproduce the problem with future_lapply or similar to outrule a mistake in furrr.

That's an option yes. However I cannot tell you when I will find time to take a deeper look again, I am quite busy right now. And since the multisession "workaround" works, the issue is not so urgent atm.

HenrikBengtsson commented 5 years ago

> [...] As I walk() over the names of the rasters, I see if one is missing in the end / is written twice.

I don't understand this part.

Are you on Linux, macOS, or Windows?

pat-s commented 5 years ago

When using purrr::iwalk(), the function uses the names of the list elements as the iterator .y. These names are then used in the raster filename. The names are unique and so are the files written to disk, hence I think there is no race condition occurring :)

Linux, CentOS.
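The uniqueness argument above can be checked explicitly before any workers are started. The following is a minimal base R sketch, assuming the list names and the "~/test-" prefix from the reprex; raster_names, out_paths, and dups are names introduced here for illustration.

```r
# Names of the list elements, as in the reprex (ras1, ras2).
raster_names <- c("ras1", "ras2")

# Build the output paths the same way the reprex does.
out_paths <- file.path("~", paste0("test-", raster_names))

# Any duplicated path would mean two workers writing to the same file.
dups <- out_paths[duplicated(out_paths)]
if (length(dups) > 0L) {
  stop("Duplicated output paths: ", paste(dups, collapse = ", "))
}
```

This only rules out collisions between the final file names; it says nothing about temporary or sidecar files that a writer may create along the way.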

HenrikBengtsson commented 5 years ago

The following should do the same without furrr and glue. Please see if that also produces the problem for you. If it does, please share your sessionInfo().

library(raster)
library(future.apply)
plan(future.callr::callr, workers = 2L)

rasters <- list(
  ras1 = raster(system.file("external/test.grd", package="raster")),
  ras2 = raster(system.file("external/test.grd", package="raster"))
)

y <- future_lapply(seq_along(rasters), FUN = function(ii) {
  writeRaster(rasters[[ii]], filename = paste0("~/test-", ii))
})

pat-s commented 5 years ago

In your example the list names are not used, only the indices of the list elements. The following is what works for me:

library(raster)
library(future.apply)
plan(future.callr::callr, workers = 2L)

rasters <- list(
  ras1 = raster(system.file("external/test.grd", package="raster")),
  ras2 = raster(system.file("external/test.grd", package="raster"))
)
names(rasters) = c("test1", "test2")

y <- future_lapply(seq_along(rasters), FUN = function(ii) {
  writeRaster(rasters[[ii]], filename = paste0("~/test-", names(rasters)[[ii]]))
})

While the example works, on the HPC I get the following error:

callr failed, could not start R, exited with non-zero status, has crashed or was killed

With multisession I get:

Failed to retrieve the value of MultisessionFuture (<none>) from cluster SOCKnode #4 (PID 125319 on localhost ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: No process exists with this PID, i.e. the localhost worker is no longer alive.

Hm, it seems something is not set up correctly. It starts and runs for some time (I also see the processes). However, then it crashes.

HenrikBengtsson commented 5 years ago

> In your example the names the list names are not taken but only the indices of the list elements.

Correct, but I'd be surprised if the "error reading from connection" error were related to that. I wanted to identify a minimal reproducible example. (Actually, the next step would be to get rid of future.apply as well, to rule that out and reproduce the error using only the future + future.callr packages.)

> While the example works, on the HPC I get the following error:
>
> callr failed, could not start R, exited with non-zero status, has crashed or was killed

Are you saying you only get that on your HPC system (CentOS?) but not elsewhere (your local computer?)?

> With multisession I get:
>
> Failed to retrieve the value of MultisessionFuture (<none>) from cluster SOCKnode #4 (PID 125319 on localhost ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: No process exists with this PID, i.e. the localhost worker is no longer alive.

Does this mean I should ignore your previous claim that "the multisession "workaround" works"? This is important information because originally it sounded like it was specific to [future.]callr, whereas now it sounds like the issue might be elsewhere.

sessionInfo() is critical for troubleshooting, so please please share that.
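The step of dropping future.apply, mentioned above, could look roughly like the following sketch, which uses only future, future.callr, and raster. It assumes those packages are installed and reuses the test raster from the earlier examples; outdir, fs, and written are names introduced here for illustration, and the files go to a temporary directory rather than the home directory.

```r
library(raster)
library(future)

# One callr-based background R process per worker.
plan(future.callr::callr, workers = 2L)

src <- system.file("external/test.grd", package = "raster")
rasters <- list(test1 = raster(src), test2 = raster(src))

# Write into a fresh temporary directory.
outdir <- tempfile("rasters-")
dir.create(outdir)

# Create one future per raster; subset before creating the future so
# that only a single raster is exported to each worker.
fs <- lapply(names(rasters), function(name) {
  r <- rasters[[name]]
  path <- file.path(outdir, paste0("test-", name, ".grd"))
  future({
    raster::writeRaster(r, filename = path, overwrite = TRUE)
    path  # return only the file name, not the raster object
  })
})

# Collect the results; an error in a worker is re-thrown here.
written <- vapply(fs, value, character(1))

plan(sequential)
```

If this version fails in the same way, furrr and future.apply can be ruled out as the cause.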

pat-s commented 5 years ago

It is complicated. I'll try to make it clearer.

The above reprex

However, I have some code in an analysis that looks as follows. I know it still contains the glue() part but this should not be important.

So what happens is

So because the raster files are written to disk I assume that the code is correct and the file system access is ok. I have no clue why the error occurs after "gathering" the results (at least that is what I think).

plan(future.callr::callr, workers = 10)
y <- future_lapply(seq_along(hyperspecs), FUN = function(ii) {
  nbi_raster(
    hyperspecs[[ii]],
    filename = str_replace(
      glue("data/hyperspectral/nri/nri-{names(hyperspecs)[[ii]]}"),
      ".tif", ".grd"
    ),
    bnames_prefix = "NRI"
  )
})

I am right now running it sequentially to see whether the code is completely fine. However, I had this function working already previously on another machine where everything was fine. I'm just trying to get it working on a new HPC machine with Slurm.

HenrikBengtsson commented 5 years ago

sessionInfo()?

pat-s commented 5 years ago

SessionInfo

```
─ Session info ──────────────────────────────────────────────────────
 setting  value
 version  R version 3.5.2 (2018-12-20)
 os       CentOS Linux 7 (Core)
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_GB.UTF-8
 ctype    en_GB.UTF-8
 tz       Europe/Berlin
 date     2019-03-02

─ Packages ──────────────────────────────────────────────────────────
 ! package      * version    date       lib source
   assertthat     0.2.0      2017-04-11 [1] CRAN (R 3.5.2)
   backports      1.1.3      2018-12-14 [1] CRAN (R 3.5.2)
   base64url      1.4        2018-05-14 [1] CRAN (R 3.5.2)
   BBmisc         1.11       2019-01-09 [1] Github (berndbischl/BBmisc@a5a4e45)
   BiocGenerics   0.28.0     2018-10-30 [1] Bioconductor
   callr          3.1.1      2018-12-21 [1] CRAN (R 3.5.2)
   caret        * 6.0-81     2018-11-20 [1] CRAN (R 3.5.2)
   checkmate      1.9.1      2019-01-15 [1] CRAN (R 3.5.2)
   class          7.3-15     2019-01-01 [1] CRAN (R 3.5.2)
   classInt       0.3-1      2018-12-18 [1] CRAN (R 3.5.2)
   cli            1.0.1      2018-09-25 [1] CRAN (R 3.5.2)
   CodeDepends    0.6.5      2018-07-17 [1] CRAN (R 3.5.2)
   codetools      0.2-16     2018-12-24 [1] CRAN (R 3.5.2)
   colorspace     1.4-0      2019-01-13 [1] CRAN (R 3.5.2)
   crayon         1.3.4      2017-09-16 [1] CRAN (R 3.5.2)
   curl         * 3.3        2019-01-10 [1] CRAN (R 3.5.2)
   data.table   * 1.12.0     2019-01-13 [1] CRAN (R 3.5.2)
   DBI            1.0.0      2018-05-02 [1] CRAN (R 3.5.2)
   digest         0.6.18     2018-10-10 [1] CRAN (R 3.5.2)
   dplyr        * 0.8.0.1    2019-02-15 [1] CRAN (R 3.5.2)
   drake        * 6.2.1.9001 2019-01-16 [1] Github (ropensci/drake@52590dd)
   e1071          1.7-0.1    2019-01-21 [1] CRAN (R 3.5.2)
   fastmatch      1.1-0      2017-01-28 [1] CRAN (R 3.5.2)
   foreach        1.4.4      2017-12-12 [1] CRAN (R 3.5.2)
   fs           * 1.2.6      2018-08-23 [1] CRAN (R 3.5.2)
   furrr        * 0.1.0.9002 2019-01-14 [1] Github (DavisVaughan/furrr@b4ad6ad)
   future       * 1.11.1.1   2019-01-26 [1] CRAN (R 3.5.2)
   future.apply * 1.1.0      2019-01-17 [1] CRAN (R 3.5.2)
   future.callr * 0.4.0      2019-01-07 [1] CRAN (R 3.5.2)
   generics       0.0.2      2018-11-29 [1] CRAN (R 3.5.2)
   ggplot2      * 3.1.0      2018-10-25 [1] CRAN (R 3.5.2)
   git2r          0.24.0     2019-01-07 [1] CRAN (R 3.5.2)
   globals        0.12.4     2018-10-11 [1] CRAN (R 3.5.2)
   glue         * 1.3.0      2018-07-17 [1] CRAN (R 3.5.2)
   gower          0.1.2      2017-02-23 [1] CRAN (R 3.5.2)
   graph          1.60.0     2018-10-30 [1] Bioconductor
   gtable         0.2.0      2016-02-26 [1] CRAN (R 3.5.2)
   hsdar        * 0.5.2      2019-02-05 [1] Github (pat-s/hsdar@9de91c8)
   igraph         1.2.4      2019-02-13 [1] CRAN (R 3.5.2)
   ipred          0.9-8      2018-11-05 [1] CRAN (R 3.5.2)
   iterators      1.0.10     2018-07-13 [1] CRAN (R 3.5.2)
 P lattice      * 0.20-38    2018-11-04 [?] CRAN (R 3.5.2)
   lava           1.6.5      2019-02-12 [1] CRAN (R 3.5.2)
   lazyeval       0.2.1      2017-10-29 [1] CRAN (R 3.5.2)
   listenv        0.7.0      2018-01-21 [1] CRAN (R 3.5.2)
   lubridate      1.7.4      2018-04-11 [1] CRAN (R 3.5.2)
   magrittr     * 1.5        2014-11-22 [1] CRAN (R 3.5.2)
 P MASS           7.3-51.1   2018-11-01 [?] CRAN (R 3.5.2)
   Matrix         1.2-15     2018-11-01 [1] CRAN (R 3.5.2)
   mlr          * 2.13.9000  2019-02-26 [1] Github (mlr-org/mlr@261593e)
   mlrCPO       * 0.3.4-2    2019-01-10 [1] CRAN (R 3.5.2)
   ModelMetrics   1.2.2      2018-11-03 [1] CRAN (R 3.5.2)
   munsell        0.5.0      2018-06-12 [1] CRAN (R 3.5.2)
 P nlme           3.1-137    2018-04-07 [?] CRAN (R 3.5.2)
   nnet           7.3-12     2016-02-02 [1] CRAN (R 3.5.2)
   packrat        0.5.0      2018-11-14 [1] CRAN (R 3.5.2)
   parallelMap    1.3        2015-06-10 [1] CRAN (R 3.5.2)
   ParamHelpers * 1.12       2019-01-18 [1] CRAN (R 3.5.2)
   pillar         1.3.1      2018-12-15 [1] CRAN (R 3.5.2)
   pkgconfig      2.0.2      2018-08-16 [1] CRAN (R 3.5.2)
   plyr           1.8.4      2016-06-08 [1] CRAN (R 3.5.2)
   processx       3.2.1      2018-12-05 [1] CRAN (R 3.5.2)
   prodlim        2018.04.18 2018-04-18 [1] CRAN (R 3.5.2)
   ps             1.3.0      2018-12-21 [1] CRAN (R 3.5.2)
   purrr        * 0.3.0      2019-01-27 [1] CRAN (R 3.5.2)
   R.methodsS3  * 1.7.1      2016-02-16 [1] CRAN (R 3.5.2)
   R.oo         * 1.22.0     2018-04-22 [1] CRAN (R 3.5.2)
   R.utils      * 2.8.0      2019-02-14 [1] CRAN (R 3.5.2)
   R6             2.4.0      2019-02-14 [1] CRAN (R 3.5.2)
   raster       * 2.8-19     2019-01-30 [1] CRAN (R 3.5.2)
   Rcpp           1.0.0      2018-11-07 [1] CRAN (R 3.5.2)
   recipes        0.1.4      2018-11-19 [1] CRAN (R 3.5.2)
   reshape2       1.4.3      2017-12-11 [1] CRAN (R 3.5.2)
   rgdal        * 1.3-9      2019-02-21 [1] CRAN (R 3.5.2)
   rlang          0.3.1      2019-01-08 [1] CRAN (R 3.5.2)
   rootSolve    * 1.7        2016-12-06 [1] CRAN (R 3.5.2)
   rpart          4.1-13     2018-02-23 [1] CRAN (R 3.5.2)
   rstudioapi     0.9.0      2019-01-09 [1] CRAN (R 3.5.2)
   scales         1.0.0      2018-08-09 [1] CRAN (R 3.5.2)
   sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 3.5.2)
   sf           * 0.7-3      2019-02-21 [1] CRAN (R 3.5.2)
   signal       * 0.7-6      2015-07-30 [1] CRAN (R 3.5.2)
   sp           * 1.3-1      2018-06-05 [1] CRAN (R 3.5.2)
   storr          1.2.1      2018-10-18 [1] CRAN (R 3.5.2)
   stringi        1.3.1      2019-02-13 [1] CRAN (R 3.5.2)
   stringr      * 1.4.0      2019-02-10 [1] CRAN (R 3.5.2)
   survival       2.43-3     2018-11-26 [1] CRAN (R 3.5.2)
   tibble         2.0.1      2019-01-12 [1] CRAN (R 3.5.2)
   tidyselect     0.2.5      2018-10-11 [1] CRAN (R 3.5.2)
   timeDate       3043.102   2018-02-21 [1] CRAN (R 3.5.2)
   units          0.6-2      2018-12-05 [1] CRAN (R 3.5.2)
   withr          2.1.2      2018-03-15 [1] CRAN (R 3.5.2)
   XML            3.98-1.17  2019-02-08 [1] CRAN (R 3.5.2)
```

pat-s commented 5 years ago

It passed when just using lapply(). So the code per se is fine.
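A related check is to run the very same future_lapply() call under plan(sequential) and then under the callr backend, so that only the backend differs between the two runs. The sketch below assumes future.apply and future.callr are installed; write_one() is a hypothetical stand-in for the real per-item work (nbi_raster() in this thread) and simply writes a small text file.

```r
library(future.apply)

# Hypothetical stand-in for the real per-item work: write one file per name.
write_one <- function(name, outdir) {
  path <- file.path(outdir, paste0("nri-", name, ".txt"))
  writeLines(name, path)
  basename(path)
}

# Run the same future_lapply() call, writing into a fresh directory.
run_once <- function() {
  outdir <- tempfile("out-")
  dir.create(outdir)
  unlist(future_lapply(c("a", "b", "c"), write_one, outdir = outdir))
}

# Same code, resolved in the current R session.
plan(sequential)
seq_res <- run_once()

# Same code, resolved in callr background processes.
plan(future.callr::callr, workers = 2L)
callr_res <- run_once()

plan(sequential)

same <- identical(seq_res, callr_res)
```

If the sequential run passes and the callr run fails on the real workload, the difference is more likely in the worker processes (for example their environment or resources) than in the code itself.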

HenrikBengtsson commented 4 years ago

> after all files have been written, I get the errors [edit: "error reading from connection"] above ...

Did you ever figure something out regarding this? Can I close? I'm not sure I can give any useful feedback without a reproducible example. It sounds like you need to figure out what part tries to read from the files you've just written. Maybe those files are not properly flushed/closed before reading?

Probably unrelated, but when you use hyperspecs[[ii]] (or in my rewrite hyperspecs[[name]]) you end up subsetting hyperspecs within the future iteration. This means that all of hyperspecs will be exported to the worker before it is subset. I doubt it's related to your problem, but it wastes lots of your RAM if it's a large object. It would be better to subset before the function call, e.g.

y <- future_mapply(hyperspecs, names(hyperspecs), FUN = function(hs, name) {
  # sub() takes (pattern, replacement, x); fixed = TRUE matches ".tif" literally
  filename <- paste0("data/hyperspectral/nri/nri-", sub(".tif", ".grd", name, fixed = TRUE))
  nbi_raster(hs, filename = filename, bnames_prefix = "NRI")
})

pat-s commented 4 years ago

I guess we can close, this is currently not in my scope and I cannot devote time to it.

I would need to rerun it and provide a reprex since things have changed in {future} and {raster} since then.

Also, this is a niche case and putting that much time in might not be worth it. In addition, no one else seems to have faced similar problems since then. Next time I come across such a problem I'll try to come up with a reprex right from the start and revisit here :)

Thanks for the hint regarding RAM! This might indeed have an influence here since RAM usage is usually quite high when dealing with rasters in general, so lowering it with some preprocessing is always very welcome.