Closed pat-s closed 4 years ago
Hi, you're the first to report this. So, yes, please provide a minimal reproducible example and we'll take it from there.
Hm, my local reprex works
library(raster)
library(future.callr)
library(purrr)
library(glue)
remotes::install_dev("furrr")
library(furrr)
rasters = list(ras1 = raster(system.file("external/test.grd", package="raster")),
ras2 = raster(system.file("external/test.grd", package="raster"))
)
plan(future.callr::callr, workers = 2)
future_iwalk(rasters, ~ writeRaster(.x, glue("~/test-{.y}")))
Trying to see what's different on the specific machine.
Works now also on the other machine. False positive issue.
Thanks for your followup
Now I hit the error again. Didn't change the code. Tried it afterwards with multisession and it worked again. Strange.
It might be that those raster objects are non-exportable. See Section 'Non-exportable objects' in vignette 'A Future for R: Common Issues with Solutions' (https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html). If you run with:
options(future.globals.onReference = "error")
you should get an informative error message if they're (possibly) non-exportable.
OTOH, if it works with multisession and the objects are non-exportable, then it should fail there as well. Maybe you're hitting race condition issues where multiple processes try to write to the same file?
Also, see if you can reproduce the problem with future_lapply or similar to rule out a mistake in furrr.
But sure, it's a bit odd if it only happens occasionally.
I'll see if I can reproduce this later; I see you're installing the GitHub version of furrr (remotes::install_dev("furrr")). Is that necessary for reproducing the error, or do you see it also with the CRAN version?
Also, what's the sessionInfo() on the machine(s) where it fails and where it doesn't fail? Those are useful clues.
I see you're installing the GitHub version of furrr (remotes::install_dev("furrr")). Is that necessary for reproducing the error, or do you see it also with the CRAN version?
Required, as I use future_iwalk() for writing and this is only available in the dev version so far.
Maybe you're hitting race condition issues where multiple processes try to write to the same file?
Different files are written, so there should be no race condition. As I walk() over the names of the rasters, I would see if one were missing in the end or had been written twice.
Also, see if you can reproduce the problem with future_lapply or similar to rule out a mistake in furrr.
That's an option, yes. However, I cannot tell you when I will find time to take a deeper look again; I am quite busy right now. And since the multisession "workaround" works, the issue is not so urgent atm.
[...] As I walk() over the names of the rasters, I see if one is missing in the end / is written twice.
I don't understand this part.
Are you on Linux, macOS, or Windows?
When using purrr::iwalk(), the function uses the names of the list elements for the iterator .y. These names are then used in the raster file names. The names are unique and so are the files written to disk, hence I think there is no race condition occurring :)
Linux, CentOS.
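The uniqueness argument above can be checked directly before any parallel write starts. A minimal base-R sketch, assuming the output paths are built from the list names the same way as in the reprex; the placeholder list and the tempdir() location are illustrative and not taken from the actual analysis:

```r
# Placeholder for the named list of rasters; only the names matter here.
rasters <- list(ras1 = 1, ras2 = 2)

# Build the output paths the same way the reprex does, one per list name.
paths <- file.path(tempdir(), paste0("test-", names(rasters)))

# If any path occurs twice, two workers could end up writing to the same file.
has_collision <- anyDuplicated(paths) > 0L
if (has_collision) {
  stop("duplicate output paths: ", paste(paths[duplicated(paths)], collapse = ", "))
}
```

If this passes, a race on the output files themselves is unlikely, although it says nothing about shared temporary files that a package might create internally.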
The following should do the same without furrr and glue. Please see if that also produces the problem for you. If it does, please share your sessionInfo().
library(raster)
library(future.apply)
plan(future.callr::callr, workers = 2L)
rasters <- list(
ras1 = raster(system.file("external/test.grd", package="raster")),
ras2 = raster(system.file("external/test.grd", package="raster"))
)
y <- future_lapply(seq_along(rasters), FUN = function(ii) {
writeRaster(rasters[[ii]], filename = paste0("~/test-", ii))
})
In your example, the list names are not used, only the indices of the list elements. The following is what works for me:
library(raster)
library(future.apply)
plan(future.callr::callr, workers = 2L)
rasters <- list(
ras1 = raster(system.file("external/test.grd", package="raster")),
ras2 = raster(system.file("external/test.grd", package="raster"))
)
names(rasters) = c("test1", "test2")
y <- future_lapply(seq_along(rasters), FUN = function(ii) {
writeRaster(rasters[[ii]], filename = paste0("~/test-", names(rasters)[[ii]]))
})
While the example works, on the HPC I get the following error:
callr failed, could not start R, exited with non-zero status, has crashed or was killed
With multisession I get:
Failed to retrieve the value of MultisessionFuture (<none>) from cluster SOCKnode #4 (PID 125319 on localhost ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: No process exists with this PID, i.e. the localhost worker is no longer alive.
Hm, it seems something is not set up correctly. It starts and runs for some time (I also see the processes). However, then it crashes.
In your example, the list names are not used, only the indices of the list elements.
Correct, but I'd be surprised if the "error reading from connection" error were related to that. I wanted to identify a minimal reproducible example. (Actually, the next step would be to get rid of future.apply as well, to rule that out and reproduce the error using only the future + future.callr packages.)
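A possible shape for that next step, using only future and future.callr on top of raster. This is a sketch and not a confirmed reproducer: the tempdir() output location and overwrite = TRUE are assumptions added here so that it can be rerun safely.

```r
library(future)
library(raster)

# Same backend as in the report; requires the future.callr package.
plan(future.callr::callr, workers = 2L)

rasters <- list(
  ras1 = raster(system.file("external/test.grd", package = "raster")),
  ras2 = raster(system.file("external/test.grd", package = "raster"))
)

# Create one future per raster, so only that raster is shipped to a worker.
futures <- lapply(names(rasters), function(name) {
  r <- rasters[[name]]
  path <- file.path(tempdir(), paste0("test-", name))
  future(raster::writeRaster(r, filename = path, overwrite = TRUE))
})

# Collecting the values is where a crashed worker would surface as an error.
results <- lapply(futures, value)

# Reset to the default sequential backend.
plan(sequential)
```

If this fails on the HPC system in the same way, future.apply and furrr can be ruled out as the cause.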
While the example works, on the HPC I get the following error:
callr failed, could not start R, exited with non-zero status, has crashed or was killed
Are you saying you only get that on your HPC system (CentOS?) but not elsewhere (your local computer?)
With multisession I get:
Failed to retrieve the value of MultisessionFuture (<none>) from cluster SOCKnode #4 (PID 125319 on localhost ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: No process exists with this PID, i.e. the localhost worker is no longer alive.
Does this mean I should ignore your previous claim that "the multisession "workaround" works"? This is important information because originally it sounded like it was specific to [future.]callr, whereas now it sounds like the issue might be elsewhere.
sessionInfo() is critical for troubleshooting, so please, please share that.
It is complicated. I'll try to make it clearer.
The above reprex works with future.callr::callr and multisession.
However, I have some code in an analysis that looks as follows. I know it still contains the glue() part, but this should not be important.
So what happens is
multisession
and callr
So because the raster files are written to disk I assume that the code is correct and the file system access is ok. I have no clue why the error occurs after "gathering" the results (at least that is what I think).
plan(future.callr::callr, workers = 10)
y <- future_lapply(seq_along(hyperspecs), FUN = function(ii)
nbi_raster(hyperspecs[[ii]],
filename =
str_replace(glue("data/hyperspectral/nri/nri-{names(hyperspecs)[[ii]]}"), ".tif", ".grd"),
bnames_prefix = "NRI"))
I am running it sequentially right now to see whether the code is completely fine. However, I already had this function working on another machine, where everything was fine. I am just trying to get it working on a new HPC machine with Slurm.
sessionInfo()?
```
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 3.5.2 (2018-12-20)
 os       CentOS Linux 7 (Core)
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_GB.UTF-8
 ctype    en_GB.UTF-8
 tz       Europe/Berlin
 date     2019-03-02

─ Packages ───────────────────────────────────────────────────────────────────
! package      * version    date       lib source
  assertthat     0.2.0      2017-04-11 [1] CRAN (R 3.5.2)
  backports      1.1.3      2018-12-14 [1] CRAN (R 3.5.2)
  base64url      1.4        2018-05-14 [1] CRAN (R 3.5.2)
  BBmisc         1.11       2019-01-09 [1] Github (berndbischl/BBmisc@a5a4e45)
  BiocGenerics   0.28.0     2018-10-30 [1] Bioconductor
  callr          3.1.1      2018-12-21 [1] CRAN (R 3.5.2)
  caret        * 6.0-81     2018-11-20 [1] CRAN (R 3.5.2)
  checkmate      1.9.1      2019-01-15 [1] CRAN (R 3.5.2)
  class          7.3-15     2019-01-01 [1] CRAN (R 3.5.2)
  classInt       0.3-1      2018-12-18 [1] CRAN (R 3.5.2)
  cli            1.0.1      2018-09-25 [1] CRAN (R 3.5.2)
  CodeDepends    0.6.5      2018-07-17 [1] CRAN (R 3.5.2)
  codetools      0.2-16     2018-12-24 [1] CRAN (R 3.5.2)
  colorspace     1.4-0      2019-01-13 [1] CRAN (R 3.5.2)
  crayon         1.3.4      2017-09-16 [1] CRAN (R 3.5.2)
  curl         * 3.3        2019-01-10 [1] CRAN (R 3.5.2)
  data.table   * 1.12.0     2019-01-13 [1] CRAN (R 3.5.2)
  DBI            1.0.0      2018-05-02 [1] CRAN (R 3.5.2)
  digest         0.6.18     2018-10-10 [1] CRAN (R 3.5.2)
  dplyr        * 0.8.0.1    2019-02-15 [1] CRAN (R 3.5.2)
  drake        * 6.2.1.9001 2019-01-16 [1] Github (ropensci/drake@52590dd)
  e1071          1.7-0.1    2019-01-21 [1] CRAN (R 3.5.2)
  fastmatch      1.1-0      2017-01-28 [1] CRAN (R 3.5.2)
  foreach        1.4.4      2017-12-12 [1] CRAN (R 3.5.2)
  fs           * 1.2.6      2018-08-23 [1] CRAN (R 3.5.2)
  furrr        * 0.1.0.9002 2019-01-14 [1] Github (DavisVaughan/furrr@b4ad6ad)
  future       * 1.11.1.1   2019-01-26 [1] CRAN (R 3.5.2)
  future.apply * 1.1.0      2019-01-17 [1] CRAN (R 3.5.2)
  future.callr * 0.4.0      2019-01-07 [1] CRAN (R 3.5.2)
  generics       0.0.2      2018-11-29 [1] CRAN (R 3.5.2)
  ggplot2      * 3.1.0      2018-10-25 [1] CRAN (R 3.5.2)
  git2r          0.24.0     2019-01-07 [1] CRAN (R 3.5.2)
  globals        0.12.4     2018-10-11 [1] CRAN (R 3.5.2)
  glue         * 1.3.0      2018-07-17 [1] CRAN (R 3.5.2)
  gower          0.1.2      2017-02-23 [1] CRAN (R 3.5.2)
  graph          1.60.0     2018-10-30 [1] Bioconductor
  gtable         0.2.0      2016-02-26 [1] CRAN (R 3.5.2)
  hsdar        * 0.5.2      2019-02-05 [1] Github (pat-s/hsdar@9de91c8)
  igraph         1.2.4      2019-02-13 [1] CRAN (R 3.5.2)
  ipred          0.9-8      2018-11-05 [1] CRAN (R 3.5.2)
  iterators      1.0.10     2018-07-13 [1] CRAN (R 3.5.2)
P lattice      * 0.20-38    2018-11-04 [?] CRAN (R 3.5.2)
  lava           1.6.5      2019-02-12 [1] CRAN (R 3.5.2)
  lazyeval       0.2.1      2017-10-29 [1] CRAN (R 3.5.2)
  listenv        0.7.0      2018-01-21 [1] CRAN (R 3.5.2)
  lubridate      1.7.4      2018-04-11 [1] CRAN (R 3.5.2)
  magrittr     * 1.5        2014-11-22 [1] CRAN (R 3.5.2)
P MASS           7.3-51.1   2018-11-01 [?] CRAN (R 3.5.2)
  Matrix         1.2-15     2018-11-01 [1] CRAN (R 3.5.2)
  mlr          * 2.13.9000  2019-02-26 [1] Github (mlr-org/mlr@261593e)
  mlrCPO       * 0.3.4-2    2019-01-10 [1] CRAN (R 3.5.2)
  ModelMetrics   1.2.2      2018-11-03 [1] CRAN (R 3.5.2)
  munsell        0.5.0      2018-06-12 [1] CRAN (R 3.5.2)
P nlme           3.1-137    2018-04-07 [?] CRAN (R 3.5.2)
  nnet           7.3-12     2016-02-02 [1] CRAN (R 3.5.2)
  packrat        0.5.0      2018-11-14 [1] CRAN (R 3.5.2)
  parallelMap    1.3        2015-06-10 [1] CRAN (R 3.5.2)
  ParamHelpers * 1.12       2019-01-18 [1] CRAN (R 3.5.2)
  pillar         1.3.1      2018-12-15 [1] CRAN (R 3.5.2)
  pkgconfig      2.0.2      2018-08-16 [1] CRAN (R 3.5.2)
  plyr           1.8.4      2016-06-08 [1] CRAN (R 3.5.2)
  processx       3.2.1      2018-12-05 [1] CRAN (R 3.5.2)
  prodlim        2018.04.18 2018-04-18 [1] CRAN (R 3.5.2)
  ps             1.3.0      2018-12-21 [1] CRAN (R 3.5.2)
  purrr        * 0.3.0      2019-01-27 [1] CRAN (R 3.5.2)
  R.methodsS3  * 1.7.1      2016-02-16 [1] CRAN (R 3.5.2)
  R.oo         * 1.22.0     2018-04-22 [1] CRAN (R 3.5.2)
  R.utils      * 2.8.0      2019-02-14 [1] CRAN (R 3.5.2)
  R6             2.4.0      2019-02-14 [1] CRAN (R 3.5.2)
  raster       * 2.8-19     2019-01-30 [1] CRAN (R 3.5.2)
  Rcpp           1.0.0      2018-11-07 [1] CRAN (R 3.5.2)
  recipes        0.1.4      2018-11-19 [1] CRAN (R 3.5.2)
  reshape2       1.4.3      2017-12-11 [1] CRAN (R 3.5.2)
  rgdal        * 1.3-9      2019-02-21 [1] CRAN (R 3.5.2)
  rlang          0.3.1      2019-01-08 [1] CRAN (R 3.5.2)
  rootSolve    * 1.7        2016-12-06 [1] CRAN (R 3.5.2)
  rpart          4.1-13     2018-02-23 [1] CRAN (R 3.5.2)
  rstudioapi     0.9.0      2019-01-09 [1] CRAN (R 3.5.2)
  scales         1.0.0      2018-08-09 [1] CRAN (R 3.5.2)
  sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 3.5.2)
  sf           * 0.7-3      2019-02-21 [1] CRAN (R 3.5.2)
  signal       * 0.7-6      2015-07-30 [1] CRAN (R 3.5.2)
  sp           * 1.3-1      2018-06-05 [1] CRAN (R 3.5.2)
  storr          1.2.1      2018-10-18 [1] CRAN (R 3.5.2)
  stringi        1.3.1      2019-02-13 [1] CRAN (R 3.5.2)
  stringr      * 1.4.0      2019-02-10 [1] CRAN (R 3.5.2)
  survival       2.43-3     2018-11-26 [1] CRAN (R 3.5.2)
  tibble         2.0.1      2019-01-12 [1] CRAN (R 3.5.2)
  tidyselect     0.2.5      2018-10-11 [1] CRAN (R 3.5.2)
  timeDate       3043.102   2018-02-21 [1] CRAN (R 3.5.2)
  units          0.6-2      2018-12-05 [1] CRAN (R 3.5.2)
  withr          2.1.2      2018-03-15 [1] CRAN (R 3.5.2)
  XML            3.98-1.17  2019-02-08 [1] CRAN (R 3.5.2)
```
It passed when just using lapply(). So the code per se is fine.
after all files have been written, I get the errors [edit: "error reading from connection"] above ...
Did you ever figure something out regarding this? Can I close? I'm not sure I can give any useful feedback without a reproducible example. It sounds like you need to figure out what part tries to read from the files you've just written. Maybe those files are not properly flushed/closed before reading?
Probably unrelated, but when you use hyperspecs[[ii]] (or in my rewrite hyperspecs[[name]]) you end up subsetting hyperspecs within the future iteration. This means that all of hyperspecs will be exported to the worker before it is subsetted. I doubt it's related to your problem, but it wastes lots of your RAM if it's a large object. It would be better to subset before the function call, e.g.
y <- future_mapply(hyperspecs, names(hyperspecs), FUN = function(hs, name) {
  # sub(pattern, replacement, x): replace ".tif" in the name with ".grd"
  filename <- paste0("data/hyperspectral/nri/nri-", sub(".tif", ".grd", name, fixed = TRUE))
  nbi_raster(hs, filename = filename, bnames_prefix = "NRI")
})
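To get a rough feel for the difference, one can compare the size of the whole list with the size of a single element. This is a sketch with made-up numeric vectors standing in for the raster data, and object.size() is only an approximation of what actually gets serialized to a worker:

```r
# Three equally sized placeholder elements standing in for hyperspecs.
hyperspecs <- list(a = numeric(1e6), b = numeric(1e6), c = numeric(1e6))

# Size of everything, which is what a worker receives when the function
# body refers to hyperspecs[[ii]].
size_whole <- as.numeric(object.size(hyperspecs))

# Size of one element, which is what a worker receives when the elements
# are passed as arguments instead.
size_one <- as.numeric(object.size(hyperspecs[["a"]]))

ratio <- size_whole / size_one
```

With many workers and large rasters, this factor grows with the number of list elements, so passing elements as arguments can matter even if it does not explain the crash.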
I guess we can close; this is currently not in my scope and I cannot devote time to it.
I would need to rerun it and provide a reprex, since things have changed in {future} and {raster} since then.
Also, this is a niche case and putting that much time in might not be worth it. In addition, no one else seems to have faced similar problems since then. Next time I come across such a problem I'll try to come up with a reprex right from the start and revisit here :)
Thanks for the hint regarding RAM! This might indeed have an influence here since RAM usage is usually quite high when dealing with rasters in general, so lowering it with some preprocessing is always very welcome.
In my analysis I write raster files in parallel using future_iwalk(writeRaster()) in combination with plan(future.callr::callr, workers = X). When it comes to the writing step, I face the "error reading from connection" error. Using plan(future::multisession), however, works. Is this a known shortcoming? I can try to prepare a reprex if the case is worth investigating.