pfernique opened this issue 4 years ago
I tried with `future::multicore` and it's working fine, but with `future::multisession` I have a similar problem (`Error in unserialize(node$con) : error reading from connection`). Is this normal, or do you have any idea why?
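For reference, a minimal sketch of how such a crash can be provoked deliberately (this is not the original reproduction code; the segfault is simulated by sending SIGSEGV to the worker, which works on Unix-like systems):

```r
library(future)

plan(multisession, workers = 2)

# Simulate a worker segfault by sending SIGSEGV to the worker's own process.
f <- future({
  tools::pskill(Sys.getpid(), tools::SIGSEGV)
})

# The killed worker corrupts the socket to the main session, giving:
#   Error in unserialize(node$con) : error reading from connection
value(f)
```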
Yes. There's lots of exception handling done in the future framework, and some of it is even recoverable, but kicking workers far off the track is not automagically taken care of.
Before anything else, use `multicore` or `multisession` explicitly. The `multiprocess` plan is just an alias for one of them, depending on your operating system. I'm going to phase out `multiprocess` because it is ambiguous (e.g. I don't know what OS you're running here, but reading between the lines in your error reports, it sounds like you're running on MS Windows).
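In other words, something like:

```r
library(future)

# Pick the backend explicitly instead of relying on the alias:
plan(multicore)     # forked workers; Unix-like systems only
plan(multisession)  # background R sessions; works everywhere

# Ambiguous: resolves to multicore or multisession depending on the OS.
# plan(multiprocess)
```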
In the `multisession` case, we run PSOCK background workers (as defined by the parallel package) that communicate over a socket connection. If you kill a background worker, the communication with the main R session is likely to become corrupted. In the `future.callr::callr` case, which is handled by the callr package, you get similar errors because callr communicates via the file system, and a half-written file is corrupt. In the `multicore` case, workers are forked processes. Knocking those offline will confuse the main R process because it can no longer find a way to communicate with its child process. The symptom will be something like a message "An irrecoverable exception occurred. R is aborting now ..." from the forked process. On MS Windows, `multicore` equals `sequential`, which means the above example will kill the main R session.
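The same kill experiment as above, run under `multicore` (a sketch; on MS Windows this plan falls back to `sequential` and the crash would take down the main session instead):

```r
library(future)

plan(multicore, workers = 2)  # forked workers; Unix-like systems only

f <- future({
  tools::pskill(Sys.getpid(), tools::SIGSEGV)  # simulated segfault
})

# The forked child typically prints
#   "An irrecoverable exception occurred. R is aborting now ..."
# and the main process loses contact with its child.
value(f)
```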
In summary, what you're asking for is not part of the current future backend design. To support it in general would require looots of work. Even if it's on the long-term roadmap, there are several things that need to be in place before it can be attacked. I also doubt one can cover cases such as `sequential`. Before that, it is more likely that someone develops a future backend that can handle severe corruption like this. Indeed, it might be that the batchtools package supports it, e.g. try with the sequential `plan(future.batchtools::batchtools_local)`.
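A sketch of what that would look like, assuming future.batchtools is installed (the segfault is again simulated, and the exact error message depends on the batchtools version):

```r
library(future)

# Each future runs in a transient R process managed by batchtools,
# so a crashed worker may surface as an ordinary R error instead of
# corrupting the main session.
plan(future.batchtools::batchtools_local)

f <- future({
  tools::pskill(Sys.getpid(), tools::SIGSEGV)  # simulated segfault
})

res <- tryCatch(value(f), error = identity)
print(res)  # expect an error object, not a dead main session
```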
Thanks for your reply! I'm on Windows Subsystem for Linux (which behaves like Linux). I was glad to find that one backend could recover from segfaults; I was just surprised that it wasn't the `future.callr::callr` backend. Since callr communicates via the file system, I thought handling segfaults would be easier: I was using callr before the code was parallelized, and a segfault in the launched session was turned into an error (only for the segfaulting process, not the following ones) in my current session (`Error in readRDS(res) : error reading from connection`).
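For comparison, this is roughly how that non-parallelized callr usage behaves (a sketch; the crash is simulated, and the exact error depends on the callr version):

```r
# A crashing function run in a separate R session via callr::r();
# the child's crash surfaces as a catchable error in this session.
res <- tryCatch(
  callr::r(function() tools::pskill(Sys.getpid(), tools::SIGSEGV)),
  error = identity
)
print(res)
# The reporter saw:
#   Error in readRDS(res) : error reading from connection
# (newer callr versions wrap this in their own error class)
```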
`future.batchtools::batchtools_local` seems quite interesting, I will give it a try!
`saveRDS()` is not atomic, so if killed in the middle of a write it will leave behind a half-written file, which results in that `readRDS()` error.
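One common workaround for such non-atomic writes, sketched below with a hypothetical helper (not part of future or callr), is to write to a temporary file and rename it into place:

```r
# Hypothetical helper: write an rds file "atomically" by writing to a
# temp file on the same filesystem and renaming it into place.
save_rds_atomically <- function(object, path) {
  tmp <- tempfile(tmpdir = dirname(path))  # same filesystem as target
  saveRDS(object, tmp)
  file.rename(tmp, path)  # atomic on POSIX within one volume
}
```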
Yes, I have no problem understanding that. It's just that this seems to indicate that all processes use the same rds file (otherwise I really don't get why an rds file corrupted by one process would lead to corrupted rds files for all remaining processes), and I naively believed that a different rds file would be used for each process.
Hi,
I'm trying to launch some processes that can sometimes throw a segfault (and this can't be predicted or modified, since I don't have the source code). This MWE code behaves as I want using the `future::multiprocess` plan (i.e., it returns 40 in the last line). But the same MWE code does not behave as I want using the `future.callr::callr` plan (i.e., it throws `Error in readRDS(res) : error reading from connection`). Is this normal, or do you have any idea why? Note that I'm using future v1.15.1 and future.callr v0.5.0.
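The MWE itself did not survive in this thread; below is a minimal sketch of what it might have looked like, with the structure guessed and the segfault simulated via a signal (the real code crashed inside compiled code without available sources):

```r
library(future)
library(future.callr)

plan(multiprocess)  # behaves as wanted: last line returns 40
# plan(callr)       # instead throws:
#   Error in readRDS(res) : error reading from connection

# Simulated stand-in for the library call that segfaults.
f <- future({
  tools::pskill(Sys.getpid(), tools::SIGSEGV)
})
try(value(f))  # the crash; recoverable or not, depending on the plan

g <- future(40)
value(g)  # the last line: expected to return 40
```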