HenrikBengtsson / future.callr

:rocket: R package future.callr: A Future API for Parallel Processing using 'callr'
https://future.callr.futureverse.org

MultisessionFuture (<none>) failed to receive results from cluster RichSOCKnode #24

Closed vnijs closed 2 years ago

vnijs commented 2 years ago

Sorry but no (small) reproducible example yet. In the middle of teaching and this just popped up.

I have been using future.callr for a few years and it has been working great. After a number of upgrades to R packages, the error below started popping up on our (AMD64) Linux server. The student questions/assignments are the same and they have worked well for a while without issue. I'll try reverting various package versions, but perhaps you have some suggestions on where I might start looking?

Is it possible that this is a server load issue? i.e., is it (more) likely to occur when lots of students are trying to submit at the same time?

Note: I was able to replicate this when running the app from RStudio Server, so I think it is not likely a load issue.

Related question: This is a shiny app that uses knitr to process students' code submissions. I need each student's submission to be 'clean' from any contamination from submissions from other students. future.callr was the only way that I found to do that a few years ago. Are there new options (i.e., other future 'plans')?
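For context, the kind of isolation described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `run_submission()` is a hypothetical helper, and the submitted code is evaluated directly with `eval(parse(...))` rather than through knitr. It relies on `plan(callr)` resolving each future in a fresh background R session, so objects created by one submission should not be visible to the next.

```r
library(future)
library(future.callr)

# With the callr backend, each future is evaluated in a new
# background R process that is discarded afterwards.
plan(callr)

# Hypothetical helper: evaluate one submission (a string of R code)
# in its own R session and return the value of the last expression.
run_submission <- function(code) {
  f <- future({
    eval(parse(text = code))
  })
  value(f)
}

# The first submission creates an object in its worker session.
first <- run_submission("secret <- 42; secret")

# The second submission runs in a different, fresh session,
# so it should not be able to see 'secret'.
second <- run_submission("exists('secret')")

print(first)
print(second)
```

With a backend that reuses workers (for example `multisession`), the same R process can serve several futures, which is why a fresh-process backend is the conservative choice when submissions must not influence each other.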

Related question: Do you think it would be feasible to have future connect to a (running) Docker container and execute commands inside it? I thought that might help alleviate upgrade-related problems.

Thanks again for the wonderful series of future packages.

image

image

vnijs commented 2 years ago

I have tried various package versions and have not yet been able to isolate the problem. Interestingly, everything runs fine on Windows, macOS, and inside a Docker container using Ubuntu 20.04.

The screenshots above didn't contain messages from future.callr, so see below.

image

See also:

Warning: Error in unserialize: MultisessionFuture (<none>) failed to receive results
from cluster RichSOCKnode #1 (PID 2255114 on localhost 'localhost'). The reason
reported was 'error reading from connection'. Post-mortem diagnostic: No process
exists with this PID, i.e. the localhost worker is no longer alive. Detected a 
non-exportable reference ('externalptr') in one of the globals ('knit_it' of class
'function') used in the future expression. The total size of the 11 globals exported is
159.11 KiB. The three largest globals are 'knit_it' (107.98 KiB of class 'function'),
'HTML' (18.40 KiB of class 'function') and 'is_empty' (14.47 KiB of class 'function')

vnijs commented 2 years ago

I traced the problem back to a numpy / reticulate issue. After installing numpy from source, everything works as expected. Thanks again for the line of future packages!

https://github.com/rstudio/reticulate/issues/1190#issuecomment-1108851658