HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org

interruptible calls to value and resolved #214

Closed rubenarslan closed 6 years ago

rubenarslan commented 6 years ago

I frequently have to restart an RStudio session because resolved() or value() never seems to return. I understand why value() does not return if the future isn't finished, but I thought resolved() would always return immediately. Once this happens, Esc and the stop button in RStudio don't help; only force-quitting R does. I haven't yet been able to isolate why this works sometimes and not others. These are usually implicit futures with a nested topology: list(remote, multicore).

Is this a known problem or would a reproducible example help?

HenrikBengtsson commented 6 years ago

Hi. Hard to say what's going on without a reproducible example and other details. For instance, it may be that the master R process is busy retrieving a large amount of data from one of the R workers. See also if you can reproduce this in a plain R session outside of RStudio - could be an RStudio thing.

HenrikBengtsson commented 6 years ago

... and yes, resolved() should be non-blocking and return immediately with either FALSE or TRUE.
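To illustrate what "non-blocking" allows, here is a minimal sketch of polling a future with a timeout instead of calling the blocking value() directly. The helper wait_for_future() and its arguments (timeout, interval, is_resolved) are hypothetical names made up for this example and are not part of the future package; only resolved() and value() are. It assumes resolved() behaves as described above, so it would not work around a case where resolved() itself hangs.

```r
# Hypothetical helper (not part of the future package): poll a future until it
# is resolved or until `timeout` seconds have passed. `is_resolved` defaults to
# future::resolved and is only looked up when actually needed.
wait_for_future <- function(f, timeout = 30, interval = 0.5,
                            is_resolved = future::resolved) {
  deadline <- Sys.time() + timeout
  repeat {
    # Non-blocking check: expected to return TRUE or FALSE right away
    if (isTRUE(is_resolved(f))) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    # Sleep between checks so the loop does not spin at full speed
    Sys.sleep(interval)
  }
}

# Example use (assumes the future package is installed):
# library(future)
# plan(multisession)
# f <- future({ Sys.sleep(2); 42 })
# if (wait_for_future(f, timeout = 10)) value(f) else message("still running")
```

With this pattern, value() is only called once the future is known to be resolved, so the session is never parked in a long blocking call.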

rubenarslan commented 6 years ago

Can you help me make reproducible examples for these problems? Obviously, it's a bit harder since it involves private servers... I don't know what you'd need to know about my servers' setup etc.

Like I said, this doesn't always happen. Just now, I tried to send a job to one computation server. There was no error message, but no R process was ever spawned on the server (as viewed via top). When I tried accessing the future's value, R hung and I had to force-quit. It seems unlikely that a huge amount of data was being transmitted, since this was right after job submission and the data isn't huge. After this, I restarted R. Calling login <- tweak(remote, workers = rep("arslan@arc-srv-cpt7.mpib-berlin.mpg.de", 1), persistent = FALSE); plan(login) then led to this error message:

Error: Internal error: Unexpected result retrieved for ClusterFuture future (‘’): ‘NA’

The second server is, from what I can tell, identical in setup to another, for which it worked immediately. I can ssh in and run R on both, same R version etc., except that the first one (with the error) runs Ubuntu 16 and the other Ubuntu 14.

HenrikBengtsson commented 6 years ago

Before anything else, for:

Error: Internal error: Unexpected result retrieved for ClusterFuture future (‘’): ‘NA’

see Issue #215

rubenarslan commented 6 years ago

Ok, so this was a version mismatch (1.7.0 vs. 1.8.0). I'm not 100% sure that the new version was already loaded before the restart after the hang. I'll see if it recurs.

HenrikBengtsson commented 6 years ago

Thanks for the follow-up. Yes, running the new future 1.8.0 on the master and future (< 1.8.0) on the workers will cause problems and non-informative error messages like the one you've seen. It may also explain the silent "stalls" you're observing. Not detecting that future is missing or outdated on the workers is an oversight on my part in the future 1.8.0 release - I'll improve this in the next release (Issue #216).
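Until such a check exists in the package, a version mismatch like this can be spotted by hand. The following is a rough sketch that uses only base R's parallel package to ask each worker which version of a package it has installed. The helper names pkg_version_or_na() and worker_pkg_versions() are made up for this example, and it assumes you can create a cluster object for your workers yourself (for instance with parallel::makePSOCKcluster()).

```r
library(parallel)

# Hypothetical helper, run on each worker: return the installed version of
# `pkg` as a string, or NA if the package is not installed there.
pkg_version_or_na <- function(pkg) {
  tryCatch(as.character(utils::packageVersion(pkg)),
           error = function(e) NA_character_)
}

# Hypothetical helper, run on the master: collect the version of `pkg`
# from every worker in the cluster `cl`.
worker_pkg_versions <- function(cl, pkg = "future") {
  unlist(clusterCall(cl, pkg_version_or_na, pkg))
}

# Example use with a single local worker (replace with your own cluster):
# cl <- makeCluster(1)
# worker_pkg_versions(cl, "future")               # versions on the workers
# as.character(utils::packageVersion("future"))   # version on the master
# stopCluster(cl)
```

If any worker reports NA or a version different from the master's, updating future on that machine is the first thing to try.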

HenrikBengtsson commented 6 years ago

UPDATE: I stumbled upon a similar "non-responsiveness" in futures that occurred when the worker didn't have the future package installed. I could reproduce it, added a test, and fixed it in commit 634729e. Later I added code to detect when the future package is missing, so that an earlier and nicer error is signaled. Both layers of protection help avoid this non-responsiveness of workers.

Hopefully, these updates help in your case as well. To test the new code, use:

remotes::install_github('HenrikBengtsson/future@develop')

I'm closing this issue, but please feel free to reopen if the new code is not helping.