Open HenrikBengtsson opened 7 years ago
Ok, I wasn't completely crazy. I did validate this on Windows in the past, but apparently never on Linux. The default timeout is 0.2 seconds, that behaves as infinite waiting / timeout on Linux, cf.
https://stat.ethz.ch/pipermail/r-devel/2016-October/073218.html
I'm pretty sure that's a bug.
The workaround is to adjust to use timeout = 0
or timeout = 1
.
In https://github.com/HenrikBengtsson/Wishlist-for-R/issues/35#issuecomment-334544732, you mentioned this issue should be fixed in base R.
However, I am still running into the delay on Mac OS and Linux with R 4.0.3.
library(future)
packageVersion("future")
#> [1] '1.21.0.9000'
packageDescription("future")$GithubSHA1
#> [1] "8a574375083dae55099eb2ef46e7223e4c735235"
plan(multisession)
f <- future(Sys.sleep(5))
system.time(replicate(10, resolved(f)))
#> user system elapsed
#> 0.015 0.001 2.038
Created on 2021-03-05 by the reprex package (v1.0.0)
Hmm... interesting. Thanks for reporting here. I'm reopening until resolved or figured out what the problem is.
Let's hop over to https://github.com/HenrikBengtsson/Wishlist-for-R/issues/35#issuecomment-792341002 to continue the discussion there. It might be that you're observing that problem, but it does not block indefinitely, or for 5 seconds, but 10 * (timeout=0.2) = 2 seconds, makes me suspect it's a different issue.
FWIW, I've added "secret" option getOption("future.resolved.timeout", 0.2)
and corresponding environment variable R_FUTURE_RESOLVED_TIMEOUT
to the next release of future to make it easier to test and troubleshoot this. It might be that options(future.resolved.timeout = 0)
can also be used as a temporary workaround for people running into this issue, but I don't want to make this part of the public API thinking it's how it should be done. If this is a bug in R, then it should be fixed there.
So, after ruling out that bug https://github.com/HenrikBengtsson/Wishlist-for-R/issues/35#issuecomment-792794028 was not involved, I know for sure what is "going on". It's actually working as expected. I get the same 0.2-second delay when calling resolved()
on a future that is not resolved. If it is resolved resolved()
will return momentarily (after pulling in the results). This is illustrated below.
This leaves us whether or not we can lower the default timeout = 0.2
for resolved()
. I don't recall the exact reason for picking 0.2 in the first place but it feels like something I've forgotten about - the fact that there was not R option available to control this suggests just that.
It might be that timeout = 0.0
is safe and will work everywhere. My main concern with a zero timeout is that resolved()
will end up return FALSE indefinitely on some systems. With a non-zero timeout, the probability for this happening should be less than one. I guess to 100% know, one would have to dive into selectSocket(..., timeout)
to see what it does on all operating systems (and maybe even R versions).
Also, since I've added support for controlling the default via environment variables (e.g. R_FUTURE_RESOLVED_TIMEOUT
and R_FUTURE_CLUSTER_RESOLVED_TIMEOUT
, I can at least check with timeout = 0.0
on AppVeyor, GitHub Actions, Travis, and on Linux against all reverse package dependencies.
@will, does this make sense to you?
PS. I really hope timeout = 0.0
truly works. That would remove some unnecessary waiting overhead spent right now.
resolved()
will wait up to 0.2 seconds to get a response from the worker. If the future is not resolved, then there is no response, and it will timeout after timeout = 0.2
seconds. Proof of concept:
$ R --vanilla
> library(future)
> plan(multisession, workers = 8)
> system.time(replicate(nbrOfWorkers(), { f <- future(Sys.sleep(5)); r <- resolved(f); print(r) }))
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
user system elapsed
0.111 0.000 2.218
This is inline with the default timeout
is 0.2. If we increase the timeout
, this is reflected in the overall elapsed time, e.g.
> plan(sequential); plan(multisession, workers = 8)
> system.time(replicate(nbrOfWorkers(), { f <- future(Sys.sleep(5)); r <- resolved(f, timeout=0.5); }))
user system elapsed
0.090 0.005 4.500
If the future is resolved before resolved()
is called, then resolved()
will infer that the worker is done immediately, i.e. timeout = 0.2
is irrelevant. Now, if the worker is done, then resolved()
will bring down the results from the worker, so there's some overhead coming from that - overhead that is then saved when calling value()
. Proof of concept:
$ R --vanilla
> library(future)
> plan(multisession, workers = 8)
> system.time(replicate(nbrOfWorkers(), { f <- future(NULL); r <- resolved(f); print(r) }))
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
user system elapsed
0.102 0.004 0.460
@will, does this make sense to you?
not really tbh. I poked around with R a little bit during college, but had a legit free license for mathematica at the time from the college so just kept using that instead. sorry I can’t help with this more.
hehe. Sorry, about that. But, you're welcome over to R whenever you're ready - we're all here for you when that day happens.
redirecting to @wlandau
Thanks for explaining, Henrik. This is making more sense. I think I was getting confused because of a system-specific edge case: on my work Macbook (OS 10.15.7) with RStudio Desktop 1.4.869, resolved(f, timeout = 1L)
returns immediately. So I assumed timeout
meant something other than waiting for the future to resolve.
> library(future)
> plan(multisession)
> f <- future(Sys.sleep(10))
> system.time(replicate(10, print(resolved(f, timeout = 1L))))
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
user system elapsed
0.008 0.000 0.009
Unfortunately, I cannot reproduce the above outside the IDE on Mac OS, and I cannot reproduce it with or without the on IDE RHEL 7 or Ubuntu 18.04.
I take it there is no non-blocking version of socketSelect()
?
Are you saying resolved(f, timeout = 1L)
works differently from resolved(f, timeout = 1.0)
?
My mistake, that was not my intention.
I just tried again with resolved(f, timeout = 1.0)
and it still returned immediately.
@wlandau, thxs, but I confused. In https://github.com/HenrikBengtsson/future/issues/99#issuecomment-791437224, you reported a total waiting time of ~2 seconds, which is inline with the default times = 0.2
seconds per resolved()
call. But, in https://github.com/HenrikBengtsson/future/issues/99#issuecomment-793231270, you report that an explicit timeout = 1.0
gives near-zero delay.
That timeout = 1.0
value is passed to socketSelect(..., timeout = timeout)
. If there is nothing sent from the workers ("the future"), then socketSelect()
should block until timeout. If it returns immediately as you claim, then there's either a bug in socketSelect()
or there are results available on the connection, i.e. the future is resolved.
Although I don't think it matters, let's rule out develop version and run with the latest stable release install.packages("future")
. What do you get if you run:
library(future)
resolved_timeout <- function(timeout, n = 10L) {
cat(sprintf("timeout=%g:\n", timeout))
replicate(n, {
cat(system.time({
cat(resolved(f, timeout = timeout), ": ")
})[["elapsed"]], "s\n")
})
}
plan(multisession, workers = 2L)
f <- future(Sys.sleep(20))
resolved_timeout(timeout = 0.1)
resolved_timeout(timeout = 1.0)
v <- value(f)
I get:
timeout=0.1:
FALSE : 0.101 s
FALSE : 0.102 s
FALSE : 0.102 s
FALSE : 0.102 s
FALSE : 0.102 s
FALSE : 0.102 s
FALSE : 0.102 s
FALSE : 0.102 s
FALSE : 0.102 s
FALSE : 0.102 s
timeout=1:
FALSE : 1.003 s
FALSE : 1.003 s
FALSE : 1.002 s
FALSE : 1.003 s
FALSE : 1.002 s
FALSE : 1.002 s
FALSE : 1.003 s
FALSE : 1.003 s
FALSE : 1.002 s
FALSE : 1.002 s
These results is what I'd expect you to see as well.
I agree this is strange. Here is what I see on Mac OS with an interactive R session in RStudio Desktop.
timeout=0.1:
FALSE : 0.101 s
FALSE : 0.102 s
FALSE : 0.104 s
FALSE : 0.102 s
FALSE : 0.102 s
FALSE : 0.102 s
FALSE : 0.102 s
FALSE : 0.102 s
FALSE : 0.105 s
FALSE : 0.105 s
timeout=1:
FALSE : 0.001 s
FALSE : 0 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
There seems to be a mysterious cutoff at 1. When I tried:
library(future)
resolved_timeout <- function(f, timeout, n = 10L) {
cat(sprintf("timeout=%g:\n", timeout))
replicate(n, {
cat(system.time({
cat(resolved(f, timeout = timeout), ": ")
})[["elapsed"]], "s\n")
})
}
plan(multisession, workers = 2L)
f <- future(Sys.sleep(20))
tmp <- resolved_timeout(f, timeout = 0.999)
tmp <- resolved_timeout(f, timeout = 1.001)
tmp <- resolved_timeout(f, timeout = 1.5)
tmp <- resolved_timeout(f, timeout = 10.37)
I saw:
> tmp <- resolved_timeout(f, timeout = 0.999)
timeout=0.999:
FALSE : 1 s
FALSE : 1.001 s
FALSE : 1.002 s
FALSE : 0.999 s
FALSE : 1 s
FALSE : 1 s
FALSE : 1.001 s
FALSE : 1.005 s
FALSE : 1.003 s
FALSE : 1.004 s
> tmp <- resolved_timeout(f, timeout = 1.001)
timeout=1.001:
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
> tmp <- resolved_timeout(f, timeout = 1.5)
timeout=1.5:
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
> tmp <- resolved_timeout(f, timeout = 10.37)
timeout=10.37:
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
That's is crazy odd and, yes, it smells like that R (<= 3.4.2) bug. But before diving into those details, lets rule out two things: (i) there's no user mistake here, and (ii) it also happens outside of RStudio, i.e. in the plain R terminal.
For the first one, I've updated the test code to be more conservative. Can you please re-check with the below, both in RStudio Console and in barebone R running R --vanilla?
library(future)
resolved_timeout <- function(f, timeout, n = 10L) {
stopifnot(getRversion() >= "3.4.3", inherits(f, "ClusterFuture"), !resolved(f, timeout = 0))
cat(sprintf("timeout=%g:\n", timeout))
replicate(n, {
cat(system.time({
cat(r <- resolved(f, timeout = timeout), ": ")
})[["elapsed"]], "s\n")
stopifnot(!r) ## If resolved, test is invalid
})
}
plan(multisession, workers = 2L)
f <- future(Sys.sleep(1000))
tmp <- resolved_timeout(f, timeout = 0.999)
tmp <- resolved_timeout(f, timeout = 1.001)
tmp <- resolved_timeout(f, timeout = 1.5)
tmp <- resolved_timeout(f, timeout = 10.37)
FYI, I've tested this on Ubuntu 18.04 with R 4.0.4 running in the terminal as well as in the RStudio Console; the reporting elapsed times are the same as the 'timeout':s.
If we have established that this is 100% sure and happens in vanilla R too, then I'll try to come up with an example to reproduce this without the future package. If there's a bug in R, it would be nice to get it fixed in R patched so it is part of the next R 4.1.0 release.
The problem appears to be unique to my installation of RStudio Desktop 1.4.869 on Mac OS 10.15.7. If I start R
or R --vanilla
in the terminal, I see the correct delays.
Thanks. What version of R do you run in RStudio? (On Linux it can be different from what you have on PATH depending on how you start it).
Also, given that is so tricky to run with --vanilla in RStudio, what's you session info in RStudio?
Trying to rule out the last few things before taking the next step here.
My RStudio is also running R 4.0.3. From the session info lists, it looks like RStudio is using different a different BLAS and it automatically loads yaml
.
@will have you checked this one with the latest future 1.22.1 (2021-08-11)? One of the updates were:
I wonder if we can close this one, or if there's still something to fix here.
Looking back at https://github.com/HenrikBengtsson/future/issues/99#issuecomment-795837942, here is what I get running R 4.1.0 with --vanilla
. Timeout is respected.
> library(future)
> packageVersion("future")
[1] ‘1.22.1’
> resolved_timeout <- function(f, timeout, n = 10L) {
+ stopifnot(getRversion() >= "3.4.3", inherits(f, "ClusterFuture"), !resolved(f, timeout = 0))
+ cat(sprintf("timeout=%g:\n", timeout))
+ replicate(n, {
+ cat(system.time({
+ cat(r <- resolved(f, timeout = timeout), ": ")
+ })[["elapsed"]], "s\n")
+ stopifnot(!r) ## If resolved, test is invalid
+ })
+ }
>
> plan(multisession, workers = 2L)
f <- future(Sys.sleep(1000))
tmp <- resolved_timeout(f, timeout = 0.999)
tmp <- resolved_timeout(f, timeout = 1.001)
tmp <- resolved_timeout(f, timeout = 1.5)
tmp <- resolved_timeout(f, timeout = 10.37)
> f <- future(Sys.sleep(1000))
>
> tmp <- resolved_timeout(f, timeout = 0.999)
timeout=0.999:
FALSE : 1.005 s
FALSE : 1.004 s
FALSE : 1.004 s
FALSE : 1.003 s
FALSE : 1 s
FALSE : 1 s
FALSE : 1 s
FALSE : 1 s
FALSE : 1 s
FALSE : 1 s
> tmp <- resolved_timeout(f, timeout = 1.001)
timeout=1.001:
FALSE : 1.005 s
FALSE : 1.005 s
FALSE : 1.004 s
FALSE : 1.006 s
FALSE : 1.003 s
FALSE : 1.002 s
FALSE : 1.004 s
FALSE : 1.003 s
FALSE : 1.006 s
FALSE : 1.004 s
> tmp <- resolved_timeout(f, timeout = 1.5)
timeout=1.5:
FALSE : 1.503 s
FALSE : 1.502 s
FALSE : 1.502 s
FALSE : 1.503 s
FALSE : 1.501 s
FALSE : 1.505 s
FALSE : 1.502 s
FALSE : 1.506 s
FALSE : 1.504 s
FALSE : 1.502 s
> tmp <- resolved_timeout(f, timeout = 10.37)
timeout=10.37:
FALSE : 10.372 s
FALSE : 10.376 s
FALSE : 10.372 s
FALSE : 10.372 s
FALSE : 10.374 s
FALSE : 10.375 s
FALSE : 10.373 s
FALSE : 10.375 s
FALSE : 10.371 s
FALSE : 10.373 s
>
Using RStudio 1.4.1717, the results are somehow different (resembling https://github.com/HenrikBengtsson/future/issues/99#issuecomment-795424090).
> library(future)
>
> resolved_timeout <- function(f, timeout, n = 10L) {
+ stopifnot(getRversion() >= "3.4.3", inherits(f, "ClusterFuture"), !resolved(f, timeout = 0))
+ cat(sprintf("timeout=%g:\n", timeout))
+ replicate(n, {
+ cat(system.time({
+ cat(r <- resolved(f, timeout = timeout), ": ")
+ })[["elapsed"]], "s\n")
+ stopifnot(!r) ## If resolved, test is invalid
+ })
+ }
>
> plan(multisession, workers = 2L)
> f <- future(Sys.sleep(1000))
>
> tmp <- resolved_timeout(f, timeout = 0.999)
timeout=0.999:
FALSE : 1.004 s
FALSE : 1.003 s
FALSE : 1.001 s
FALSE : 1.002 s
FALSE : 1.001 s
FALSE : 1.004 s
FALSE : 1.001 s
FALSE : 1.001 s
FALSE : 1.005 s
FALSE : 1.003 s
> tmp <- resolved_timeout(f, timeout = 1.001)
timeout=1.001:
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0 s
FALSE : 0.001 s
> tmp <- resolved_timeout(f, timeout = 1.5)
timeout=1.5:
FALSE : 0 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
> tmp <- resolved_timeout(f, timeout = 10.37)
timeout=10.37:
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
FALSE : 0.001 s
All this ran on a Macbook (Big Sur).
This is really weird, and it smells like an RStudio bug to me. I think the next task is to distill a reproducible example based on base R code only, i.e. con <- socketConnection("localhost", ...)
and socketSelect(con, timeout)
.
UPDATE: Maybe it's a tad easier to start reproducing it based on cl <- parallel::makeCluster(1L)
and socketSelect(cl[[1]]$con, timeout)
. That way one do not have to worry about how to set up background R sessions.
Calling
resolved()
on a ClusterFuture blocks - at least on Linux. Hmm... I'm pretty sure I checked this in the past, but maybe things changed in R / parallel since and / or I checked it on Windows but not on Linux.ACTIONS: