Closed: brainprint closed this issue 5 years ago.
Please keep the code confidential.
FYI, this is a public GitHub forum; I've removed your attached R code for you.
Thank you. I hope you can reproduce the issue. When I removed the %<-% operator, it ran without warnings.
Hi. Your code is very, very long. You need to come up with a much smaller (= minimal) reproducible example and explain what troubleshooting you've done to narrow this issue down. You're the first one reporting this type of issue.
Hi. I was afraid that changing the code would make me lose the issue. I tried with simpler code, but I was unable to reproduce it.
As I remove parts that use %<-% from the original code, the number of warnings decreases.
Warning messages:
1: In selectChildren(pids[!fin], -1) :
cannot wait for child 1724 as it does not exist
2: In selectChildren(pids[!fin], -1) :
cannot wait for child 1724 as it does not exist
Please don't worry about my case; it seems to run correctly despite the warnings.
I just reported it because the only change was the new R version, released yesterday.
Thank you for your support.
I see. Thanks for clarifying that those warnings seem harmless. I'll keep the issue open for a while in case other macOS users start to see these as well.
I get the same warnings on macOS 10.13.4 and R 3.5.0 using the following test code:
library(future)
plan(multiprocess)
testList <- vector(mode = "list", length = 10)
for (i in seq_along(testList)) {
  testList[[i]] <- future({i * 4})
}
testList <- resolve(testList)
testList <- values(testList)
Once the loop completes there are 50 warnings.
I also get the same warnings when using resolve() and values() in this case. However, when using plan(multisession) there are no warnings, so the problem may be related to forking and multicore.
Despite the warnings, the result is the same as when running the same loop single-threaded.
I don't have access to macOS, so I need your help to troubleshoot. From the warning details by @brainprint, the warning appears to come from the parallel package, so I believe this is independent of the future package. If you run the following in a fresh R session:
jobs <- lapply(1:10, FUN = parallel::mcparallel)
values <- parallel::mccollect(jobs)
unlist(values)
I'd expect that you'd also get those warnings - is that the case?
Quick comment: the "cannot wait for child %d as it does not exist" warnings were indeed only introduced in R (>= 3.5.0), cf. https://github.com/wch/r-source/commit/eb468006b82d96917db88e2310286b54a27b47b7#diff-227a0fc52be87760fb0ed6bdc16527f4R781
Interestingly, the little test you posted above produces no errors or warnings when run (see the attached output from R CMD BATCH, future_test.txt).
EDIT: Including output here /HB:
R version 3.5.0 (2018-04-23) -- "Joy in Playing"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.5.0 (64-bit)
> jobs <- lapply(1:10, FUN = parallel::mcparallel)
> values <- parallel::mccollect(jobs)
> unlist(values)
3805 3806 3807 3808 3809 3810 3811 3812 3813 3814
1 2 3 4 5 6 7 8 9 10
> warnings()
>
> proc.time()
user system elapsed
0.322 0.073 0.340
Thanks. It could be that it needs to be hit harder with more tasks, or it could be something I do in the future package. I'll add it to the list of things to investigate.
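To check whether the warnings only appear under heavier load, the 10-job parallel test above could be scaled up while recording each warning as it is signaled, rather than inspecting warnings() afterwards. A minimal sketch, assuming a Unix-alike where forking is available; the function name stress_mcparallel() and the default of 100 jobs are hypothetical choices for illustration:

```r
# Hypothetical stress test: fork n trivial jobs (like the 10-job test above,
# but more of them) and record every warning signaled while collecting them.
stress_mcparallel <- function(n = 100L) {
  msgs <- character(0)
  values <- withCallingHandlers(
    {
      # Each forked child simply returns its index
      jobs <- lapply(seq_len(n), FUN = parallel::mcparallel)
      parallel::mccollect(jobs)
    },
    warning = function(w) {
      # Record the message, then muffle it so it is not reported again later
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(values = unlist(values, use.names = FALSE), warnings = msgs)
}

res <- stress_mcparallel(100L)
length(res$warnings)  # number of warnings seen during collection
```

Recording inside withCallingHandlers() keeps the count tied to this one call, so it is not mixed up with warnings left over from earlier commands in the session.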
Same here, no warnings.
12474 12475 12476 12477 12478 12479 12480 12481 12482 12483
1 2 3 4 5 6 7 8 9 10
And from @rps13's example, the summary is:
> summary(warnings())
Summary of (a total of 50) warning messages:
1x : In selectChildren(job, timeout = timeout) : cannot wait for child 12555 as it does not exist
2x : In selectChildren(job, timeout = timeout) : cannot wait for child 12554 as it does not exist
3x : In selectChildren(job, timeout = timeout) : cannot wait for child 12553 as it does not exist
3x : In selectChildren(pids[!fin], -1) : cannot wait for child 12555 as it does not exist
3x : In selectChildren(pids[!fin], -1) : cannot wait for child 12554 as it does not exist
3x : In selectChildren(pids[!fin], -1) : cannot wait for child 12553 as it does not exist
2x : In selectChildren(job, timeout = timeout) : cannot wait for child 12556 as it does not exist
1x : In selectChildren(pids[!fin], -1) : cannot wait for child 12556 as it does not exist
2x : In selectChildren(job, timeout = timeout) : cannot wait for child 12557 as it does not exist
4x : In selectChildren(pids[!fin], -1) : cannot wait for child 12557 as it does not exist
6x : In selectChildren(job, timeout = timeout) : cannot wait for child 12558 as it does not exist
6x : In selectChildren(pids[!fin], -1) : cannot wait for child 12558 as it does not exist
6x : In selectChildren(job, timeout = timeout) : cannot wait for child 12559 as it does not exist
4x : In selectChildren(pids[!fin], -1) : cannot wait for child 12559 as it does not exist
3x : In selectChildren(job, timeout = timeout) : cannot wait for child 12560 as it does not exist
1x : In selectChildren(pids[!fin], -1) : cannot wait for child 12560 as it does not exist
Thanks.
Ah... not just macOS; by coincidence I just stumbled upon this on a Linux cluster. Here's a minimal example that I can work with:
> library("future")
> plan(multicore, workers = 2L)
> fs <- lapply(1:2L, FUN = future)
> values(fs)
[[1]]
[1] 1
[[2]]
[1] 2
Warning messages:
1: In selectChildren(job, timeout = timeout) : #<== produced by future::resolved.MulticoreFuture()
cannot wait for child 362508 as it does not exist
2: In selectChildren(job, timeout = timeout) : #<== produced by future::resolved.MulticoreFuture()
cannot wait for child 362506 as it does not exist
3: In selectChildren(pids[!fin], -1) : #<== produced by parallel::mccollect()
cannot wait for child 362508 as it does not exist
4: In selectChildren(pids[!fin], -1) : #<== produced by parallel::mccollect()
cannot wait for child 362508 as it does not exist
This shouldn't happen, so I'll flag this as a bug (which has probably been there before but only reveals itself in R (>= 3.5.0)).
@HenrikBengtsson, using your example:
Warning messages:
1: In selectChildren(job, timeout = timeout) :
cannot wait for child 13206 as it does not exist
2: In selectChildren(job, timeout = timeout) :
cannot wait for child 13205 as it does not exist
3: In selectChildren(pids[!fin], -1) :
cannot wait for child 13206 as it does not exist
4: In selectChildren(pids[!fin], -1) :
cannot wait for child 13206 as it does not exist
And I agree with your assessment ("been there before"). Question: does the problem originate in R 3.5.0 or in the future package?
I'm leaning toward 'future' now. The simplest explanation would be that the future framework polls the workers one time too many, even after the results have already been collected and the forked child process is gone. Just a guess for now; I'll try to find time to investigate and fix it (or report upstream to R core if that's where the error is). As you've observed, these warnings are harmless; inspecting the R core code confirms that.
Thanks for looking into this. Judging by the comments in the parallel source you linked to above, this may not remain a warning forever.
I'm suppressing these warnings for now since they are quite annoying, while acknowledging that the long-term solution is to fully understand what's going on so it can be fixed. I'm going to do a quick future 1.8.1 release, so the long-term fix will come in a later release.
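A blanket suppressWarnings() also hides unrelated warnings. A narrower alternative is to muffle only warnings whose message matches the one reported in this thread. This is a minimal sketch, not necessarily how future 1.8.1 does it; the helper name suppress_missing_child_warnings() and its regular expression are assumptions, and matching on the message text assumes an English locale, since R translates warning messages:

```r
# Hypothetical helper: evaluate `expr`, muffling only the
# "cannot wait for child <pid> as it does not exist" warnings;
# any other warning is passed on unchanged.
suppress_missing_child_warnings <- function(expr) {
  pattern <- "cannot wait for child [0-9]+ as it does not exist"
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) {
        invokeRestart("muffleWarning")
      }
      # Not matched: returning normally lets the warning propagate
    }
  )
}

# Possible usage with the multicore example above (fork-capable OS only):
# library(future); plan(multicore, workers = 2L)
# fs <- lapply(1:2, FUN = future)
# vs <- suppress_missing_child_warnings(values(fs))
```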
I know this issue is closed, but I was investigating it and would like to share some findings. Please check the following snippet:
# Define job factory
jobFactory <- function() {
  parallel::mcparallel({
    Sys.getpid()
  })
}
# Example 1: trigger warning
job1 <- jobFactory()
parallel::mccollect(job1, wait = FALSE)
# No warnings
job2 <- jobFactory()
parallel::mccollect(job2, wait = FALSE)
# Warning message:
# In selectChildren(jobs, timeout) :
# cannot wait for child [pid of job1] as it does not exist
# Restart R session
rstudioapi::restartSession()
# Example 2: no warning, manual kill of processes
job1 <- jobFactory()
parallel::mccollect(job1, wait = FALSE)
parallel:::rmChild(job1)
# No warnings
job2 <- jobFactory()
parallel::mccollect(job2, wait = FALSE)
parallel:::rmChild(job2)
# No warnings
# Restart R session
rstudioapi::restartSession()
# Example 3: no warnings, call mccollect twice
job1 <- jobFactory()
job2 <- jobFactory()
parallel::mccollect(wait = FALSE)
# $`23428`
# [1] 23428
#
# $`23427`
# [1] 23427
parallel::mccollect(wait = FALSE)
# $`23428`
# NULL
#
# $`23427`
# NULL
I think the warning in the title is triggered by parallel:::selectChildren(), called by mccollect(). When mccollect() is called as non-blocking (wait = FALSE), forked processes are killed only on the second call. I've run this on Ubuntu 18.04, R 3.5.1.
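Following Example 2, one way to avoid waiting on a child that is already gone is to remove it from parallel's bookkeeping as soon as its value has been collected. A minimal sketch, assuming a Unix-alike (forking) and relying on the unexported parallel:::rmChild() used in Example 2, which may change without notice; collect_and_remove() and its poll argument are hypothetical:

```r
# Hypothetical helper: non-blocking collection of a single mcparallel() job,
# followed by removal of the child via the unexported parallel:::rmChild().
collect_and_remove <- function(job, poll = 0.05) {
  repeat {
    # With wait = FALSE, mccollect() returns NULL when no result is ready yet
    res <- parallel::mccollect(job, wait = FALSE)
    if (!is.null(res)) break
    Sys.sleep(poll)
  }
  # Internal function (note the triple colon): removes the child from
  # parallel's list of children so later calls do not poll it again
  parallel:::rmChild(job)
  res[[1L]]
}

job <- parallel::mcparallel(Sys.getpid())
pid <- collect_and_remove(job)
```

The polling loop covers both behaviors seen in this thread: R 3.5.x returning the value on the first non-blocking call, and R-devel sometimes returning NULL until the child has finished.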
Thanks for this. I'm on a phone now so I haven't tried to reproduce it, but these are useful findings. So, it looks independent of the future package and specific to R and the parallel package. We should report upstream to get this fixed.
Importantly, can you reproduce this outside of RStudio in a fresh R terminal session?
If so, would you mind reporting this to the R-devel mailing list? Then the R core devels will see it.
FYI, I can reproduce this in a pure R session on Linux:
job1 <- parallel::mcparallel(Sys.getpid())
parallel::mccollect(job1, wait = FALSE)
### $`16223`
### [1] 16223
job2 <- parallel::mcparallel(Sys.getpid())
parallel::mccollect(job2, wait = FALSE)
### Warning in selectChildren(jobs, timeout) :
### cannot wait for child 16223 as it does not exist
### $`16247`
### [1] 16247
And now, in front of a real screen (I was on my phone before), I see that the purpose of your comment might have been to suggest that we should fix this in the future package by making sure to also call parallel:::rmChild(). I confirm that I also see this:
job1 <- parallel::mcparallel(Sys.getpid())
parallel::mccollect(job1, wait = FALSE)
### $`16441`
### [1] 16441
parallel:::rmChild(job1)
### [1] FALSE
job2 <- parallel::mcparallel(Sys.getpid())
parallel::mccollect(job2, wait = FALSE)
### $`16444`
### [1] 16444
parallel:::rmChild(job2)
### [1] TRUE
I'll try to add this ...
The following commit, "Fix uninitialized variable in a cleanup mark (parallel/fork)", was just committed to R-devel (/src/library/parallel/src/fork.c):
index 3fe779474d..d2c6788b0f 100644
--- a/src/library/parallel/src/fork.c
+++ b/src/library/parallel/src/fork.c
[...]
@@ -288,6 +288,8 @@ SEXP mc_prepare_cleanup()
ci->waitedfor = 1;
ci->detached = 1;
ci->pid = -1; /* a cleanup mark */
+ ci->pfd = -1;
+ ci->sifd = -1; /* set fds to -1 to simplify close */
ci->ppid = getpid();
ci->next = children;
children = ci;
Not sure, but it could be related to this issue.
UPDATE: It looks like the underlying issue has been fixed in R-devel rev75467 ("Fix mc_select_children warning about non-existent children to wait for").
The problem is still there in R 3.5.1 patched:
$ R
R version 3.5.1 Patched (2018-10-20 r75479) -- "Feather Spray"
[...]
> job1 <- parallel::mcparallel(Sys.getpid())
> parallel::mccollect(job1, wait = FALSE)
$`287758`
[1] 287758
> job2 <- parallel::mcparallel(Sys.getpid())
> parallel::mccollect(job2, wait = FALSE)
$`288075`
[1] 288075
Warning message:
In selectChildren(jobs, timeout) :
cannot wait for child 287758 as it does not exist
but is indeed fixed in R devel:
$ R
R Under development (unstable) (2018-10-21 r75476) -- "Unsuffered Consequences"
[...]
> job1 <- parallel::mcparallel(Sys.getpid())
> parallel::mccollect(job1, wait = FALSE)
$`289242`
[1] 289242
> job2 <- parallel::mcparallel(Sys.getpid())
> parallel::mccollect(job2, wait = FALSE)
NULL
## wait a bit longer ...
> parallel::mccollect(job2, wait = FALSE)
$`328590`
[1] 328590
It's only if we call it again after already having collected the value that we get the warning:
> parallel::mccollect(job2, wait = FALSE)
NULL
Warning message:
In selectChildren(jobs, timeout) :
cannot wait for child 328590 as it does not exist
I can also confirm that future 1.8.0, the last version before the package started suppressing those warnings manually, produces the warnings when running in R 3.5.1 patched (and earlier):
> library(future); plan(multicore, workers = 2L); fs <- lapply(1:2, FUN = future); values(fs)
[[1]]
[1] 1
[[2]]
[1] 2
Warning messages:
1: In selectChildren(job, timeout = timeout) :
cannot wait for child 375577 as it does not exist
2: In selectChildren(job, timeout = timeout) :
cannot wait for child 375576 as it does not exist
3: In selectChildren(pids[!fin], -1) :
cannot wait for child 375577 as it does not exist
4: In selectChildren(pids[!fin], -1) :
cannot wait for child 375577 as it does not exist
but not when running R-devel ("3.6.0"), e.g.
> library(future); plan(multicore, workers = 2L); fs <- lapply(1:2, FUN = future); values(fs)
[[1]]
[1] 1
[[2]]
[1] 2
From this I conclude that we can drop the suppressWarnings() introduced in future 1.8.1 when running R (>= 3.6.0).
This has now also been fixed in R 3.5.1 patched, which means the warnings will not appear in R 3.5.2 (if that is ever released). I can confirm that I don't see those warnings using R version 3.5.1 Patched (2018-11-06 r75555) and future 1.8.0.
I've updated the develop code to suppress warnings only when running R 3.5.0 and R 3.5.1. I ignore older versions of R 3.5.1 patched, so running the develop version of future there will produce those warnings.
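A version gate of this kind could look like the following. This is a minimal sketch, assuming the affected releases are exactly R 3.5.0 and 3.5.1 as stated above; it does not distinguish patched builds of 3.5.1, and the helper name needs_child_warning_workaround() is hypothetical:

```r
# Hypothetical helper: TRUE only for R versions assumed to emit the spurious
# "cannot wait for child ..." warnings (R 3.5.0 and R 3.5.1).
needs_child_warning_workaround <- function(version = getRversion()) {
  version <- as.package_version(version)
  version >= "3.5.0" && version < "3.5.2"
}

needs_child_warning_workaround("3.5.1")  # [1] TRUE
needs_child_warning_workaround("3.6.0")  # [1] FALSE
```

Code that collects results could then wrap the call in suppressWarnings() only when this returns TRUE, leaving warnings visible on fixed versions of R.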
Hi,
First of all, thank you for the package.
After updating to R 3.5.0, the same code on the same machine (Mac, macOS Sierra), with no other changes, started producing warnings like: cannot wait for child 15641 as it does not exist. The code runs a Monte Carlo simulation that calculates the net present value and internal rate of return for each cash flow.
The number of warnings is close to 50, whether I run 100 or 10k simulations.
Best Regards, Rogério Normand.
PS: As the original code contains classified info, I scrambled the values/metrics/results, but the logic remains intact. Please keep the code confidential.