futureverse / future.mirai

:rocket: R package future.mirai: A Future API for Parallel Processing using 'mirai'
https://future.mirai.futureverse.org/

CRAN: Fix issue by 2024-07-07 #15

Closed HenrikBengtsson closed 3 months ago

HenrikBengtsson commented 4 months ago

Issue

CRAN checks report the following error:

...
<FutureError: Failed to retrieve results from MiraiFuture (<none>). The mirai framework reports on error value 19>  

    Future UUID: <NA>
    [1] 2
    Number of workers: 2
    Error: nworkers == all - 1L is not TRUE
    Execution halted

for r-patched-linux-x86_64, cf. https://cran.r-project.org/web/checks/check_results_future.mirai.html.

Troubleshooting

The gist of the failing unit test is to launch two (2) mirai workers and then launch a future that terminates the mirai worker process it runs on:

https://github.com/HenrikBengtsson/future.mirai/blob/0f68255621c41b5447d01739d77785bec652415c/tests/mirai_cluster%2Cworker-termination.R#L18-L22

The unit test detects that the future fails and validates that a FutureError is thrown. The printed results and the following assertion confirm this happens.

However, after this, the unit test fails in:

https://github.com/HenrikBengtsson/future.mirai/blob/0f68255621c41b5447d01739d77785bec652415c/tests/mirai_cluster%2Cworker-termination.R#L26-L31

We know from earlier in the test that all is 2. The output shows that nworkers is also 2, but we'd expect it to be 1 here, since one of the two workers was terminated.

I haven't managed to reproduce it myself yet.

Actions

I suspect a race condition. My current guess is that the worker process is still running when the check is made, or at least that mirai hasn't yet noticed it is down; it may take a moment for this information to propagate. Based on this, I've updated the unit test to wait and retry for up to 5 seconds before concluding that it has failed:

https://github.com/HenrikBengtsson/future.mirai/blob/5d8a15948172576b4c334fe0b8265a95bdce2d18/tests/mirai_cluster%2Cworker-termination.R#L28-L38
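
For readers who don't want to follow the link, the wait-and-retry idea can be sketched in base R roughly as below. This is only an illustration, not the code from the commit: wait_until() and count_workers() are hypothetical names, and count_workers() is a stand-in that simulates a worker count dropping from 2 to 1 after a short delay.

```r
# Hypothetical helper (not part of future, future.mirai, or mirai):
# poll a condition until it holds or the timeout (in seconds) expires.
wait_until <- function(condition, timeout = 5.0, interval = 0.1) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Stand-in for the real worker count (an assumption for this sketch):
# reports 2 workers at first, then 1 once about 0.3 seconds have passed,
# mimicking a delay before the terminated worker is noticed.
started <- Sys.time()
count_workers <- function() {
  elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
  if (elapsed > 0.3) 1L else 2L
}

all <- 2L

# Retry the assertion for up to 5 seconds instead of checking only once.
ok <- wait_until(function() count_workers() == all - 1L, timeout = 5.0)
stopifnot(ok)
```

In the actual unit test, the condition would presumably compare the number of workers reported by the future backend against all - 1L, rather than use a simulated counter.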

See also

This is the same failure as: