Open cleemansen opened 7 months ago
Hi Clemens, thanks for trying out the demo and for the feedback. No worries - to even things out, my experience with Docker is very limited, and my knowledge of parallelization doesn't go much deeper than user level! :)
So, like you, I have no idea why `multisession` doesn't work in Docker. Some googling did bring up other people having the same issue (https://github.com/HenrikBengtsson/future?tab=readme-ov-file#controlling-how-futures-are-resolved), and it seems the solution suggested by the {future} maintainer was to use `multicore` (inside some nested structure... way above my head!)
Indeed, as you point out, `multicore` is not the ideal solution from a development perspective, as it won't work for developers on Windows.
I was wondering if you have tried running the app with the `cluster` option in your local MoveApps Docker container? I gave it a go in my local RStudio session and it seems to be working fine, with the expected gains of ~5:15 (parallel vs sequential).
Perhaps `cluster` is our way out?
Thank you!
Regards, Bruno
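For anyone wanting to repeat the parallel-vs-sequential comparison locally, a minimal sketch could look like the one below. It is not the demo app's code: `process_track()`, its `Sys.sleep()` workload, the track ids and the worker count of 2 are all hypothetical stand-ins chosen for illustration.

```r
library(future)

# Hypothetical stand-in for a track-level task (not the demo app's code)
process_track <- function(id) {
  Sys.sleep(0.5)
  id^2
}

# Run the task over all tracks under the currently active plan and time it
run_tracks <- function(track_ids) {
  t0 <- Sys.time()
  # One future per track, then collect all values
  fs <- lapply(track_ids, function(id) future(process_track(id)))
  res <- unlist(value(fs))
  list(result = res,
       secs = as.numeric(difftime(Sys.time(), t0, units = "secs")))
}

track_ids <- 1:4  # made-up track ids

plan(sequential)
seq_run <- run_tracks(track_ids)

plan(cluster, workers = 2)  # assumed worker count, for illustration only
par_run <- run_tracks(track_ids)

plan(sequential)  # switch back, which also shuts the workers down

message("Sequential: ", round(seq_run$secs, 2), " secs")
message("Parallel:   ", round(par_run$secs, 2), " secs")
```

Both runs should return the same results; only the elapsed time is expected to differ, and by how much will depend on the machine and on worker start-up overhead.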
Hi @cleemansen and @andreakoelzsch,
I'm pleased to report that parallel processing appears to be working as expected once the future strategy is set to `cluster`!
This is the logger output from running this test app on MoveApps using a random dataset:
2024-03-18 14:26:53.588 INFO : app will be started with configuration:
2024-03-18 14:26:53.590 INFO : {}
2024-03-18 14:26:53.620 INFO : Okay, let's check if parallelization is working!
2024-03-18 14:26:53.705 INFO : Number of cores currently available for parallel processing: 7
2024-03-18 14:27:03.848 INFO : Performing track-level tasks in parallel
2024-03-18 14:27:10.874 INFO : |> Processing track SAV.4355.B..deploy_id.2806115210.
2024-03-18 14:27:11.323 INFO : |> Processing track SAV.4356.A..deploy_id.2790876200.
2024-03-18 14:27:11.773 INFO : |> Processing track SAV.4357.A..deploy_id.2796332882.
2024-03-18 14:27:12.220 INFO : |> Processing track SAV.4358.A..deploy_id.2790879124.
2024-03-18 14:27:12.674 INFO : |> Processing track SAV.4360.A..deploy_id.2601766196.
2024-03-18 14:27:12.715 INFO : Performing track-level tasks sequentially
2024-03-18 14:27:38.344 INFO : |> Processing track SAV.4355.B..deploy_id.2806115210.
2024-03-18 14:27:38.345 INFO : |> Processing track SAV.4356.A..deploy_id.2790876200.
2024-03-18 14:27:38.345 INFO : |> Processing track SAV.4357.A..deploy_id.2796332882.
2024-03-18 14:27:38.345 INFO : |> Processing track SAV.4358.A..deploy_id.2790879124.
2024-03-18 14:27:38.347 INFO : |> Processing track SAV.4360.A..deploy_id.2601766196.
2024-03-18 14:27:38.347 INFO : Runtime:
2024-03-18 14:27:38.347 INFO : - Parallel Processing: 8.817secs
2024-03-18 14:27:38.348 INFO : - Sequential Processing: 25.629secs
The cool thing is that this reproduces the runtimes observed when running the app locally, i.e. there is no divergence between the developer's experience and the MoveApps behavior.
So, unless Clemens sees any objection to the use of the `cluster` strategy, or any potential problem that may arise from its use inside the MoveApps environment, I think we'll be able to release a proper app (the Vulture Behaviour Classification) with parallel processing fairly soon!
Thanks again for looking into this and for the useful feedback.
PS.: Not wanting to step on your toes, but perhaps it would be useful to add a suggestion to the "developer_README.md" in the template app about using `future::plan("cluster")` as an option to set up parallel evaluation in apps? I personally tend to use {furrr} to set up my parallel tasks, but other packages (e.g. {foreach} or {doFuture}) also work in tandem with the 'future' framework.
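As a rough idea of what such a snippet might look like, here is a minimal sketch combining `future::plan("cluster")` with {furrr}. The function `summarise_track()` and the track identifiers are made up for illustration and are not part of the template app or the demo.

```r
library(future)
library(furrr)

# Strategy name as suggested above; the number of workers is left at its default
future::plan("cluster")

# Hypothetical per-track task; replace with the app's real track-level function
summarise_track <- function(track_id) {
  paste("processed", track_id)
}

track_ids <- c("track_A", "track_B", "track_C")  # made-up identifiers

# Same interface as purrr::map(), but evaluated on the cluster workers
results <- furrr::future_map(track_ids, summarise_track)

future::plan("sequential")  # release the workers when done
```

Because the plan is set separately from the mapping call, the same `future_map()` code can be run sequentially during debugging by swapping only the `plan()` line.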
Hello @bcaneco - thanks for going the next step!
I can confirm that the `cluster` strategy looks very good. From my point of view, I don't see any problems with using this strategy ("`cluster`: external R sessions on current, local, and/or remote machines"). I did:
Everywhere the ratio was as expected :)
A few further notes:
- `future::availableCores(omit = 1)`. Therefore, setting the strategy should be simply `future::plan("cluster")`. Of course, printing the available cores was pretty useful in this demo!
- `developer_README.md` is a good place. You are more than welcome to propose a little snippet :)

Thanks for all your effort and I wish you good luck with your next app.
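To check how many workers a plain `future::plan("cluster")` actually ends up with on a given machine, something along these lines could be logged at app start. This is only a sketch; the message wording is invented, and `future::nbrOfWorkers()` is used here to report the worker count of the active plan.

```r
library(future)

# Cores available to this R session, leaving one free
n_available <- future::availableCores(omit = 1)
message("Cores available for parallel processing (omitting 1): ", n_available)

# Set the strategy without specifying the number of workers
future::plan("cluster")

# Number of workers the active plan is using
n_used <- future::nbrOfWorkers()
message("Workers used by the current plan: ", n_used)

future::plan("sequential")  # release the workers
```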
Thanks so much for getting this rolling, and how cool that it works, @bcaneco! Sure, we will add a section to the user manual; let's see about the developer_README.md, thanks for the suggestion. Good luck with your new App :)
PS. Maybe it is worth looking into the `cluster` option during the next sprint, @cleemansen. Just to be complete...
Hello Bruno,
I'm a developer of MoveApps. I looked into the demo app you provided on parallel execution and am coming back to you with my findings.
First of all:
I could reproduce your submitted behaviour:
- 4:9 ratio (parallel vs sequential) [OK]
- 24:10 [NOK]

After some reading I changed the parallel plan from `future::plan(multisession)` to `future::plan(multicore)`:
- 9:9 [NOK - but expected, as `multicore` is not supported in RStudio]
- 4:10 [OK]

Do you have any thoughts about that? I have no clue why `multisession` does not work in Docker. Of course, switching to `multicore` is not optimal, as the development experience suffers, but for now this is my best advice.

Best
Clemens
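A quick way to see which of the two strategies a given environment can use is `future::supportsMulticore()`, which reports whether forked processing is available (it is not on Windows, and it is disabled by default inside RStudio). The sketch below is an illustration only; the fallback choice and the worker count of 2 are assumptions, not part of the demo app.

```r
library(future)

# TRUE only where forked (multicore) processing is supported
can_fork <- future::supportsMulticore()
message("Forked (multicore) processing supported here: ", can_fork)

# Assumption for illustration: fall back to separate R sessions when forking is unavailable
if (can_fork) {
  future::plan(multicore, workers = 2)
} else {
  future::plan(multisession, workers = 2)
}

# Evaluate one future and compare process ids to confirm it ran outside the main session
f <- future::future(Sys.getpid())
worker_pid <- future::value(f)
message("Main PID: ", Sys.getpid(), ", worker PID: ", worker_pid)

future::plan(sequential)  # reset the plan
```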