Closed eirrgang closed 1 year ago
Worker._dispatch_proc() is broken in RP 1.21.0, so a lateral move is not possible using the TASK_PROC mode. I will discuss options with @andre-merzky and @mturilli today.
update
Pending further discussion (#302), we leave it as an exercise to the user to provision a Pilot that is adequate for the tasks to be submitted. Dispatching through raptor has a slight additional burden and warrants some updates to the scalems raptor lifetime management.
By the time the scalems.call.function_call_to_subprocess() call is made, the Worker(s) may have already started. By the time scalems.radical.runtime.subprocess_to_rp_task() executes, the Worker(s) have definitely started. We need to split the Worker launch from the Master launch and inspect the workload to decide how to provision the Worker(s).
As a first step, though, to facilitate the lateral move of scalems.call, we can provision one Worker with N-1 cores and raise an error if the submitted Task is incompatible.
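A minimal sketch of that first step, in plain Python with no RP calls. The names TaskRequest, IncompatibleTaskError, worker_cores, and check_task are hypothetical and not part of scalems or RP. It also assumes that the one core held back from the N Pilot cores is reserved for the Master, and that a task's core count is ranks times threads per rank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRequest:
    """Hypothetical stand-in for the resource needs of one submitted Task."""

    ranks: int = 1
    threads_per_rank: int = 1

    @property
    def cores(self) -> int:
        # Assumption: each rank occupies threads_per_rank cores.
        return self.ranks * self.threads_per_rank


class IncompatibleTaskError(ValueError):
    """Raised when a Task cannot fit on the single pre-provisioned Worker."""


def worker_cores(pilot_cores: int) -> int:
    """Return N-1 cores for the single Worker (one core assumed for the Master)."""
    if pilot_cores < 2:
        raise ValueError("Need at least 2 cores: one reserved and one for the Worker.")
    return pilot_cores - 1


def check_task(task: TaskRequest, pilot_cores: int) -> None:
    """Raise IncompatibleTaskError if the Task needs more cores than the Worker has."""
    available = worker_cores(pilot_cores)
    if task.cores > available:
        raise IncompatibleTaskError(
            f"Task needs {task.cores} cores, but the Worker has only {available}."
        )
```

With an 8-core Pilot, check_task(TaskRequest(ranks=7), 8) passes, while a request for 4 ranks with 2 threads each is rejected because it needs all 8 cores.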
The follow-up should rely on the new raptor protocol that @andre-merzky is working on, if at all possible, to manage Workers, or @eirrgang will be performing completely redundant work that is immediately obsolete.
The biggest short-term impact will be lack of flexibility with cores allocated to (OpenMP) threads versus ranks.
See also #302
Resource constraints:
Other set-up details:
scalems.call was a workaround that wrapped a serialized function call into a command line executable task for dispatching through traditional RP executable Task execution. This was pursued to give us a chance to move forward with other development while refining raptor.
There does not appear to be a good way to simply port scalems.call to raptor. We don't have to disable scalems.call completely, but we cannot simply dispatch the same workflow script to be executed on raptor. The function_call_to_subprocess() sequence of calls just doesn't make sense in the raptor context.
update: We should be able to salvage this with TASK_EXECUTABLE mode. The raptor master should be able to manage such a task without a worker.
This issue tracks the scalems package aspect of an issue in the workshop repository.
Delay launch of the Worker (until inspecting the work load) so that cores are available for the TASK_EXECUTABLE task.
deferred
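One way the deferred Worker launch could size the Worker after inspecting the workload is sketched below, again in plain Python with no RP calls. PendingTask, its "executable"/"function" labels, and plan_worker_cores are hypothetical. The sketch assumes one core for the Master and that executable tasks run one at a time, so only the largest one needs cores held back.

```python
from typing import Iterable, NamedTuple


class PendingTask(NamedTuple):
    """Hypothetical record of a queued task, inspected before any Worker starts."""

    mode: str  # "executable" (run without a Worker) or "function" (needs a Worker)
    cores: int


def plan_worker_cores(
    pilot_cores: int, tasks: Iterable[PendingTask], master_cores: int = 1
) -> int:
    """Return the cores to give the Worker, or 0 if no Worker is needed."""
    tasks = list(tasks)
    # Hold back enough cores for the largest executable task (assumed to run one at a time).
    reserved = max((t.cores for t in tasks if t.mode == "executable"), default=0)
    available = pilot_cores - master_cores - reserved
    # The Worker must be able to fit the largest function task.
    needed = max((t.cores for t in tasks if t.mode == "function"), default=0)
    if available < 0 or available < needed:
        raise ValueError(
            f"Pilot with {pilot_cores} cores cannot hold the Master, "
            f"{reserved} reserved cores, and a Worker of {needed} cores."
        )
    return available if needed else 0
```

For an 8-core Pilot with one 2-core executable task and one 4-core function task, this gives the Worker 5 cores. A workload with only executable tasks yields 0, meaning no Worker is launched.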