Closed eirrgang closed 1 year ago
The rp_ssh job failed at https://github.com/SCALE-MS/scale-ms/actions/runs/4628011576/jobs/8186580992 because the task input staging effectively put the input file at pilot:///test_rp_function-1/test_rp_function-1-input.json, and the task did not run in that directory. (Presumably it ran in pilot:///scalems-rp-master.71b92ac0-d467-11ed-b924-778ad015143a, but can we confirm?)
> (Presumably it ran in pilot:///scalems-rp-master.71b92ac0-d467-11ed-b924-778ad015143a, but can we confirm?)
Confirmed with additional logging.
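For anyone trying to reproduce the failure mode locally, here is a minimal sketch that uses only the Python standard library and no RADICAL Pilot calls. It assumes the task opens its input by a relative file name; the temporary sandbox stands in for the pilot sandbox, and the directory name scalems-rp-master.example and the helper run_task are hypothetical.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_task(workdir: Path, input_name: str) -> int:
    """Run a stand-in task that opens input_name relative to its working directory."""
    script = "import json, sys; json.load(open(sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", script, input_name],
        cwd=workdir,
        capture_output=True,
    )
    return result.returncode


with tempfile.TemporaryDirectory() as sandbox:
    pilot = Path(sandbox)  # stand-in for the pilot sandbox (assumption)
    staging_dir = pilot / "test_rp_function-1"  # where the input was staged
    task_dir = pilot / "scalems-rp-master.example"  # hypothetical run directory
    staging_dir.mkdir()
    task_dir.mkdir()

    input_name = "test_rp_function-1-input.json"
    (staging_dir / input_name).write_text(json.dumps({"args": [1, 2]}))

    # The task fails when it runs somewhere other than the staging directory.
    rc_wrong_dir = run_task(task_dir, input_name)
    # The same task succeeds when its working directory holds the staged file.
    rc_staged_dir = run_task(staging_dir, input_name)
```

In this sketch rc_wrong_dir is nonzero and rc_staged_dir is zero, which matches the symptom described above: the staged file exists, but not where the task looks for it.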
It turns out that the approach in this pull request (mode rp.TASK_PROC) cannot support MPI-aware tasks.
Support a lateral move of scalems.call dispatching to the raptor execution system. This is far from optimal (it executes a deserialized function in a command line subprocess via the raptor PROC mode), but illustrates a minimal change to ensure that we don't lose any functionality by migrating back to raptor.

Immediate follow-up work needs to remove the scalems.call.cli layer and use native raptor function execution.
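To illustrate the general pattern of executing a deserialized function in a command line subprocess, here is a rough sketch using only the standard library. It is not the scalems.call.cli implementation and does not involve raptor; the CHILD program and the call_in_subprocess helper are hypothetical names, and pickle with base64 is an assumed serialization scheme chosen for brevity.

```python
import base64
import math
import pickle
import subprocess
import sys

# Child-side program (hypothetical): decode the payload, call the function,
# and print the result to stdout.
CHILD = (
    "import base64, pickle, sys\n"
    "func, args = pickle.loads(base64.b64decode(sys.argv[1]))\n"
    "print(func(*args))\n"
)


def call_in_subprocess(func, *args) -> str:
    """Serialize (func, args), run them in a fresh interpreter, return stdout."""
    payload = base64.b64encode(pickle.dumps((func, args))).decode("ascii")
    completed = subprocess.run(
        [sys.executable, "-c", CHILD, payload],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# math.hypot is pickled by reference, so the child can import it by name.
result = call_in_subprocess(math.hypot, 3.0, 4.0)
```

The extra interpreter launch and the serialization round trip are the overhead that a native function execution path would avoid.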
Ref #326