CrayLabs / SmartSim

SmartSim Infrastructure Library.
BSD 2-Clause "Simplified" License

Reuse `SbatchStep` and `QsubStep` functions when creating `DragonBatchStep`s #595

Open al-rigazzi opened 4 months ago


Description

`DragonBatchStep` should convert itself into an `SbatchStep` or a `QsubStep` and call the corresponding `write_script` method directly, instead of re-implementing it.

Justification

A `DragonBatchStep` has to create either an sbatch or a qsub script, to be run through the corresponding scheduler. Currently, the functions that create these scripts are implemented in `DragonBatchStep` itself, but they are only slight variations of their `SbatchStep` and `QsubStep` counterparts. It would be better to convert the step internally to the appropriate batch step and, by doing so, reuse that step's methods.
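A minimal sketch of the proposed delegation, under heavy simplification. The `*Sketch` classes and the `_BATCH_STEPS` mapping are hypothetical stand-ins, not SmartSim's real `SbatchStep`, `QsubStep`, or `DragonBatchStep` (whose constructors and script-writing methods differ). The point is only that the Dragon step builds the equivalent Slurm or PBS batch step and lets it produce the script, rather than carrying its own copy of each template.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class SbatchStepSketch:
    """Hypothetical, simplified stand-in for SmartSim's SbatchStep."""

    name: str
    batch_args: Dict[str, str] = field(default_factory=dict)

    def write_script(self, commands: List[str]) -> str:
        # Slurm directives: "#SBATCH --key=value".
        lines = ["#!/bin/bash", f"#SBATCH --job-name={self.name}"]
        lines += [f"#SBATCH --{k}={v}" for k, v in self.batch_args.items()]
        return "\n".join(lines + [""] + commands) + "\n"


@dataclass
class QsubStepSketch:
    """Hypothetical, simplified stand-in for SmartSim's QsubStep."""

    name: str
    batch_args: Dict[str, str] = field(default_factory=dict)

    def write_script(self, commands: List[str]) -> str:
        # PBS directives: "#PBS -key value".
        lines = ["#!/bin/bash", f"#PBS -N {self.name}"]
        lines += [f"#PBS -{k} {v}" for k, v in self.batch_args.items()]
        return "\n".join(lines + [""] + commands) + "\n"


# Hypothetical mapping from batch scheduler to the step that owns the script logic.
_BATCH_STEPS = {"slurm": SbatchStepSketch, "pbs": QsubStepSketch}


@dataclass
class DragonBatchStepSketch:
    """Hypothetical stand-in for DragonBatchStep: it keeps no script template."""

    name: str
    scheduler: str
    batch_args: Dict[str, str] = field(default_factory=dict)

    def to_batch_step(self) -> Union[SbatchStepSketch, QsubStepSketch]:
        # Convert to the scheduler-specific batch step instead of duplicating it.
        try:
            step_cls = _BATCH_STEPS[self.scheduler]
        except KeyError:
            raise ValueError(f"unsupported batch scheduler: {self.scheduler}") from None
        return step_cls(self.name, dict(self.batch_args))

    def write_script(self, commands: List[str]) -> str:
        # Delegate entirely to the converted step's script writer.
        return self.to_batch_step().write_script(commands)


if __name__ == "__main__":
    step = DragonBatchStepSketch("dragon-job", "slurm", {"nodes": "2"})
    print(step.write_script(["<dragon_client.py launch command>"]))
```

With this shape, a change to the sbatch or qsub template lives in one place and `DragonBatchStep` picks it up automatically.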

Implementation Strategy

`DragonBatchStep` needs to create the request file, which Slurm and PBS batch steps do not usually create; this has to happen before the step is converted. Moreover, the internal step (usually an `SrunStep` on Slurm, or one of the equivalent types on PBS) will need to be a `LocalStep` that starts the `dragon_client.py` entrypoint, as we currently do. This `LocalStep`, however, must not be proxied: no `indirect.py` entrypoint should be invoked, because the job will still be managed through Dragon.
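The ordering described above can be sketched as follows. Everything here is an assumption for illustration: `LocalStepSketch` and `build_dragon_batch_commands` are hypothetical, the request file is written as JSON only for the example (not necessarily SmartSim's on-disk format), and the `dragon_client.py` argument layout is made up. What the sketch does reflect from the issue is the sequence: write the request file first, build the inner step as an unproxied local launch of `dragon_client.py`, and only then hand the resulting command to the converted batch step.

```python
import json
import shlex
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class LocalStepSketch:
    """Hypothetical stand-in for the inner LocalStep of a DragonBatchStep."""

    entrypoint: str
    args: List[str]

    def get_launch_cmd(self) -> List[str]:
        # Unproxied: the command is returned as-is and is NOT wrapped in the
        # indirect.py entrypoint, since Dragon still manages the job.
        return [sys.executable, self.entrypoint, *self.args]


def build_dragon_batch_commands(
    requests: List[Dict[str, Any]], work_dir: Path
) -> List[str]:
    """Return the script body a converted SbatchStep/QsubStep would write."""
    # 1. Write the request file BEFORE conversion; Slurm/PBS steps never create it.
    #    JSON is an assumption made for this sketch.
    request_file = work_dir / "dragon_requests.json"
    request_file.write_text(json.dumps(requests), encoding="utf-8")

    # 2. The inner step is a local launch of dragon_client.py (instead of, e.g.,
    #    an SrunStep); the argument layout is hypothetical.
    inner = LocalStepSketch("dragon_client.py", [str(request_file)])

    # 3. Only now would the step be converted to an SbatchStep or QsubStep and
    #    this shell-quoted command line passed to its script-writing method.
    return [shlex.join(inner.get_launch_cmd())]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical request payload, for illustration only.
        print(build_dragon_batch_commands([{"exe": "hello"}], Path(tmp)))
```

Keeping the request-file step outside the batch-step conversion means `SbatchStep` and `QsubStep` need no Dragon-specific knowledge.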