Open garlick opened 8 years ago
In an effort to help our users transition to using LSF on our CORAL systems, I have created a translation guide that compares the options for submitting jobs to batch schedulers that LC currently supports or has supported in the past. While this is not directly relevant to Flux development, it should serve as a good reference as we work to build out Flux functionality to replace SLURM.
Cool, nice work @lipari.
Great! bsub doesn't have a way to specify the number of nodes? Do you want to include the options for burst buffer requests? Users may want to use burst buffers for their checkpoint and restart files. E.g., sbatch now has --bb. What will be the corresponding LSF option(s)?
The doc is somewhat a work in progress from the LSF side. I forwarded a copy to our LSF contacts and asked them to help add their expertise to making the LSF content more accurate and current. So, specifying BBs and GPUs in LSF will be forthcoming.
As far as specifying nodes goes, no, bsub does not have a direct analog to requesting nodes. They have a default slot definition of a core, and specifying tasks gets you that many cores, regardless of the nodes allocated to the job. There is a way to alter the default slot definition, but I held off adding too much complexity to the table, to keep the minutiae from clouding the message.
A fresh look at these requirements was added as https://github.com/flux-framework/distribution/issues/18.
Smallest Serviceable Slurm Substitute
What follows are the requirements to replace the SLURM version currently in use at LC, not a wish list for the perfect batch system. The requirements are listed as bullet items with minimal text to describe the item. This assumes an understanding of SLURM and its features. For further details, reference the SLURM man pages. References to SLURM commands are listed where appropriate. New features in the versions of SLURM beyond v2.3.3 are not listed.
- sinfo
- scontrol show node
- scontrol show partition
- sbatch
- salloc
- mxterm/sxterm
- srun
- (Pound) directive support in batch script (e.g., #SBATCH -N) as an optional means to convey job specifications
- squeue
- scontrol show job
- scancel
- scontrol update job
- sreport
- sacct
- sreport
- sacctmgr
- libyogrt
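
To make the pound-directive requirement above concrete, here is a minimal sketch in Python of how a replacement scheduler might collect job specifications from a batch script. It is not taken from Flux or SLURM source code. The function name `parse_directives` and its `prefix` parameter are hypothetical, and the rule that directives are only honored before the first executable command is an assumption modelled on how sbatch is documented to behave. Matching is simplified: leading whitespace is ignored.

```python
import shlex


def parse_directives(script_text, prefix="#SBATCH"):
    """Collect option tokens from the directive lines of a batch script.

    Hypothetical helper: scanning stops at the first line that is neither
    blank nor a comment, so directives placed after a command are ignored.
    """
    options = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#!"):
            continue  # skip blank lines and the interpreter line
        is_directive = stripped.startswith(prefix) and (
            len(stripped) == len(prefix) or stripped[len(prefix)].isspace()
        )
        if is_directive:
            # comments=True drops a trailing "# ..." remark on the line
            options.extend(shlex.split(stripped[len(prefix):], comments=True))
        elif stripped.startswith("#"):
            continue  # ordinary comment
        else:
            break  # first command ends the directive block
    return options


example = """#!/bin/bash
#SBATCH -N 4
#SBATCH --time=30  # half an hour
# an ordinary comment
srun ./app
#SBATCH -n 8
"""
print(parse_directives(example))  # ['-N', '4', '--time=30']
```

Passing a different prefix, such as `#BSUB`, reuses the same scan for another scheduler's directive style. The returned tokens would still need to be interpreted by whatever option parser the submission command uses.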