LLNL / merlin

Machine Learning for HPC Workflows
MIT License

[FEAT] Launch workers with flux if it is the main scheduler #384

Closed koning closed 1 year ago

koning commented 1 year ago

🚀 Feature Request

What problem is this feature looking to solve?

Several HPC clusters use Flux as the main scheduler, but the current system assumes that Slurm launches the Flux instance. The system should check whether Flux or Slurm is the main scheduler before launching workers.

Describe the solution you'd like

Here is a function that checks whether Flux is the main scheduler:

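    import subprocess

    def check_for_flux(self):
        """Return True if flux is the main scheduler on this system."""
        try:
            # Wait for the command to finish and capture its output.
            p = subprocess.run(
                ["flux", "resource", "info"],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
            )
        except FileNotFoundError:
            # The flux executable is not installed or not on PATH.
            return False

        # The first output line names the node count when flux is running.
        result = p.stdout.splitlines()
        return bool(result) and b"Nodes" in result[0]

lucpeterson commented 1 year ago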

I think that as we do this, we should also put in a framework for eventually adding hooks for other potential schedulers, e.g., PBS.

koning commented 1 year ago

Yes, there should be external configuration like the server option. Then users can easily add schedulers.
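
As a rough illustration of that idea, here is a minimal sketch of a configurable scheduler check. It is not Merlin's implementation: `SCHEDULER_PROBES`, `run_probe`, and `detect_scheduler` are hypothetical names, and the dictionary stands in for whatever external configuration would eventually be used. Only the flux probe comes from this issue; the Slurm entry assumes that `sinfo --version` prints a line starting with `slurm`.

```python
import shutil
import subprocess

# Hypothetical configuration: scheduler name -> probe command and the marker
# expected in the first line of its output. Entry order sets the priority.
# The flux entry mirrors check_for_flux above; the slurm entry is an assumption.
SCHEDULER_PROBES = {
    "flux": {"cmd": ["flux", "resource", "info"], "marker": "Nodes"},
    "slurm": {"cmd": ["sinfo", "--version"], "marker": "slurm"},
}


def run_probe(cmd):
    """Run a probe command and return its first output line, or "" on failure."""
    if shutil.which(cmd[0]) is None:
        # The executable is not installed or not on PATH.
        return ""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    lines = done.stdout.splitlines()
    return lines[0] if lines else ""


def detect_scheduler(probes=SCHEDULER_PROBES, run=run_probe):
    """Return the name of the first scheduler whose probe matches, else None."""
    for name, probe in probes.items():
        if probe["marker"] in run(probe["cmd"]):
            return name
    return None
```

With this shape, supporting another scheduler such as PBS would only mean adding one more entry to the configuration. The `run` argument is injectable so the lookup can be exercised without any scheduler installed.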