Parsl / parsl

Parsl - a Python parallel scripting library
http://parsl-project.org
Apache License 2.0

Support re-incarnation of main process as a batch job #643

Open annawoodard opened 6 years ago

annawoodard commented 6 years ago

This is a problem for long-running workflows on some systems where the headnode may be restarted.

We should consider supporting a mode where you start a parsl run in the usual way, but indicate that you want the main process to be submitted as a batch job so it is persistent. That way you don't have to literally write your own submit script to submit the job; you just tell Parsl "kill yourself and re-incarnate as a batch job."
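A rough sketch of how this could look from the user's side follows; the resubmit_self_as_batch_job helper, the PARSL_IN_BATCH_JOB environment variable, and the scheduler options are all hypothetical illustrations, not existing parsl APIs.

import os
import subprocess
import sys

def resubmit_self_as_batch_job(scheduler_options="#SBATCH -N 1"):
    # Hypothetical helper: if this process is not already running as a
    # batch job, wrap the current invocation in a submit script, sbatch
    # it, and exit; the batch copy then runs the workflow as normal.
    if os.environ.get("PARSL_IN_BATCH_JOB"):
        return  # already re-incarnated; carry on with the workflow
    script_lines = [
        "#!/bin/bash",
        scheduler_options,
        "export PARSL_IN_BATCH_JOB=1",
        f"cd {os.getcwd()}",
        f"{sys.executable} {' '.join(sys.argv)}",
    ]
    with open("reincarnate.sh", "w") as f:
        f.write("\n".join(script_lines) + "\n")
    subprocess.run(["sbatch", "reincarnate.sh"], check=True)
    sys.exit(0)

Called at the top of a workflow script, the login-node invocation would only submit the job and exit, while the batch copy would run the actual parsl workflow.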

annawoodard commented 6 years ago

(some form of this requested by @benclifford)

annawoodard commented 6 years ago

Note this is an issue for the DESC workflow as well as the TBI connectome project.

benclifford commented 6 years ago

There is some use case in not having parsl involved in the submission at all; the situation there might be:

I know how to use this HPC resource already.

I don't want to have to learn some new job submission language (the parsl config language) to do what I already know, especially when I can do things with the HPC resource that I can't express with the parsl config language (for example, chaining jobs). Writing my own submit scripts is a feature, not a problem.

I want to use parsl to do new, better things, not do what I was doing already.

(and in this form, this was how we wanted it for the DESC workflow in practice).

benclifford commented 6 years ago

I prototyped this behaviour on cori:

Submit a batch job using sbatch that looks like this; it submits a 5-node job that, at startup, runs parsl (in a.py) on a single node, leaving the other 4 unused.

#!/bin/bash
#SBATCH -q debug
#SBATCH -N 5
#SBATCH -C haswell
cd /global/homes/b/bxc/run201811
source env.source
srun -N 1 ./a.py

Then a.py contains a parsl workflow, configured with this executor:

from parsl.addresses import address_by_hostname
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.providers import LocalProvider

cori_in_job_executor = HighThroughputExecutor(
    label='worker-nodes',
    public_ip=address_by_hostname(),
    worker_debug=True,
    provider=LocalProvider(
        nodes_per_block=4,
        tasks_per_node=1,
        init_blocks=1,
        min_blocks=1,
        max_blocks=1,
        launcher=SrunLauncher(),
    ),
)

This will srun the HighThroughputExecutor within the job, on the remaining 4 nodes. Note that the 4 is configured explicitly here (nodes_per_block=4) and must match up with (be one less than) the 5 nodes specified in the sbatch script.

The two slightly unusual features are that the batch submission happens outside of parsl (via a hand-written sbatch script rather than a parsl provider), and that LocalProvider with SrunLauncher is used inside the allocation to place workers on the remaining nodes.
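For completeness, here is a sketch of how that executor could sit inside a runnable a.py; the Config wrapping and the example app are illustrative, not taken from the original run.

#!/usr/bin/env python3
import parsl
from parsl import python_app
from parsl.config import Config

# cori_in_job_executor is the executor defined above
parsl.load(Config(executors=[cori_in_job_executor]))

@python_app
def hostname():
    import socket
    return socket.gethostname()

# Fan a few tasks out to the worker nodes and collect the results.
print([hostname().result() for _ in range(4)])

Since the sbatch script launches a.py with srun -N 1, this driver occupies one node of the allocation while the executor's SrunLauncher places workers on the other four.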

benclifford commented 5 years ago

Another use for this is running parsl inside an interactive job on cori - you get your nodes allocated with salloc and then you can run parsl inside that allocation.

villarrealas commented 5 years ago

So, putting on my user hat, with no idea of how hard this would be to implement...

Both at NERSC (Cori) and ALCF (Theta), we've had issues with the parsl driver needing to run on the login node, and with needing to manually reconnect things whenever the resource goes down. This is especially true of Theta, where we have maintenance every other Monday and would like to take advantage of the score boost we get for having a large job in the queue before that.

So ideally we would like a system by which the parsl driver always operates on the login-node side and can be restarted, continuing where it left off, if a running job no longer detects it. While running inside the allocation is helpful, I can also think of cases where we might want to have many small jobs (say, many 512-node jobs on Theta) and would need to send new work to them as their work empties out. As such, there remains an advantage to a login-node-side driver.
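Not a full solution to that, but possibly relevant: part of the "continue where it left off" behaviour can already be approximated with parsl's app caching and checkpointing, so that a restarted driver skips tasks that completed before it died. A minimal sketch, assuming the default runinfo directory and apps that are safe to cache:

import parsl
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.utils import get_all_checkpoints

# Reload results recorded by earlier driver runs, and checkpoint each
# task as it finishes so the next restart can pick up from there.
config = Config(
    executors=[HighThroughputExecutor(label='worker-nodes')],
    checkpoint_mode='task_exit',
    checkpoint_files=get_all_checkpoints(),
)
parsl.load(config)

@parsl.python_app(cache=True)
def expensive(x):
    return x * x

results = [expensive(i).result() for i in range(100)]

This covers restarting the driver after a crash; keeping work flowing into many already-queued jobs from a restarted driver would still need something more.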