ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

excessive Cactus run time & slurm job time limits #543

Open gbeane opened 3 years ago

gbeane commented 3 years ago

I'm trying to run Cactus on my SLURM cluster with 6 mammalian genomes, and it's been going for 3+ weeks.

One thing I've noticed is all the toil jobs seem to run for 1 hour and hit their walltime limit. I think our cluster has a default walltime of 1 hour, so I'm not sure if these jobs are getting the default or if Cactus itself is specifying a 1 hour walltime limit.

Here is an example of one of the stderr files from my work directory:

[2021-07-07T01:43:01-0400] [MainThread] [I] [toil.worker] Redirecting logging to /fastscratch/gbeane/work/node-fa1e9249-b14a-445d-a154-7d3aebe233e7-7e161d63-20af-4909-a67b-ceb9ea026720/tmp56olxtf4/worker_log.txt
slurmstepd: error: *** JOB 9303076 ON sumner021 CANCELLED AT 2021-07-07T02:43:23 DUE TO TIME LIMIT ***
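
(A quick way to check whether the one-hour limit comes from the cluster rather than from Cactus or Toil is to ask SLURM directly. The commands below are a minimal sketch assuming standard SLURM tooling; the partition name is a placeholder, and the sacct query assumes job accounting is enabled.)

    # Show the partition's default and maximum walltime (partition name is a placeholder).
    scontrol show partition compute | grep -E 'DefaultTime|MaxTime'

    # Show the time limit SLURM actually assigned to the cancelled job from the log above
    # (requires SLURM job accounting).
    sacct -j 9303076 --format=JobID,Timelimit,Elapsed,State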

I'm not sure if all of these job cancellations are the cause of what feels like an excessive run time (I was told that with a single 64-core node, I could run Cactus in approximately 2 × (number of mammalian genomes) days).

Is there a way to increase the walltime of the toil jobs to avoid these cancellations?

Here is how I am invoking Cactus:

    cactus-prepare-toil \
        --binariesMode singularity \
        --containerImage /projects/compsci/cactus/cactus.img \
        --batchSystem slurm \
        --outHal /projects/compsci/cactus/out.hal \
        --defaultMemory 64G \
        --defaultCores 48 \
        --defaultDisk 10T \
        --disableCaching \
        --workDir $WORKDIR \
        $JOBDIR \
        /projects/compsci/cat/cactus/test_config.txt
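
(One avenue worth trying, sketched below: in at least some Toil releases, the SLURM batch system reads extra sbatch arguments from the TOIL_SLURM_ARGS environment variable, which would let every submitted job request a longer walltime. Whether the Toil bundled with this Cactus image honors it is an assumption, and the 72-hour value is only an example.)

    # Sketch: pass an explicit walltime to every sbatch call Toil makes,
    # assuming the installed Toil release reads TOIL_SLURM_ARGS.
    export TOIL_SLURM_ARGS="--time=72:00:00"

    # ...then launch cactus-prepare-toil exactly as above.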
glennhickey commented 3 years ago

I don't think there's an interface in Cactus to specify job time. You'll have to make an issue here for that: https://github.com/DataBiosphere/toil/issues

For mammalian genomes, without a GPU, I'd expect each cactus-prepare-toil job to take closer to a week on 64 cores.

gbeane commented 3 years ago

@glennhickey -- Is it a problem that I'm seeing all these toil jobs get canceled after hitting the 1 hour limit?

> For mammalian genomes, without a GPU, I'd expect each cactus-prepare-toil job to take closer to a week on 64 cores.

1 week total, or 1 week per genome?

gbeane commented 3 years ago

@glennhickey -- I have access to systems with 4 p100 or 4 v100 gpus. The p100 nodes have 128GB RAM and 24 cores. The v100 nodes have 192GB RAM and 48 cores. Would either of these be sufficient to run the GPU-enabled Cactus?

glennhickey commented 3 years ago

4 V100s is fine, but 192G may not be enough RAM for mammalian-sized genomes. It's all quite dependent on the assembly quality and the divergence between species. In general, more closely related species require far fewer resources.

gbeane commented 3 years ago

@glennhickey approximately how long would you expect it to run using 4 V100s and 48 CPU cores? I started a run that's been going for about 52 hours, and it looks like the "cactus_consolidated" job started about 18 hours ago. I'm trying to figure out roughly how much longer I have to go.

Also, is there a straightforward way to run the GPU portion separately? I'm running on our GPU cluster, but the GPUs have been idle since the cactus_consolidated command started running. Our research IT folks would be much happier with me if I could run the GPU portion on our GPU cluster, and then move over to our CPU cluster for running this part (where I would have more CPU cores and RAM available).

glennhickey commented 3 years ago

There have been some recent bugs that slowed down cactus_consolidated for some inputs that are finally fixed in v2.0.3. I've recently run a fairly difficult test and it never took more than 20 hours on 64 cores.

I've only ever run GPU-enabled Cactus on Cromwell, which lets you specify GPU requirements for individual jobs. There's no such support yet in Toil that I'm aware of, but I think it may be in the works.

gbeane commented 3 years ago

RE: specifying GPU requirements for individual jobs

Unfortunately, we have two separate clusters: one for workloads that use GPUs and one for everything else, so it's not just a matter of specifying GPU requirements for individual jobs. I'd actually have to log in to a different system and submit the jobs to a different instance of SLURM.

Ideally I could split the work up into two totally separate Cactus commands: one that runs everything up to cactus_consolidated, and another that runs cactus_consolidated. That way I could log in to the GPU cluster and submit a job that runs the preprocessing / SegAlign stuff that actually uses the GPU, and then, when that finishes, log in to the general-purpose cluster and submit a job that runs cactus_consolidated.
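
(For what it's worth, the non-Toil cactus-prepare entry point is aimed at exactly this kind of decomposition: it prints an ordered list of per-step commands that can be run on different machines. The sketch below assumes the documented cactus-prepare options; the output paths are placeholders, and the exact way GPU alignment is enabled varies between Cactus releases.)

    # Sketch: generate a step-by-step plan instead of running one monolithic Toil workflow.
    cactus-prepare /projects/compsci/cat/cactus/test_config.txt \
        --outDir steps \
        --outSeqFile steps/test_config.txt \
        --outHal steps/out.hal \
        --jobStore jobstore

    # cactus-prepare prints cactus-preprocess, cactus-blast, and cactus-align
    # commands in dependency order. The preprocess/blast steps (the GPU-heavy
    # part) could be run on the GPU cluster, and the cactus-align steps (which
    # run cactus_consolidated) on the CPU cluster, provided the intermediate
    # files sit on storage both clusters can see.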

gbeane commented 3 years ago

@glennhickey

I started by running Cactus with just two of our genomes on one of our GPU nodes and it worked fine; however, once I went to three genomes, the cactus_consolidated step ran out of memory (4 V100s, 48 cores, 192GB RAM).

I tried restarting the job on a non-GPU node with 512GB RAM and the cactus_consolidated step finished, but unfortunately ProgressiveNext failed with a file not found exception. When I looked at the file it was trying to open in the jobStore, it was a broken symlink to a file in /tmp :/

I'm wondering if this is because I restarted the run on a different node and /tmp is not shared between nodes; however, I specified a shared filesystem for --workDir, so I didn't expect it to use /tmp.
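
(If the broken symlink really does point at node-local /tmp, one mitigation to try before the next restart is to force every temporary location onto the shared filesystem, since some tools fall back to TMPDIR or /tmp even when --workDir is set. This is a sketch under those assumptions; $WORKDIR is the shared scratch path from the original command.)

    # Sketch: keep temporary files off node-local /tmp before resuming the run.
    # $WORKDIR is assumed to be the shared scratch path used for --workDir.
    export TMPDIR="$WORKDIR/tmp"
    mkdir -p "$TMPDIR"

    # ...then relaunch the same command with the same jobStore, adding --restart
    # and keeping --workDir "$WORKDIR" so resumed jobs also write to shared storage.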