ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

auto-scale blast chunk size on slurm #1301

Closed glennhickey closed 8 months ago

glennhickey commented 8 months ago

Right now there's a crucial bit in the README about manually increasing the blast chunk size for CPU jobs:

cp cactus-bin-v2.7.2/src/cactus/cactus_progressive_config.xml ./config-slurm.xml
sed -i config-slurm.xml -e 's/blast chunkSize="30000000"/blast chunkSize="90000000"/g'
sed -i config-slurm.xml -e 's/dechunkBatchSize="1000"/dechunkBatchSize="200"/g'

This brings down the number of jobs by a factor of 9, which helps avoid spamming the cluster with too many jobs (i.e. it reduces the count from hundreds of thousands to tens of thousands). I think Slurm itself can handle big queues, but I was seeing some issues in early tests that may be related to the jobstore straining the network filesystem with so many jobs.
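
One plausible reading of the factor of 9: tripling the chunk size cuts the number of chunks by 3, and if blast jobs are created per pair of chunks, the job count drops by roughly 3 x 3. The sketch below checks that arithmetic. The all-against-all model, the `pairwise_jobs` helper, and the 2.7 Gbp total are assumptions made for illustration, not taken from the Cactus source.

```shell
# Simplified model (an assumption, not the Cactus implementation):
# every chunk is compared against every chunk, so jobs ~ chunks^2.
pairwise_jobs() {
  total_bp=$1   # hypothetical total sequence length, in bp
  chunk_bp=$2   # blast chunkSize, in bp
  chunks=$(( (total_bp + chunk_bp - 1) / chunk_bp ))  # ceiling division
  echo $(( chunks * chunks ))
}

pairwise_jobs 2700000000 30000000   # 90 chunks -> prints 8100
pairwise_jobs 2700000000 90000000   # 30 chunks -> prints 900
```

Under this model the ratio is exactly 9 only when the total length divides evenly into chunks; real inputs would land close to 9 rather than on it.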

Anyway, this PR changes cactus so that this scaling is applied automatically, controlled by slurmChunkScale in the config XML.
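
As a rough illustration of what applying the scale automatically could look like, here is a minimal sketch that multiplies the base chunk size by a scale factor only when the batch system is slurm. The `scale_chunk_size` helper is hypothetical, and the scale value of 3 is an assumption inferred from the manual 30000000 to 90000000 edit above; the actual logic and default in the PR may differ.

```shell
# Hypothetical helper, not the code from this PR:
# scale the blast chunk size only when running on slurm.
scale_chunk_size() {
  batch_system=$1   # e.g. "slurm" or "single_machine"
  base_chunk=$2     # blast chunkSize from the config
  scale=$3          # assumed value of slurmChunkScale
  if [ "$batch_system" = "slurm" ]; then
    echo $(( base_chunk * scale ))
  else
    echo "$base_chunk"
  fi
}

scale_chunk_size slurm 30000000 3            # prints 90000000
scale_chunk_size single_machine 30000000 3   # prints 30000000
```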