PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains a list of PacBio packages available via conda.

Lima execution seems unexpectedly slow on MASseq run #649

Closed: cyrilcros closed this issue 7 months ago

cyrilcros commented 7 months ago

Operating system: Nextflow 23.10.11.5893 on a SLURM cluster; ${task.cpus}=128 CPUs / 128 GB RAM resource request, which lands on nodes with a single AMD EPYC 9754.

Package name / Conda environment: lima 2.9.0. I use conda = 'bioconda::lima=2.9.0' in my nextflow.config, so the environment really just contains that package.

Describe the bug: lima in --isoseq mode appears to process about 40,000 reads/min. The throughput reported in the lima documentation is much higher, and I am using a high core count on a capable CPU. htop shows many suspended threads and only a single one actually running. I also have the impression this was not the case in previous versions: I am now exceeding the wall time limit, whereas my previous runs on the same data in December finished faster.

lima hifi_reads_skera.bam primers.fasta hifi_reads.fl.bam --isoseq --log-level INFO --num-threads ${task.cpus}

Error message: none.

To reproduce: see https://isoseq.how/umi/tertiary-analysis.html; demo data is available at https://downloads.pacbcloud.com/public/dataset/MAS-Seq/

Expected behavior: lima properly uses all the cores it has.
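For what it is worth, this is roughly how I compare wall-clock time against accumulated CPU time for the task, to check whether lima is actually using its cores (a sketch; <JOBID> is a placeholder and the exact sacct fields depend on your SLURM setup):

# SLURM accounting for the job that ran this Nextflow task (<JOBID> is a placeholder)
sacct -j <JOBID> --format=JobID,Elapsed,TotalCPU,MaxRSS,AllocCPUS

# Or wrap the call itself to get "Percent of CPU this job got" from GNU time
/usr/bin/time -v lima hifi_reads_skera.bam primers.fasta hifi_reads.fl.bam --isoseq --log-level INFO --num-threads ${task.cpus}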

armintoepfer commented 7 months ago

Maybe you are limited by IO?

You can also try setting this: export PB_BAMREADER_THREADS=12
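In a Nextflow task that would go right before the lima call in the script block, something like this (a sketch based on the command from your report; adjust the value to your nodes):

# Set the number of BAM reader threads before invoking lima
export PB_BAMREADER_THREADS=12
lima hifi_reads_skera.bam primers.fasta hifi_reads.fl.bam --isoseq --log-level INFO --num-threads ${task.cpus}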

cyrilcros commented 7 months ago

Thanks for the quick reply. I will check that when my runs end, but it really feels like a regression was introduced. On the exact same data, my current run has now been going for two hours on 5x more cores, when it was done in 15-20 min before. I kept the pipeline code in version control and I still have the Nextflow reports around. Around November 23rd, I was running with conda = 'bioconda::lima=2.7.1', 24 CPUs, and a 1 hour walltime limit; the average CPU usage was 40%, and right now I am nowhere near that. I have bumped the resources and time limit now and will see how long it takes. I am still using the same /scratch system on our cluster, which consists of NVMe drives in a BeeGFS cluster filesystem.
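For comparison, this is roughly the relevant part of my nextflow.config then versus now (simplified; the process selector name is a placeholder for my actual process):

// Simplified sketch; only the values are taken from my real configs
process {
    withName: 'LIMA' {
        conda = 'bioconda::lima=2.7.1'   // November run; currently 2.9.0
        cpus  = 24                       // currently 128
        time  = '1h'                     // currently raised well beyond that
    }
}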

armintoepfer commented 7 months ago

It scales linearly in the number of barcodes, but I really suspect it is your system. On the Revio instrument I can demux 1M HiFi reads per minute with 96 barcodes and lima 2.9.0.

Example log:

Processed : 8715784
Throughput: 1083976/min
Run Time  : 8m 12s
CPU Time  : 2h 31m
Peak RSS  : 3.72195 GB
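If you also pass --log-file, you can pull the same summary lines out of your own run to compare, roughly like this (a sketch; lima.log is whatever path you give to --log-file):

# Extract the end-of-run statistics lima prints at INFO level
grep -E 'Processed|Throughput|Run Time|CPU Time|Peak RSS' lima.log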

cyrilcros commented 7 months ago

Hi, I think you are absolutely right. We added some new nodes to our cluster in December, and my jobs are mostly landing on them. I also see a slowdown when checking out the workflow version I used back then.