mrvollger closed this 1 year ago
Yeah, this is a bottleneck we're aware of, and it's specifically related to writing haplotagged files. The phasing itself is parallelized well, but writing files is still handled in a single-threaded manner. If you are not writing BAM files, this isn't really an issue because the output files are small, but once you start haplotagging, the tool quickly becomes thread- and/or I/O-bound. Improving this is on our longer-term TODO list.
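To illustrate why that caps thread utilization, here is a minimal std-only Rust sketch of the general pattern described above (not HiPhase's actual code): several worker threads do the parallelizable "phasing" work, but every batch of results funnels through a single writer thread. `TaggedRead`, `phase_block`, and `run_pipeline` are hypothetical names made up for this example.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a haplotagged read; not a HiPhase type.
struct TaggedRead {
    id: usize,
    haplotype: u8,
}

// Hypothetical "phasing" step: CPU-bound work that parallelizes well.
fn phase_block(block: usize, reads_per_block: usize) -> Vec<TaggedRead> {
    (0..reads_per_block)
        .map(|i| TaggedRead {
            id: block * reads_per_block + i,
            haplotype: (i % 2) as u8 + 1,
        })
        .collect()
}

// Runs `workers` phasing threads that all feed ONE writer thread and
// returns the records in the order that single writer received them.
fn run_pipeline(workers: usize, blocks: usize, reads_per_block: usize) -> Vec<TaggedRead> {
    let (tx, rx) = mpsc::channel::<Vec<TaggedRead>>();

    // Single-threaded sink: every batch passes through this one loop.
    let writer = thread::spawn(move || {
        let mut written = Vec::new();
        for batch in rx {
            written.extend(batch); // a real tool would serialize/compress here
        }
        written
    });

    let mut handles = Vec::new();
    for w in 0..workers {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // Static round-robin assignment of blocks to workers.
            for block in (w..blocks).step_by(workers) {
                tx.send(phase_block(block, reads_per_block)).unwrap();
            }
        }));
    }
    drop(tx); // the channel closes once every worker's clone is dropped too
    for h in handles {
        h.join().unwrap();
    }
    writer.join().unwrap()
}

fn main() {
    let out = run_pipeline(4, 10, 3);
    let max_id = out.iter().map(|r| r.id).max();
    let hap1 = out.iter().filter(|r| r.haplotype == 1).count();
    println!("{} reads via one writer (max id {:?}, {} on hap 1)", out.len(), max_id, hap1);
}
```

Once the workers produce batches faster than the single writer loop can drain them, adding workers only lengthens the queue in front of the writer; end-to-end time is bounded by that one thread. (`mpsc::channel` is unbounded; a bounded `mpsc::sync_channel` would make the workers block instead.)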
Thanks for the info!
This might not be helpful, but I have found that setting this option can really speed things up, with gains up to about 8-16 threads!
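```rust
use rust_htslib::bam;

// stuff reading in a bam file and a header from that bam
// ...
let threads = 16;
// Let htslib use extra worker threads (mainly for BGZF compression) when writing.
let mut out = bam::Writer::from_path(out, &header, bam::Format::Bam).unwrap();
out.set_threads(threads).unwrap();
```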
This of course assumes you use Rust, rust-htslib, etc.
But with this set, I can write >10,000 PacBio reads per second.
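For context on why `set_threads` helps: BAM output is BGZF-compressed, i.e. a series of independently compressed blocks, so htslib's worker threads can compress several blocks at once while still emitting them in order. The std-only sketch below shows that general order-preserving pattern. It is not htslib's code; `toy_compress` is a hypothetical run-length encoder standing in for DEFLATE, and `compress_blocks_parallel` is a made-up name.

```rust
use std::thread;

// Hypothetical stand-in for DEFLATE: run-length encode as (count, byte) pairs.
fn toy_compress(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut iter = block.iter().peekable();
    while let Some(&b) = iter.next() {
        let mut count: u8 = 1;
        while count < u8::MAX && iter.peek() == Some(&&b) {
            iter.next();
            count += 1;
        }
        out.push(count);
        out.push(b);
    }
    out
}

// Compress fixed-size blocks on up to `threads` workers, then concatenate
// the results in the original block order (the property BGZF output needs).
fn compress_blocks_parallel(data: &[u8], block_size: usize, threads: usize) -> Vec<u8> {
    assert!(threads >= 1 && block_size >= 1);
    let blocks: Vec<&[u8]> = data.chunks(block_size).collect();
    let mut results: Vec<Vec<u8>> = vec![Vec::new(); blocks.len()];

    // Each worker owns a contiguous run of result slots, so no locking is needed.
    let per_worker = ((blocks.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for (slots, inputs) in results.chunks_mut(per_worker).zip(blocks.chunks(per_worker)) {
            s.spawn(move || {
                for (slot, block) in slots.iter_mut().zip(inputs) {
                    *slot = toy_compress(block);
                }
            });
        }
    });
    results.concat()
}

fn main() {
    let data = vec![b'A'; 100_000];
    let compressed = compress_blocks_parallel(&data, 65_536, 4);
    println!("{} input bytes -> {} output bytes", data.len(), compressed.len());
}
```

Because each block is compressed independently, the output is identical for any thread count; only the wall-clock time changes.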
I can confirm that it is much faster without the BAM output file. FYI, though, I am still not seeing great utilization of all 32 threads.
I'm not entirely sure what I'm looking at in that `top` readout. Is the `rg` command providing sequential timepoints?
Regardless, there is likely some thread optimization that can happen around all forms of I/O and parallelization. Most internal tests so far have been on 16 threads, and we probably have not revisited the parallelization components since the proof-of-concept. Historically, they were not the bottlenecks, but we may need to revisit that if further speed improvements get prioritized.
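One way to reason about the thread-utilization numbers in this thread is Amdahl's law: if a fraction s of the runtime is serial (for example, single-threaded output), the best-case speedup on N threads is 1 / (s + (1 - s) / N). The serial fractions in this sketch are made-up values for illustration, not measurements of HiPhase.

```rust
/// Amdahl's law: ideal speedup on `threads` workers when `serial_fraction`
/// of the runtime cannot be parallelized.
fn amdahl_speedup(serial_fraction: f64, threads: u32) -> f64 {
    assert!((0.0..=1.0).contains(&serial_fraction), "fraction must be in [0, 1]");
    assert!(threads >= 1, "need at least one thread");
    1.0 / (serial_fraction + (1.0 - serial_fraction) / threads as f64)
}

fn main() {
    // Hypothetical serial fractions, chosen only to show the trend.
    for s in [0.0, 0.1, 0.3, 0.5] {
        println!(
            "serial = {:.0}%: 16 threads -> {:.2}x, 32 threads -> {:.2}x",
            s * 100.0,
            amdahl_speedup(s, 16),
            amdahl_speedup(s, 32)
        );
    }
}
```

For example, with a 30% serial fraction, 16 threads give at most about 2.9x and 32 threads about 3.1x, so doubling the thread count buys very little until the serial part itself is sped up.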
Ahh, sorry. `rg` is just a grep alternative I like; it's searching the `top` output for lines mentioning hiphase over a minute or so.
But I was able to remove the need for the BAM with the new haplotag file you made for me, and I am happy with that speed. So feel free to close this if you want, or leave it open to bookmark potential future improvements.
v0.10.0 leverages the thread pools provided by htslib. This was the lowest-hanging fruit for optimizing I/O in the short term. Internally, we saw about a 40% speedup while haplotagging, although mileage will vary across systems and with resource contention.
Hi @holtjma,
I am seeing that when I give `hiphase` 32 threads, it only uses 150-300% CPU (see the screenshot below with `top` and the run log). Is this expected? If not, do you have any recommendations? This is the command I am using:

Thanks in advance!
Mitchell