Closed keiranmraine closed 5 years ago
Code wise I think adding the threads
option to the pysam readers would help greatly. As it stands a new thread has to wait on the generation of the next set of reads from the parent process making this the rate limiting step. I had this issue in a similar piece of code.
I think it's safe to set threads=2
for AlignmentFile/Samfile when args.threads>1
.
Hi,
The limiting factor is the read performance and the single thread performance. We need to read both BAM files line by line to find the correct corresponding forward and reverse reads. However, there is in our knowledge no way to make this sequential process to run in parallel. Even if it is possible to somehow get iterators to read different chunks of the file, the input bandwidth is still the same and needs now to be shared. These considerations are the reasons for the current implementation.
To improve the performance we recommend to increase the buffer size, but this comes with higher memory usage. However, this helps only if the the input parent threads is able to read the input faster as the n worker threads have done their processing. If the workers are faster, the parallel performance will not improve. In my experience 4 - 6 threads give you an improvement of a factor of 3 compared to the single core version.
What can help too is to not set --outBam
parameter if the resulting BAM file is not needed.
To set the --region
parameter will not give you a performance increase. Still, all lines need to be read given the unordered nature of our BAM files. It is not possible to order it because the information which forward and reverse read are associated would be lost. However, if you want to try this this will give you only the intra-chromosomal reads and no inter-chromosomal reads. Unfortunately, we do not offer a tool to merge these kind of matrices. hicSumMatrices
is only able to merge matrices of the same size (which you don't have).
Concerning threads=2
for pysam.Samfile
, I will test this and inform you how well that works. Thanks for this tip.
Best,
Joachim
I'm finding it takes a considerable time to process our input BAMs (from BWA, generated as described here) with the recommended threads (8) against a high performance file system. I've also noted that the used cpu time factor is only ~1.3 rather than close to 8 as would be expected:
Each input is ~65GB (reads/file ~618 million). I'm also creating a version will all secondary/supplementary reads removed which will reduce the raw reads by ~20%.
What is the best way to improve throughput?
--region
parameter be used to generate a matrix per chromosome for merging that is comparable to running it over the whole input? If so, which tool can merge this information