GoekeLab / bambu

Reference-guided transcript discovery and quantification for long read RNA-Seq data
GNU General Public License v3.0

Question about using yieldSize and ncore parameters together #370

Closed. bernardo-heberle closed this 1 year ago

bernardo-heberle commented 1 year ago

Hello,

I am trying to optimize the trade-off between speed and memory usage for Bambu, and I was wondering about using the yieldSize parameter in combination with ncore. Does each parallel process spawned with ncore use its own yieldSize, or do all processes share a single yieldSize?

For example: if I set yieldSize=1e6 and ncore=12, will each core load 1e6 reads from a BAM file at a time, or will all 12 cores work on the same 1e6 reads at one time?

andredsim commented 1 year ago

Hi,

It's been a while since I looked at this part of the code, so I double-checked: in your example, each core will load 1e6 reads at a time from its own BAM file, so up to 12*1e6 reads are loaded at once. The ncore parameter distributes the different BAM files to different cores, so read class generation will also proceed with one BAM file per core.
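To make the arithmetic concrete, here is a rough sketch in Python (not part of Bambu). The function name and the `n_bam_files` argument are hypothetical, and the sketch assumes that no more cores are busy than there are BAM files, since each core works on one file:

```python
def peak_reads_in_memory(yield_size: int, ncore: int, n_bam_files: int) -> int:
    """Rough upper bound on the number of reads loaded at the same time.

    Assumption: each busy core holds one chunk of `yield_size` reads from
    its own BAM file, and at most one core works on a given file.
    """
    if yield_size <= 0 or ncore <= 0 or n_bam_files <= 0:
        raise ValueError("all arguments must be positive")
    # One BAM file per core, so there cannot be more busy cores than files.
    active_cores = min(ncore, n_bam_files)
    return active_cores * yield_size


# The example from the question: yieldSize=1e6, ncore=12, 12 BAM files.
print(peak_reads_in_memory(1_000_000, 12, 12))  # 12000000
```

Actual memory use also depends on read length and on what is kept per read, so treat this only as a read count, not a byte estimate.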

Kind Regards, Andre Sim

bernardo-heberle commented 1 year ago

Ok, thank you!

I was getting out-of-memory errors when increasing the number of cores while keeping yieldSize constant. Now it makes perfect sense, and it will be easy to adapt my code to account for this.
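For anyone hitting the same problem, this is roughly the adjustment I have in mind, sketched in Python rather than in my actual pipeline. The helper name and the idea of a fixed total "read budget" are my own assumptions, not anything Bambu provides: the per-core yieldSize is scaled down as ncore goes up, so the total number of reads loaded at once stays about constant.

```python
def yield_size_for_budget(total_read_budget: int, ncore: int, n_bam_files: int) -> int:
    """Per-core yieldSize that keeps the total reads loaded under a budget.

    Assumption: each busy core loads `yieldSize` reads from its own BAM
    file, so the total loaded at once is (busy cores) * yieldSize.
    """
    if total_read_budget <= 0 or ncore <= 0 or n_bam_files <= 0:
        raise ValueError("all arguments must be positive")
    # No more cores can be busy than there are BAM files to hand out.
    active_cores = min(ncore, n_bam_files)
    # Integer division rounds down, so the budget is never exceeded;
    # max(1, ...) avoids returning a yieldSize of zero.
    return max(1, total_read_budget // active_cores)


# A budget of 1e6 reads in total, spread over 12 cores and 12 BAM files.
print(yield_size_for_budget(1_000_000, 12, 12))  # 83333
```

The resulting value would then be passed as yieldSize together with the chosen ncore.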

Kind Regards, Bernardo