GoekeLab / bambu

Reference-guided transcript discovery and quantification for long read RNA-Seq data
GNU General Public License v3.0

How to use bambu in parallel on HPC clusters? #428

Open dudududu12138 opened 6 months ago

dudududu12138 commented 6 months ago

Hi, I ran bambu on one sample on an HPC cluster. The alignment result (.bam) for this sample is about 34G, and I set the CPU count to 20. The job took 6 hours to finish, and CPU efficiency was only 5%; in other words, bambu effectively used only 1 CPU. I am confused. Can you help me? Thanks! Below is my R code:

#! /usr/bin/env Rscript
library(bambu)
args<- commandArgs(trailingOnly=TRUE)
rawreads<-args[1]
ref_anno<-args[2]
ref<-args[3]
core<-args[4]
output<-args[5]

bambuAnnotations <- prepareAnnotations(ref_anno)
se<-bambu(reads=rawreads, annotations=bambuAnnotations, genome=ref, ncore=core, trackReads=TRUE)
writeBambuOutput(se,path=output)
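(For anyone hitting the same symptom, a quick diagnostic is to print the class of each parsed argument before calling bambu. The values below are hypothetical stand-ins for what commandArgs() would return; everything parsed from the command line arrives as character strings, including numbers.)

```r
# Stand-in for commandArgs(trailingOnly = TRUE) -- hypothetical values.
args <- c("my.bam", "anno.gtf", "ref.fa", "20", "out/")

# Every element is "character", including the core count "20".
sapply(args, class)
```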

Below is my slurm code (submit my R script to the HPC cluster):

#!/bin/bash
#SBATCH -J bambu
#SBATCH --partition=cpu
#SBATCH -n 20
#SBATCH --output=%j.out
#SBATCH --error=%j.err

module load miniconda3
source activate R_422

ref=~/reference/GRCh38.p13.genome.fa
anno=~/reference/gencode.v44.chr_patch_hapl_scaff.annotation.gtf
reads=my.bam
output=../bambu/
core=20

./bambu.R $reads $anno $ref $core $output  
andredsim commented 4 months ago

Hi,

Sorry I didn't see this issue earlier. It shouldn't take 6 hours to run a 34G bam file. Could you check the class() of core? I think it is being read in as a character rather than an integer, which may be interpreted as 1 core instead of 20. Adding as.integer(core) should help, I hope.

Kind Regards,
Andre Sim
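The suggested fix can be sketched as follows (a minimal example, assuming the core count arrives as the string "20" from commandArgs()):

```r
# commandArgs(trailingOnly = TRUE) returns character strings,
# so a numeric-looking argument must be coerced explicitly.
core <- "20"

# Coerce before passing it to bambu's ncore argument.
ncore <- as.integer(core)   # 20L, a proper integer

# Guard against a non-numeric argument (as.integer would yield NA).
stopifnot(!is.na(ncore), ncore >= 1)

# Then call, e.g.:
# se <- bambu(reads = rawreads, annotations = bambuAnnotations,
#             genome = ref, ncore = ncore, trackReads = TRUE)
```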