Oshlack / JAFFA

JAFFA is a multi-step pipeline that takes either raw RNA-Seq reads, or pre-assembled transcripts, then searches for gene fusions
https://github.com/Oshlack/JAFFA/wiki

resource usage to improve performance #65

Closed anoronh4 closed 2 years ago

anoronh4 commented 3 years ago

I tried to run JAFFA on some of our larger FASTQ files (8 GB Read 1 + 8 GB Read 2), and the job timed out after 12 hours. I can raise the time limit without any problem, but I also noticed that the job's average and maximum memory use were 1 GB and 2 GB respectively, even though it was scheduled on our cluster with ~40 GB. Is there any way to increase the memory and CPU that JAFFA uses to improve performance?

nadiadavidson commented 3 years ago

Hi, which version of JAFFA are you running? We made some major improvements to resource usage in version 2, so if you are using an older version I would suggest updating. If you are running version 2, you can use multiple threads by passing the -n parameter to bpipe, and it should run faster. Can you paste the output of the pipeline just prior to it hitting the 12-hour walltime, so I can see how far it got?

Cheers, Nadia
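For readers following along, a minimal sketch of what passing -n to bpipe might look like. The thread count, the install path of the JAFFA pipeline file, and the FASTQ file names are all placeholders chosen for illustration; the snippet only assembles and prints the command so that it can be checked before being submitted to a cluster.

```shell
# Placeholders (assumptions, not values from the JAFFA documentation):
threads=8                                   # number of threads to hand to bpipe via -n
pipeline=/opt/JAFFA/JAFFA_direct.groovy     # wherever JAFFA is installed on your system
read1=sample_R1.fastq.gz
read2=sample_R2.fastq.gz

# Assemble the command as a dry run; paste the printed line into a job script to execute it.
cmd="bpipe run -n $threads $pipeline $read1 $read2"
echo "$cmd"
```

Matching the -n value to the number of cores requested from the scheduler keeps the multithreaded steps from oversubscribing the node.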

anoronh4 commented 3 years ago

Hello Nadia, we are using version 1.09, as we need to use hg19, and according to #56 this reference is not supported in version 2 yet. We can afford to wait for v2 but were excited to assess JAFFA ASAP. This is the last line of commandlog.txt:

java -jar /usr/share/java/trimmomatic.jar PE -threads 16 -phred33 \
    Acral-RNA_IGO_10848_C_1_S3_R1_001.fastq.gz Acral-RNA_IGO_10848_C_1_S3_R2_001.fastq.gz \
    /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/tempp1.fq /dev/null \
    /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/tempp2.fq /dev/null \
    LEADING:0 TRAILING:0 MINLEN:35;
function fix_ids { cat $1 | awk -v app=$2 'BEGIN{ i=0 }{ if(i==0) print $1 "/" app ; else print $1 ; i++ ; if(i==4) i=0 }' 2>/dev/null ; } ;
fix_ids /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/tempp1.fq 1 > /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/Acral-RNA_trim1.fastq ;
fix_ids /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/tempp2.fq 2 > /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/Acral-RNA_trim2.fastq ;
rm /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/tempp1.fq \
    /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/tempp2.fq ;
bowtie2 -k1 --no-mixed --no-discordant --mm --very-fast \
    --al-conc-gz Acral-RNA/Acral-RNA_filtered_reads.fastq.gz \
    --un-conc /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/temp_trans_unmap_reads.fastq \
    -p 16 -x jaffa_gencode/hg19_genCode19 \
    -1 /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/Acral-RNA_trim1.fastq \
    -2 /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/Acral-RNA_trim2.fastq \
    -S /dev/null ;
bowtie2 -k1 --no-mixed --no-discordant --mm --very-fast \
    --un-conc-gz Acral-RNA/Acral-RNA_leftover_reads.fastq.gz \
    -p 16 -x jaffa_gencode/Masked_hg19 \
    -1 /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/temp_trans_unmap_reads.1.fastq \
    -2 /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/temp_trans_unmap_reads.2.fastq \
    -S /dev/null ;
cat Acral-RNA/Acral-RNA_leftover_reads.fastq.1.gz >> Acral-RNA/Acral-RNA_filtered_reads.fastq.1.gz ;
cat Acral-RNA/Acral-RNA_leftover_reads.fastq.2.gz >> Acral-RNA/Acral-RNA_filtered_reads.fastq.2.gz ;
rm /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/temp_trans_unmap_reads.1.fastq \
    /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/temp_trans_unmap_reads.2.fastq \
    /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/Acral-RNA_trim1.fastq \
    /workdir/rnafusion_folder/rna_pipeline/test_jaffa/work/e8/45a55726df3054006cdee3cc4e188a/Acral-RNA/Acral-RNA_trim2.fastq ;

I didn't capture any stderr or stdout, but the output folder shows the latest files created:

total 162G
-rw-r--r-- 1 user grouplab 676M Sep 14 16:55 Acral-RNA_filtered_reads.fastq.1.gz
-rw-r--r-- 1 user grouplab 694M Sep 14 16:55 Acral-RNA_filtered_reads.fastq.2.gz
-rw-r--r-- 1 user grouplab 706M Sep 14 18:52 Acral-RNA_leftover_reads.fastq.1.gz
-rw-r--r-- 1 user grouplab 759M Sep 14 18:52 Acral-RNA_leftover_reads.fastq.2.gz
-rw-r--r-- 1 user grouplab  42G Sep 14 07:19 Acral-RNA_trim1.fastq
-rw-r--r-- 1 user grouplab  42G Sep 14 07:25 Acral-RNA_trim2.fastq
-rw-r--r-- 1 user grouplab  38G Sep 14 16:55 temp_trans_unmap_reads.1.fastq
-rw-r--r-- 1 user grouplab  38G Sep 14 16:55 temp_trans_unmap_reads.2.fastq

Best regards, Anne Marie

nadiadavidson commented 3 years ago

Hi Anne Marie,

I've just written up some instructions on how to get JAFFA version 2+ working with hg19, in case you'd like to try that: https://github.com/Oshlack/JAFFA/wiki/FAQandTroubleshooting#how-can-i-run-jaffa-with-hg19-or-mm10

It looks like your dataset is fairly large, so it may still take some time to complete with version 2. But I'm happy for you to report back with how it goes.

Cheers, Nadia.

anoronh4 commented 3 years ago

Awesome, thank you! I'll try it out...

anoronh4 commented 3 years ago

Hello Nadia,

I am having some issues with running out of heap space. For most of the runs I get either the message java.lang.OutOfMemoryError: Java heap space or Exception in thread "main" java.lang.OutOfMemoryError: Java heap space. An example command is as follows:

bpipe run \
    -n 4 \
    -m 24GB \
    -p genome=${genome_param} \
    -p refBase=jaffa_gencode \
    -p annotation=${annot_param} \
    -p knownTable=${knownTab} \
    -p fastaBase=gencode \
    /opt/JAFFA/JAFFA_direct.groovy \
    sample1_R1_001.fastq.gz sample1_R2_001.fastq.gz

For this example I have scheduled the job on our cluster with 4 CPUs and 32 GB of memory. Using 75% of the allocated memory has been a "sweet spot" for me with other Java applications, but I have never used bpipe. I'm just wondering if you have any experience with this issue.
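The 75% rule of thumb described above can be written out as a short calculation. This is only a sketch of the arithmetic behind the -m 24GB value in the example command; the variable names are made up for illustration, and whether bpipe's -m option has any effect on the Java heap is exactly the open question in this thread.

```shell
# Memory granted by the scheduler for the job, in GB (32 GB in the example above).
alloc_gb=32

# Reserve 75% of the allocation for the pipeline, leaving headroom for the OS and other tools.
# Shell integer arithmetic truncates, so multiply before dividing.
limit_gb=$(( alloc_gb * 75 / 100 ))

# Print the option as it would appear in the bpipe command line.
echo "-m ${limit_gb}GB"
```

For a 32 GB allocation this gives 24 GB, matching the value passed to bpipe in the command above.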

nadiadavidson commented 3 years ago

Hi, this looks like an issue related to bpipe. @ssadedin, do you know if/how the Java heap space should be increased to avoid this error? Cheers, Nadia.
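While the question above is open, one thing that may be worth checking is how the bpipe launcher sets the JVM heap. The sketch below rests on an assumption that is not confirmed anywhere in this thread: that the bpipe launcher is a shell script which reads an environment variable named MAX_JAVA_MEM to build its -Xmx setting. It looks for that name in the launcher before relying on it, and the 4g value is purely illustrative. It would also only help if the bpipe JVM itself, rather than a Java tool started by the pipeline, is the process running out of heap.

```shell
# ASSUMPTION: the bpipe launcher is a shell script that honours MAX_JAVA_MEM.
# Verify against your own installation before relying on this.
launcher="$(command -v bpipe 2>/dev/null || true)"

if [ -n "$launcher" ] && grep -q MAX_JAVA_MEM "$launcher" 2>/dev/null; then
    export MAX_JAVA_MEM=4g   # hypothetical value; size it to your job
    msg="launcher reads MAX_JAVA_MEM; requesting $MAX_JAVA_MEM for the bpipe JVM"
else
    msg="MAX_JAVA_MEM not found in a bpipe launcher on PATH; check how your install sets -Xmx"
fi
echo "$msg"
```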