VistaSohrab / TEfinder

A bioinformatics tool for detecting novel transposable element insertions

Picard running out of memory #3

Open alyazeeditalal opened 2 years ago

alyazeeditalal commented 2 years ago

I am running TEfinder to find insertions in a large BAM file (18 GB); however, the pipeline stops at the Picard step with the message `Runtime.totalMemory()=2097152000`. I am running the analysis on an HPC cluster and installed all dependencies using conda. I had to change the Picard command in the TEfinder pipeline file, because I can run Picard from the shell without invoking java first, by typing only "picard":

```shell
picard -Xmx2000m FilterSamReads I=${workingdir}/${outname}Alignments.bam O=${currdir}/${line}_DiscordantPairs.bam \
    READ_LIST_FILE=${currdir}/${line}_ReadID.txt FILTER=includeReadList WRITE_READS_FILES=false
echo -e $(date) " Filtering original alignment based on discordant reads IDs is complete for "${line}"\n"
```

The analysis stops at the Picard step:

```
INFO 2021-12-10 15:42:43 FilterSamReads 2 SAMRecords written to Motif:LTR_retrotransposon_161_DiscordantPairs.bam
[Fri Dec 10 15:42:43 GMT 2021] picard.sam.FilterSamReads done. Elapsed time: 6.93 minutes.
Runtime.totalMemory()=2097152000
```

I tried downsampling the BAM file to a smaller size by keeping only 25% of the coverage, but I hit exactly the same issue: the pipeline stopped at the Picard step.
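(For reference, the downsampling step above was done along these lines; a sketch, with the input/output file names being placeholders rather than the actual paths used.)

```shell
# Subsample ~25% of read pairs with samtools; the integer part of -s is
# the random seed, the fractional part is the fraction of reads to keep.
samtools view -b -s 42.25 input.bam > input_25pct.bam
samtools index input_25pct.bam
```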

What are your suggestions?

Kind regards,

VistaSohrab commented 2 years ago

Picard is unfortunately memory intensive. I'm wondering whether increasing the maximum Java heap size from the default of 2000 megabytes, i.e. roughly 2 gigabytes (`-Xmx2000m`), to a higher value, depending on the total memory requested on the HPC, would help.
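Concretely, that would mean editing the `-Xmx` value in the modified command above; a sketch, where the 16 GB figure is an assumption and should be set somewhat below the memory actually requested for the HPC job, to leave headroom for non-heap JVM memory:

```shell
# Raise Picard's Java heap from 2 GB to 16 GB (example value; match it
# to your HPC job's memory request). The conda "picard" wrapper passes
# -Xmx... through to the underlying JVM.
picard -Xmx16g FilterSamReads I=${workingdir}/${outname}Alignments.bam O=${currdir}/${line}_DiscordantPairs.bam \
    READ_LIST_FILE=${currdir}/${line}_ReadID.txt FILTER=includeReadList WRITE_READS_FILES=false
```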