Memory error while sorting bam

amdixit commented 3 years ago

When I run the pipeline, I receive the following error:

b'\nEXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc\nPossible cause 1: not enough RAM. Check if you have enough RAM 108847334315717446 bytes\nPossible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 108847334315717446\n\nNov 10 03:07:12 ......FATAL ERROR, exiting\n'
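The byte count in STAR's message is easier to judge once it is converted to readable units. Below is a minimal Python sketch of that conversion. The helper names `requested_bytes` and `humanize` are made up for illustration, and the regular expression assumes the message wording shown above.

```python
import re

# The relevant part of the message reported above.
STAR_MESSAGE = (
    "EXITING: fatal error trying to allocate genome arrays, "
    "exception thrown: std::bad_alloc\n"
    "Possible cause 1: not enough RAM. "
    "Check if you have enough RAM 108847334315717446 bytes\n"
)


def requested_bytes(message: str) -> int:
    """Extract the byte count from STAR's message (hypothetical helper)."""
    match = re.search(r"enough RAM (\d+) bytes", message)
    if match is None:
        raise ValueError("no byte count found in message")
    return int(match.group(1))


def humanize(n_bytes: int) -> str:
    """Format a byte count using binary units (hypothetical helper)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = float(n_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


if __name__ == "__main__":
    print(humanize(requested_bytes(STAR_MESSAGE)))
```

The request works out to roughly 97 PiB, far more than any single machine provides. A figure that large suggests the value itself is suspect, rather than the machine simply being short of RAM.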

I tried to limit the memory used by passing the --star-sort-mem-limit argument as shown below, but it did not help either. Please assist.

st_pipeline_run.py \
  --output-folder $OUTPUT \
  --ids $ID \
  --ref-map $MAP \
  --ref-annotation $ANN \
  --expName $sample \
  --htseq-no-ambiguous \
  --verbose \
  --log-file $OUTPUT/${sample}_log.txt \
  --allowed-kmer 5 \
  --mapping-threads 20 \
  --temp-folder $TMP_ST \
  --no-clean-up \
  --umi-start-position 16 \
  --umi-end-position 26 \
  --star-sort-mem-limit 70166447416 \
  --overhang 0 \
  --min-length-qual-trimming 20 \

amdixit commented 3 years ago

full error:

(dbit) [root@ip-10-21-10-123 output]# cat ME10_C50_log.txt
INFO:STPipeline:ST Pipeline 1.7.9
INFO:STPipeline:Output directory: /usr/bin/dbit/output
INFO:STPipeline:Temporary directory: /usr/bin/dbit/output/tmp
INFO:STPipeline:Dataset name: ME10_C50
INFO:STPipeline:Forward(R1) input file: /tmp/ME10_C50_R2_processed.fastq
INFO:STPipeline:Reverse(R2) input file: /tmp/ME10_C50_R1_filtered.fastq.gz
INFO:STPipeline:Reference mapping STAR index folder: /usr/bin/dbit/Scripts/st_references/mm10
INFO:STPipeline:Reference annotation file: /usr/bin/dbit/Scripts/st_references/mm10/mm10.gtf
INFO:STPipeline:CPU Nodes: 20
INFO:STPipeline:Ids(barcodes) file: /usr/bin/dbit/spatial_barcodes/spatial_barcodes.txt
INFO:STPipeline:TaggD allowed mismatches: 2
INFO:STPipeline:TaggD kmer size: 5
INFO:STPipeline:TaggD overhang: 0
INFO:STPipeline:TaggD metric: Subglobal
INFO:STPipeline:Mapping reverse trimming: 0
INFO:STPipeline:Mapping inverse reverse trimming: 0
INFO:STPipeline:Mapping tool: STAR
INFO:STPipeline:Mapping minimum intron size allowed (splice alignments) with STAR: 1
INFO:STPipeline:Mapping maximum intron size allowed (splice alignments) with STAR: 1
INFO:STPipeline:STAR genome loading strategy NoSharedMemory
INFO:STPipeline:Annotation tool: HTSeq
INFO:STPipeline:Annotation mode: intersection-nonempty
INFO:STPipeline:Annotation strandness yes
INFO:STPipeline:UMIs start position: 16
INFO:STPipeline:UMIs end position: 26
INFO:STPipeline:UMIs allowed mismatches: 1
INFO:STPipeline:UMIs clustering algorithm: AdjacentBi
INFO:STPipeline:Allowing an offset of 250 when clustering UMIs by strand-start in a gene-spot
INFO:STPipeline:Allowing 6 low quality bases in an UMI
INFO:STPipeline:Discarding reads that after trimming are shorter than 20
INFO:STPipeline:Removing polyA sequences of a length of at least: 10
INFO:STPipeline:Removing polyT sequences of a length of at least: 10
INFO:STPipeline:Removing polyG sequences of a length of at least: 10
INFO:STPipeline:Removing polyC sequences of a length of at least: 10
INFO:STPipeline:Removing polyN sequences of a length of at least: 10
INFO:STPipeline:Allowing 0 mismatches when removing homopolymers
INFO:STPipeline:Remove reads whose AT content is 90%
INFO:STPipeline:Remove reads whose GC content is 90%
INFO:STPipeline:Starting the pipeline: 2020-11-10 07:47:14.874577
INFO:STPipeline:Start filtering raw reads 2020-11-10 07:47:14.881028
INFO:STPipeline:Trimming stats total reads (pair): 62992725
INFO:STPipeline:Trimming stats 1978584 reads have been dropped!
INFO:STPipeline:Trimming stats you just lost about 3.14% of your data
INFO:STPipeline:Trimming stats reads remaining: 61014141
INFO:STPipeline:Trimming stats dropped pairs due to incorrect UMI: 0
INFO:STPipeline:Trimming stats dropped pairs due to low quality UMI: 627057
INFO:STPipeline:Trimming stats dropped pairs due to high AT content: 573901
INFO:STPipeline:Trimming stats dropped pairs due to high GC content: 140
INFO:STPipeline:Trimming stats dropped pairs due to presence of artifacts: 758274
INFO:STPipeline:Trimming stats dropped pairs due to being too short: 19212
INFO:STPipeline:Starting genome alignment 2020-11-10 08:05:30.468736
ERROR:STPipeline:Error mapping with STAR. Output file not present /usr/bin/dbit/output/tmp/Aligned.sortedByCoord.out.bam
b'\nEXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc\nPossible cause 1: not enough RAM. Check if you have enough RAM 108847334315717446 bytes\nPossible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 108847334315717446\n\nNov 10 08:05:30 ......FATAL ERROR, exiting\n'

jfnavarro commented 3 years ago

Hello,

How much RAM is available on the machine you are using?

Best, Jose
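On Linux, one way to answer this question is to read `/proc/meminfo`. The sketch below is only an illustration: `parse_meminfo` is a hypothetical helper, and it assumes the usual `Key:   value kB` layout of that file.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo key to its value in bytes (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue  # skip blank or malformed lines
        amount = int(fields[0])
        # Most entries are reported in kB; a few are plain counts and stay as-is.
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        info = parse_meminfo(meminfo.read_text())
        for key in ("MemTotal", "MemAvailable"):
            if key in info:
                print(f"{key}: {info[key] / 1024**3:.1f} GiB")
```

`MemTotal` is the installed RAM, while `MemAvailable` is the kernel's estimate of what a new process could use without swapping. The second figure is usually the more relevant one when deciding whether a large genome index can be loaded.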

amdixit commented 3 years ago

It turns out our genome files were corrupted. The issue was resolved after we fixed that. Thanks!
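For anyone hitting the same message, a quick look at the STAR index folder can catch the most obvious damage before a long run. The sketch below is an illustration under stated assumptions, not part of st_pipeline. `check_star_index` is a hypothetical helper, and the file list covers only the core files a STAR genome index normally contains. A file that is present and non-empty can still be corrupted, so regenerating the index or comparing checksums against a known-good copy remains the more reliable test.

```python
import sys
from pathlib import Path

# Core files normally written when STAR generates a genome index.
# This list is an assumption for illustration; indexes built with
# annotations contain additional files that are not checked here.
EXPECTED_FILES = (
    "Genome",
    "SA",
    "SAindex",
    "chrName.txt",
    "chrLength.txt",
    "chrStart.txt",
    "chrNameLength.txt",
    "genomeParameters.txt",
)


def check_star_index(index_dir):
    """Return the expected index files that are missing or empty."""
    folder = Path(index_dir)
    problems = []
    for name in EXPECTED_FILES:
        path = folder / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    if len(sys.argv) > 1:
        bad = check_star_index(sys.argv[1])
        print("missing or empty:", ", ".join(bad) if bad else "none")
```

Run it with the index folder as the only argument, for example the folder passed to `--ref-map`. An empty result only rules out missing or zero-length files.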
