SpatialTranscriptomicsResearch / st_pipeline

ST Pipeline contains the tools and scripts needed to process and analyze the raw files generated with the Spatial Transcriptomics method in FASTQ format.

Memory error while sorting bam #120

Closed amdixit closed 3 years ago

amdixit commented 3 years ago

When I run the pipeline, I get the following error:

b'\nEXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc\nPossible cause 1: not enough RAM. Check if you have enough RAM 108847334315717446 bytes\nPossible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 108847334315717446\n\nNov 10 03:07:12 ......FATAL ERROR, exiting\n'

I tried to limit the memory used by passing the --star-sort-mem-limit argument, as shown below, but it did not help either. Please assist.

st_pipeline_run.py \
  --output-folder $OUTPUT \
  --ids $ID \
  --ref-map $MAP \
  --ref-annotation $ANN \
  --expName $sample \
  --htseq-no-ambiguous \
  --verbose \
  --log-file $OUTPUT/${sample}_log.txt \
  --allowed-kmer 5 \
  --mapping-threads 20 \
  --temp-folder $TMP_ST \
  --no-clean-up \
  --umi-start-position 16 \
  --umi-end-position 26 \
  --star-sort-mem-limit 70166447416 \
  --overhang 0 \
  --min-length-qual-trimming 20 \
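To put the number in the STAR message into perspective, here is a minimal Python sketch that extracts the requested byte count from the error text and converts it to pebibytes. The helper names (`requested_bytes`, `to_pib`) are made up for illustration and are not part of ST Pipeline or STAR; the regular expression only assumes the wording shown in the error above.

```python
import re
from typing import Optional

# STAR error text copied from the report above (shortened to the relevant lines).
STAR_ERROR = (
    "EXITING: fatal error trying to allocate genome arrays, "
    "exception thrown: std::bad_alloc\n"
    "Possible cause 1: not enough RAM. "
    "Check if you have enough RAM 108847334315717446 bytes\n"
)


def requested_bytes(message: str) -> Optional[int]:
    """Return the byte count named in the message, or None if there is none."""
    match = re.search(r"enough RAM (\d+) bytes", message)
    return int(match.group(1)) if match else None


def to_pib(n_bytes: int) -> float:
    """Convert bytes to pebibytes (1 PiB = 2**50 bytes)."""
    return n_bytes / 2**50


needed = requested_bytes(STAR_ERROR)
if needed is not None:
    print(f"STAR asked for {needed} bytes (~{to_pib(needed):.1f} PiB)")
```

A request of this size is far beyond the RAM of any single machine, so the figure is unlikely to reflect a real memory requirement and may point at a problem with the input STAR is reading.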

amdixit commented 3 years ago

full error:

(dbit) [root@ip-10-21-10-123 output]# cat ME10_C50_log.txt
ERROR:STPipeline:STAR
INFO:STPipeline:ST Pipeline 1.7.9
INFO:STPipeline:Output directory: /usr/bin/dbit/output
INFO:STPipeline:Temporary directory: /usr/bin/dbit/output/tmp
INFO:STPipeline:Dataset name: ME10_C50
INFO:STPipeline:Forward(R1) input file: /tmp/ME10_C50_R2_processed.fastq
INFO:STPipeline:Reverse(R2) input file: /tmp/ME10_C50_R1_filtered.fastq.gz
INFO:STPipeline:Reference mapping STAR index folder: /usr/bin/dbit/Scripts/st_references/mm10
INFO:STPipeline:Reference annotation file: /usr/bin/dbit/Scripts/st_references/mm10/mm10.gtf
INFO:STPipeline:CPU Nodes: 20
INFO:STPipeline:Ids(barcodes) file: /usr/bin/dbit/spatial_barcodes/spatial_barcodes.txt
INFO:STPipeline:TaggD allowed mismatches: 2
INFO:STPipeline:TaggD kmer size: 5
INFO:STPipeline:TaggD overhang: 0
INFO:STPipeline:TaggD metric: Subglobal
INFO:STPipeline:Mapping reverse trimming: 0
INFO:STPipeline:Mapping inverse reverse trimming: 0
INFO:STPipeline:Mapping tool: STAR
INFO:STPipeline:Mapping minimum intron size allowed (splice alignments) with STAR: 1
INFO:STPipeline:Mapping maximum intron size allowed (splice alignments) with STAR: 1
INFO:STPipeline:STAR genome loading strategy NoSharedMemory
INFO:STPipeline:Annotation tool: HTSeq
INFO:STPipeline:Annotation mode: intersection-nonempty
INFO:STPipeline:Annotation strandness yes
INFO:STPipeline:UMIs start position: 16
INFO:STPipeline:UMIs end position: 26
INFO:STPipeline:UMIs allowed mismatches: 1
INFO:STPipeline:UMIs clustering algorithm: AdjacentBi
INFO:STPipeline:Allowing an offset of 250 when clustering UMIs by strand-start in a gene-spot
INFO:STPipeline:Allowing 6 low quality bases in an UMI
INFO:STPipeline:Discarding reads that after trimming are shorter than 20
INFO:STPipeline:Removing polyA sequences of a length of at least: 10
INFO:STPipeline:Removing polyT sequences of a length of at least: 10
INFO:STPipeline:Removing polyG sequences of a length of at least: 10
INFO:STPipeline:Removing polyC sequences of a length of at least: 10
INFO:STPipeline:Removing polyN sequences of a length of at least: 10
INFO:STPipeline:Allowing 0 mismatches when removing homopolymers
INFO:STPipeline:Remove reads whose AT content is 90%
INFO:STPipeline:Remove reads whose GC content is 90%
INFO:STPipeline:Starting the pipeline: 2020-11-10 07:47:14.874577
INFO:STPipeline:Start filtering raw reads 2020-11-10 07:47:14.881028
INFO:STPipeline:Trimming stats total reads (pair): 62992725
INFO:STPipeline:Trimming stats 1978584 reads have been dropped!
INFO:STPipeline:Trimming stats you just lost about 3.14% of your data
INFO:STPipeline:Trimming stats reads remaining: 61014141
INFO:STPipeline:Trimming stats dropped pairs due to incorrect UMI: 0
INFO:STPipeline:Trimming stats dropped pairs due to low quality UMI: 627057
INFO:STPipeline:Trimming stats dropped pairs due to high AT content: 573901
INFO:STPipeline:Trimming stats dropped pairs due to high GC content: 140
INFO:STPipeline:Trimming stats dropped pairs due to presence of artifacts: 758274
INFO:STPipeline:Trimming stats dropped pairs due to being too short: 19212
INFO:STPipeline:Starting genome alignment 2020-11-10 08:05:30.468736
ERROR:STPipeline:Error mapping with STAR.
Output file not present /usr/bin/dbit/output/tmp/Aligned.sortedByCoord.out.bam
b'\nEXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc\nPossible cause 1: not enough RAM. Check if you have enough RAM 108847334315717446 bytes\nPossible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 108847334315717446\n\nNov 10 08:05:30 ......FATAL ERROR, exiting\n'

jfnavarro commented 3 years ago

Hello,

How much RAM is available on the machine you are using?

Best, Jose
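One way to answer this from Python is sketched below. It assumes a POSIX system (Linux or similar) where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`; the function names are illustrative only. It reports total physical memory and the per-process address-space limit, which is the limit the `ulimit -v` hint in the STAR message refers to.

```python
import os
import resource
from typing import Optional


def total_ram_bytes() -> int:
    """Total physical memory in bytes, via POSIX sysconf (not available on Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def address_space_limit() -> Optional[int]:
    """Soft virtual-memory limit in bytes, or None when it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    print(f"Physical RAM: {total_ram_bytes() / 2**30:.1f} GiB")
    limit = address_space_limit()
    if limit is None:
        print("Address-space limit: unlimited")
    else:
        print(f"Address-space limit: {limit / 2**30:.1f} GiB")
```

Note that `resource.getrlimit` reports bytes, whereas the shell's `ulimit -v` typically prints kibibytes, so the two numbers differ by a factor of 1024.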

amdixit commented 3 years ago

It turns out our genome files were corrupted. The issue was resolved once we fixed that. Thanks!
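For anyone hitting the same message, a quick sanity check of the STAR index folder can be sketched as follows. This is only an assumption about what a basic check might look like: the thread does not say how the corruption was found, and the function name `check_star_index` is hypothetical. The file names listed are ones that STAR's genome generation step normally writes. The check catches only missing or empty files, so a clean result does not prove the index is intact; regenerating the index is the more reliable fix.

```python
from pathlib import Path
from typing import List

# Files that STAR's genome generation step normally writes into an index folder.
EXPECTED_FILES = (
    "Genome",
    "SA",
    "SAindex",
    "genomeParameters.txt",
    "chrName.txt",
    "chrLength.txt",
)


def check_star_index(index_dir: str) -> List[str]:
    """Return a list of obvious problems; an empty list means none were found."""
    root = Path(index_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    for name in EXPECTED_FILES:
        path = root / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif path.stat().st_size == 0:
            problems.append(f"empty: {name}")
    return problems


if __name__ == "__main__":
    # Index folder taken from the log above; adjust to your own setup.
    for problem in check_star_index("/usr/bin/dbit/Scripts/st_references/mm10"):
        print(problem)
```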