CCRGeneticsBranch / ngs_pipeline.hg38_v1

https://CCRGeneticsBranch.github.io/ngs_pipeline.hg38_v1/

STAR Memory error (limitBAMsortRAM) #12

Closed slsevilla closed 1 year ago

slsevilla commented 1 year ago

Summary

Example error log file from one sample:

/data/khanlab/projects/processed_DATA/log/STARens.52013050.e

Error:

EXITING because of fatal ERROR: not enough memory for BAM sorting: 
SOLUTION: re-run STAR with at least --limitBAMsortRAM 127531917023
Nov 02 05:46:05 ...... FATAL ERROR, exiting
[Wed Nov  2 05:46:06 2022]
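To put the numbers in perspective, here is a minimal sketch for pulling the requested value out of a STAR log and converting it to GiB. The helper names `required_bam_sort_ram` and `bytes_to_gib` are made up for illustration and are not part of the pipeline; the only thing assumed about the log is the `--limitBAMsortRAM <bytes>` text shown above.

```shell
# Hypothetical helper: print the last --limitBAMsortRAM value that STAR
# suggested in the given log file (prints nothing if there is none).
required_bam_sort_ram() {
    grep -o -e '--limitBAMsortRAM [0-9][0-9]*' "$1" | awk '{print $2}' | tail -n 1
}

# Hypothetical helper: convert a byte count to GiB with two decimals.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Example usage (the log path is the one quoted in this issue):
#   need=$(required_bam_sort_ram /data/khanlab/projects/processed_DATA/log/STARens.52013050.e)
#   bytes_to_gib "$need"
```

With the values in this issue, STAR asks for 127531917023 bytes (about 118.77 GiB) while the rule passes 122471659382 bytes (about 114.06 GiB), so the job is roughly 4.7 GiB short.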

Example code:

cd ${LOCAL}/
    ulimit -u 10240 -n 16384
    STAR    --genomeDir /data/khanlab/projects/ngs_pipeline_testing/References_4.0/New_GRCh37/Index/STAR_2.7.8a --readFilesIn /data/khanlab/projects/DATA/Sample_SJRHB031519_D1_T_HCKNCDRXX/Sample_SJRHB031519_D1_T_HCKNCDRXX_R1.fastq.gz /data/khanlab/projects/DATA/Sample_SJRHB031519_D1_T_HCKNCDRXX/Sample_SJRHB031519_D1_T_HCKNCDRXX_R2.fastq.gz --readFilesCommand zcat --outFileNamePrefix SJRHB031519_D1_T_HCKNCDRXX_ENS --runThreadN ${THREADS} --twopassMode Basic --outSAMunmapped Within --chimSegmentMin 12  --chimJunctionOverhangMin 12 --alignSJDBoverhangMin 10  --alignMatesGapMax 100000  --alignIntronMax 100000  --chimSegmentReadGapMax 3 --outFilterMismatchNmax 2 --outSAMtype BAM SortedByCoordinate --quantMode TranscriptomeSAM --outBAMsortingThreadN 6  --limitBAMsortRAM 122471659382
    echo "STAR ENS mapping completed"
    mv -f SJRHB031519_D1_T_HCKNCDRXX_ENSChimeric.out.junction /data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXXChimeric.out.junction
    mv -f SJRHB031519_D1_T_HCKNCDRXX_ENSAligned.toTranscriptome.out.bam /data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXX.ENS_transcriptome.bam
    mv -f SJRHB031519_D1_T_HCKNCDRXX_ENSSJ.out.tab /data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXX_ENS_SJ.out.tab
    java -Xmx${MEM}g -Djava.io.tmpdir=${LOCAL} -jar $PICARD_JAR AddOrReplaceReadGroups VALIDATION_STRINGENCY=SILENT INPUT=SJRHB031519_D1_T_HCKNCDRXX_ENSAligned.sortedByCoord.out.bam OUTPUT=/data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXX.star.bam SORT_ORDER=coordinate RGLB=SJRHB031519_D1_T_HCKNCDRXX RGPU=SJRHB031519_D1_T_HCKNCDRXX RGPL=ILLUMINA RGSM=SJRHB031519_D1_T_HCKNCDRXX RGCN=khanlab
    samtools index /data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXX.star.bam
    awk -F"\t" 'BEGIN{OFS="\t"}{strand=($4==1)?"+":"-";annotated=($6==1)?"true":"false";if($5==0) motif="non-canonical"; if($5==1)motif="GT/AG";if($5==2)motif="CT/AC";if($5==3)motif="GC/AC";if($5==4)motif="CT/GC";if($5==5)motif="AT/AC";if($5==6)motif="GT/AT";print $1,$2,$3,"motif="motif";uniquely_mapped="$7";multi_mapped="$8";maximum_spliced_alignment_overhang="$9";annotated_junction="annotated,$7,strand}' /data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXX_ENS_SJ.out.tab |bedtools sort |bgzip > /data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXX_ENS_SJ.out.tab.bed.gz
    tabix -0 -p bed /data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXX_ENS_SJ.out.tab.bed.gz

Suggested updates:

1) Change the resource from 'local' with 'ulimit' to lscratch, and set the thresholds for this rule in the config instead of within the rule.
2) Remove the hard-coded limit for 'limitBAMsortRAM'.

Example updates:

# add TMPDIR to params; set as (WORKDIR/tmp/star)

# add to rule config
## gres: lscratch:800
## threads: 32
## mem: 75g

if [[ -d "/lscratch/$SLURM_JOB_ID" ]]; then
    TMPDIR="/lscratch/$SLURM_JOB_ID"
else
    TMPDIR="{params.tmp}"
    # mkdir -p is a no-op when the directory already exists
    mkdir -p "$TMPDIR"
fi
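One edge case in the check above: if SLURM_JOB_ID is unset (for example when the rule runs outside Slurm), the test becomes `-d "/lscratch/"`, which may succeed on a node that has an /lscratch mount and would then point TMPDIR at the shared top-level directory. A minimal sketch of a guarded version follows; `choose_tmpdir` is a hypothetical helper name, and its argument stands in for `{params.tmp}`.

```shell
# Hypothetical helper: print the directory STAR should write to.
# $1 is the fallback directory, used when no per-job lscratch exists.
choose_tmpdir() {
    fallback=$1
    # Only trust /lscratch when a job ID is set and its directory exists.
    if [ -n "${SLURM_JOB_ID:-}" ] && [ -d "/lscratch/${SLURM_JOB_ID}" ]; then
        printf '%s\n' "/lscratch/${SLURM_JOB_ID}"
    else
        # Create the fallback if needed, then report it.
        mkdir -p "$fallback" && printf '%s\n' "$fallback"
    fi
}

# Example usage inside the rule:
#   TMPDIR=$(choose_tmpdir "{params.tmp}")
```

The function only prints the chosen path, so the caller decides whether to export it as TMPDIR or pass it straight into --outFileNamePrefix.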

# Run STAR
STAR    --genomeDir /data/khanlab/projects/ngs_pipeline_testing/References_4.0/New_GRCh37/Index/STAR_2.7.8a --readFilesIn /data/khanlab/projects/DATA/Sample_SJRHB031519_D1_T_HCKNCDRXX/Sample_SJRHB031519_D1_T_HCKNCDRXX_R1.fastq.gz /data/khanlab/projects/DATA/Sample_SJRHB031519_D1_T_HCKNCDRXX/Sample_SJRHB031519_D1_T_HCKNCDRXX_R2.fastq.gz --readFilesCommand zcat --outFileNamePrefix $TMPDIR/SJRHB031519_D1_T_HCKNCDRXX_ENS --runThreadN ${THREADS} --twopassMode Basic --outSAMunmapped Within --chimSegmentMin 12  --chimJunctionOverhangMin 12 --alignSJDBoverhangMin 10  --alignMatesGapMax 100000  --alignIntronMax 100000  --chimSegmentReadGapMax 3 --outFilterMismatchNmax 2 --outSAMtype BAM SortedByCoordinate --quantMode TranscriptomeSAM 

# MOVE JUNCTIONS FILE
mv -f $TMPDIR/SJRHB031519_D1_T_HCKNCDRXX_ENSChimeric.out.junction /data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXXChimeric.out.junction

# MOVE TRANSCRIPTOME BAM
mv -f $TMPDIR/SJRHB031519_D1_T_HCKNCDRXX_ENSAligned.toTranscriptome.out.bam /data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXX.ENS_transcriptome.bam

# MOVE TAB
mv -f $TMPDIR/SJRHB031519_D1_T_HCKNCDRXX_ENSSJ.out.tab /data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXX_ENS_SJ.out.tab

# RUN PICARD
java -Xmx${MEM}g -Djava.io.tmpdir=${LOCAL} -jar $PICARD_JAR AddOrReplaceReadGroups VALIDATION_STRINGENCY=SILENT INPUT=$TMPDIR/SJRHB031519_D1_T_HCKNCDRXX_ENSAligned.sortedByCoord.out.bam OUTPUT=/data/khanlab/projects/processed_DATA//SJ031519/SJRHB031519_D1/SJRHB031519_D1_T_HCKNCDRXX/SJRHB031519_D1_T_HCKNCDRXX.star.bam SORT_ORDER=coordinate RGLB=SJRHB031519_D1_T_HCKNCDRXX RGPU=SJRHB031519_D1_T_HCKNCDRXX RGPL=ILLUMINA RGSM=SJRHB031519_D1_T_HCKNCDRXX RGCN=khanlab

vinegang commented 1 year ago

@slsevilla Please go ahead and make the changes in your dev pipeline and push them as needed.

slsevilla commented 1 year ago

Added to the wrong pipeline. Closing issue.