alexdobin / STAR

RNA-seq aligner
MIT License

STAR stuck at 'inserting junctions into the genome indices' #1419

Open dongspy opened 2 years ago

dongspy commented 2 years ago

STAR is an awesome RNA-seq aligner.

I'm running a pipeline that uses STAR version 2.7.9a. When mapping to the human genome (hg38) with a BAM file as input on an HPC platform with 32 CPUs and 64 GB of memory, it gets stuck at "inserting junctions into the genome indices" for hours without reporting any error, and the status of the subtask changes to Z (zombie). The indices were created with the same version of STAR.

When I run the same command on a large-memory platform with 32 CPUs and 128 GB of memory, the problem disappears. Is 64 GB of memory not enough for the human genome?

When I run the command with fastq.gz instead of BAM as input on the platform with 32 CPUs and 64 GB of memory, the problem also disappears. What is the difference in how the two kinds of input file are handled?

Details are below.

code

STAR --readFilesCommand samtools view --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /db_reference/zUMIs/STAR_INDEX_2.7.9a --sjdbGTFfile /db_reference/zUMIs/gencode.v38.annotation.gtf --runThreadN 1 --readFilesType SAM SE --twopassMode Basic \
        --readFilesIn /abcshare/nfshome/lipidong/mr.fool/test/data/RNA.RNAad.filtered.tagged.bam \
        --outFileNamePrefix /abcshare/nfshome/lipidong/test/STAR_2.7.9a/slurm/output/RNA.filtered.tagged.
STAR version: 2.7.9a   compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Dec 07 09:04:13 ..... started STAR run
Dec 07 09:04:13 ..... loading genome
Dec 07 09:09:01 ..... processing annotations GTF
Dec 07 09:09:20 ..... inserting junctions into the genome indices
Dec 07 09:10:56 ..... started 1st pass mapping
Dec 07 09:11:20 ..... finished 1st pass mapping
Dec 07 09:11:21 ..... inserting junctions into the genome indices


Before running: $ free -h


              total        used        free      shared  buff/cache   available
Mem:            62G        1.1G         60G        2.8M        948M         61G
Swap:            0B          0B          0B

While running: $ free -h

              total        used        free      shared  buff/cache   available
Mem:            62G         34G        4.3G        2.8M         24G         27G
Swap:            0B          0B          0B

Log.out.log

dongspy commented 2 years ago

I solved the problem by converting the BAM file to a SAM file and removing --readFilesCommand samtools view. Maybe samtools view took up too much buff/cache.

alexdobin commented 2 years ago

Hi @lipidong

I suspect that the --readFilesCommand samtools view may be problematic with the 2-pass mapping option, as the BAM file needs to be read twice. Converting it to SAM first is a good solution.

Cheers,
Alex
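
For anyone landing here with the same symptom, the workaround discussed in this thread can be sketched roughly as below: write the BAM records to a SAM file once, then run STAR on that file without --readFilesCommand. This is only a sketch. The paths (reads.bam, STAR_INDEX, annotation.gtf, star_out.) and the DRY_RUN switch are placeholders invented for illustration, not part of the original report, and the remaining STAR options are copied from the command in the first post. By default the script only prints the two commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Sketch: decompress the BAM once, then give STAR the plain SAM file so that
# both passes of --twopassMode Basic read a regular file instead of a pipe.
# DRY_RUN defaults to 1 (commands are only printed); set DRY_RUN=0 to execute.
set -euo pipefail

BAM="${BAM:-reads.bam}"                  # hypothetical input path
SAM="${SAM:-${BAM%.bam}.sam}"            # converted copy next to the BAM
GENOME_DIR="${GENOME_DIR:-STAR_INDEX}"   # hypothetical index directory
GTF="${GTF:-annotation.gtf}"             # hypothetical annotation file
PREFIX="${PREFIX:-star_out.}"            # hypothetical output prefix

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Same records that `samtools view` would have piped to STAR, written to disk.
run samtools view -o "$SAM" "$BAM"

# Options from the original command, minus --readFilesCommand;
# the input is now the SAM file.
run STAR --genomeDir "$GENOME_DIR" --sjdbGTFfile "$GTF" \
  --readFilesIn "$SAM" --readFilesType SAM SE \
  --twopassMode Basic --runThreadN 1 \
  --outSAMmultNmax 1 --outFilterMultimapNmax 50 \
  --outSAMunmapped Within --outSAMtype BAM Unsorted \
  --quantMode TranscriptomeSAM \
  --outFileNamePrefix "$PREFIX"
```

Note that the uncompressed SAM file can be several times larger than the BAM, so check free disk space before converting.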