alexdobin / STAR

RNA-seq aligner
MIT License

STAR stops at "sorting Suffix Array chunks and saving them to disk..." #1320

Open NJeanray opened 3 years ago

NJeanray commented 3 years ago

Hello,

I am running STAR genomeGenerate on a cluster, using 40 CPUs (out of 80) with 500 GB of RAM available.

  > STAR version=2.7.9a_2021-06-25
  > STAR compilation time,server,dir=2021-08-05T06:34:56+00:00 f90890ab3b7c:/tmp/STAR/source
  >Command Line:
  > STAR --runThreadN 40 --runMode genomeGenerate --genomeDir /home/NJEANRAY/Daijin/Daijin/2-alignments/star/index --genomeFastaFiles /home/NJEANRAY/Daijin/Daijin/0-reference/genome.fa --limitGenomeGenerateRAM=168632691637
  > Initial USER parameters from Command Line:
  > All USER parameters from Command Line:
  > runThreadN                    40     ~RE-DEFINED
  > runMode                       genomeGenerate        ~RE-DEFINED
  > genomeDir                     /home/NJEANRAY/Daijin/Daijin/2-alignments/star/index     ~RE-DEFINED
  > genomeFastaFiles              /home/NJEANRAY/Daijin/Daijin/0-reference/genome.fa        ~RE-DEFINED
  > limitGenomeGenerateRAM        168632691637     ~RE-DEFINED
  > Finished reading parameters from all sources
  > 
  >Final user re-defined parameters-----------------:
  > runMode                           genomeGenerate   
  > runThreadN                        40
  > genomeDir                         /home/NJEANRAY/Daijin/Daijin/2-alignments/star/index
  > genomeFastaFiles                  /home/NJEANRAY/Daijin/Daijin/0-reference/genome.fa   
  > limitGenomeGenerateRAM            168632691637
  > 
  > -------------------------------
  > Final effective command line:
  > STAR   --runMode genomeGenerate      --runThreadN 40   --genomeDir /home/NJEANRAY/Daijin/Daijin/2-alignments/star/index   --genomeFastaFiles /home/NJEANRAY/Daijin/Daijin/0-reference/genome.fa      --limitGenomeGenerateRAM 168632691637
  > ----------------------------------------

Everything runs fine until the step "sorting Suffix Array chunks and saving them to disk...", where the process seems to freeze:

Genome sequence total length = 63147197748
Genome size with padding = 63237259264
Estimated genome size with padding and SJs: total=genome+SJ=63438259264 = 63237259264 + 201000000
GstrandBit=36
Number of SA indices: 6206015194
Aug 06 13:51:50 ... starting to sort Suffix Array. This may take a long time...
Number of chunks: 99;   chunks size limit: 632372592 bytes
Aug 06 14:00:14 ... sorting Suffix Array chunks and saving them to disk...

I am trying to run it on the GRCh38 Homo sapiens genome (~31 GB).

Could you please help?

Thanks in advance

alexdobin commented 3 years ago

Hi @NJeanray

This looks like a very large genome: 63 gigabases? GRCh38 (without patches/contigs) should be only ~3 Gb. Where did you get the FASTA file from?
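
One quick way to check what is actually in the FASTA file is to count its sequences and total bases independently of STAR. The following is only a minimal sketch, not part of STAR: the function name `fasta_stats` is made up for illustration, and it assumes a plain-text or gzip-compressed FASTA with `>` header lines.

```python
import gzip
import sys


def fasta_stats(path):
    """Return (number_of_sequences, total_bases) for a FASTA file.

    Hypothetical helper for illustration; assumes header lines start
    with '>' and every other line contains sequence characters only.
    """
    # Use gzip for .gz files, plain open otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    n_seqs = 0
    n_bases = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                n_seqs += 1  # one header per sequence
            else:
                n_bases += len(line.strip())  # ignore newline/whitespace
    return n_seqs, n_bases


# Only run as a command-line tool when a path is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    n_seqs, n_bases = fasta_stats(sys.argv[1])
    print(f"sequences:   {n_seqs}")
    print(f"total bases: {n_bases} (~{n_bases / 1e9:.2f} Gb)")
```

Saved under a name of your choice and run with the path from `--genomeFastaFiles` as its only argument, it prints the sequence count and the total length. A total far above the expected ~3 Gb, or an unexpectedly large number of sequences, would suggest that the file contains more than the primary GRCh38 assembly.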

Cheers,
Alex