alexdobin / STAR

RNA-seq aligner
MIT License
1.87k stars 506 forks source link

BAM file erased as soon alignment is complete #1272

Open serineala opened 3 years ago

serineala commented 3 years ago

Hello, I'm running STAR (STAR-2.7.9a) inside of a snakemake pipeline and it uses this command:

$STAR --genomeDir outputs/star_index --runThreadN 5 --readFilesIn outputs/reads/rrna_filtered/s1.1.fq.gz outputs/reads/rrna_filtered/s1.2.fq.gz --outFileNamePrefix outputs/star_aln/s1 --outSAMtype BAM Unsorted --readFilesCommand zcat

I can see the BAM file being written to, but as soon as the alignment is done, it just disappears. My machine has 96 cores and tons of RAM and >1000G of disk space. The fastq file itself is just around 2G. The end of the s1Log.out file says this:

189 KI270752.1 27745 3136815104 190 KI270753.1 62944 3137077248 191 KI270754.1 40191 3137339392 192 KI270755.1 36723 3137601536 193 KI270756.1 79590 3137863680 194 KI270757.1 71251 3138125824 --sjdbOverhang = 149 taken from the generated genome Started loading the genome: Fri Jun 18 20:23:01 2021

Genome: size given as a parameter = 3254228641 SA: size given as a parameter = 25278668163 SAindex: size given as a parameter = 1 Read from SAindex: pGe.gSAindexNbases=14 nSAi=357913940 nGenome=3254228641; nSAbyte=25278668163 GstrandBit=32 SA number of indices=6128161978 Shared memory is not used for genomes. Allocated a private copy of the genome. Genome file size: 3254228641 bytes; state: good=1 eof=0 fail=0 bad=0 Loading Genome ... done! state: good=1 eof=0 fail=0 bad=0; loaded 3254228641 bytes SA file size: 25278668163 bytes; state: good=1 eof=0 fail=0 bad=0 Loading SA ... done! state: good=1 eof=0 fail=0 bad=0; loaded 25278668163 bytes Loading SAindex ... done: 1565873619 bytes Finished loading the genome: Fri Jun 18 20:24:22 2021

Processing splice junctions database sjdbN=387427, pGe.sjdbOverhang=149 alignIntronMax=alignMatesGapMax=0, the max intron size will be approximately determined by (2^winBinNbits)*winAnchorDistNbins=589824 Created thread # 1 Created thread # 2 Created thread # 3 Starting to map file # 0 mate 1: outputs/reads/rrna_filtered/36682284.1.fq.gz mate 2: outputs/reads/rrna_filtered/36682284.2.fq.gz Created thread # 4 Thread #2 end of input stream, nextChar=-1 Completed: thread #3 Completed: thread #2 Completed: thread #0 Completed: thread #1 Joined thread # 1 Joined thread # 2 Joined thread # 3 Completed: thread #4 Joined thread # 4 ALL DONE!

I see no error message anywhere. If I call the above command by itself, it runs just fine. Would you please help me troubleshoot? I've tried with several values of "--runThreadN" and "--genomeLoad". Thank you!

alexdobin commented 3 years ago

Hi @serineala

if you see that the BAM file is created, and then it disappears at the end of the run, that's definitely weird. Does this happen when you run STAR command "manually" from command prompt (without snakemake)? The output you posted suggests that the run completed successfully. What the output in Log.final.out?

Cheers Alex

serineala commented 3 years ago

Hi Alex,

Yeah, I just cannot figure it out. When I run it "manually" from the cmd prompt, everything is fine. No issues. This is the end of the Log.final.out:

                    Uniquely mapped reads % |   77.83%
                      Average mapped length |   295.97
                   Number of splices: Total |   12092831
        Number of splices: Annotated (sjdb) |   11954139
                   Number of splices: GT/AG |   11972689
                   Number of splices: GC/AG |   88578
                   Number of splices: AT/AC |   12098
           Number of splices: Non-canonical |   19466
                  Mismatch rate per base, % |   0.45%
                     Deletion rate per base |   0.01%
                    Deletion average length |   1.85
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.98
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   377311
         % of reads mapped to multiple loci |   2.28%
    Number of reads mapped to too many loci |   5922
         % of reads mapped to too many loci |   0.04%
                              UNMAPPED READS:

Number of reads unmapped: too many mismatches | 0 % of reads unmapped: too many mismatches | 0.00% Number of reads unmapped: too short | 3274014 % of reads unmapped: too short | 19.80% Number of reads unmapped: other | 8782 % of reads unmapped: other | 0.05% CHIMERIC READS: Number of chimeric reads | 0 % of chimeric reads | 0.00%

Any insight/suggestion would be great. All other rules in the Snakemake file work well. Thank you!

alexdobin commented 3 years ago

Hi @serineala

I am afraid this is the issue with the pipeline, for some reason it deletes the file after completion. I do not have experience with Snakemake, so cannot help you with that, unfortunately. Might be a good idea to ask for help on Snakemake forums.

Cheers Alex