YeoLab / gscripts

General Use Scripts and Helper functions
MIT License

analyze_rna_seq.scala paired end segfault at STAR and sailfish walltime ends #59

Closed olgabot closed 9 years ago

olgabot commented 9 years ago

Here's a gist with all the documents: https://gist.github.com/olgabot/160c649786d45920ed09

In the file singlecell_pnms_pe_v4_error_star_sailfish_counts.txt I counted the samples that failed at the sailfish quant stage or the STAR stage: all 262 samples failed at STAR, but only 65 failed at sailfish.

In [11]: len(star_errors)
Out[11]: 262

In [12]: len(sailfish_errors)
Out[12]: 65

I'm investigating further, playing @gpratt's favorite game of "one of these is not like the other"

olgabot commented 9 years ago

For the sailfish errors, the error log shows that those 65 samples ran out of walltime. Here's the tail of one of those files:

=>> PBS: job killed: walltime 7224 exceeded limit 7200
Nodes:        tscc-2-10
discarding /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/bin from PATH
prepending /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/envs/olga/bin to PATH

Here's proof that it happens in all 65 of those files:

$ tail /home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/*_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sailfish.out | grep walltime | wc -l
65
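
The same check can be done per file in Python, which also recovers how far over the limit each job ran. This is only a minimal sketch: the regular expression is written against the single PBS line quoted above, `logs` is a hypothetical dict mapping a log filename to its text, and it scans the whole text rather than just the last 10 lines that `tail` shows.

```python
import re

# Pattern written against the PBS kill message quoted above (an assumption:
# other PBS versions may word this line differently).
WALLTIME_RE = re.compile(r"job killed: walltime (\d+) exceeded limit (\d+)")


def walltime_overrun(log_text):
    """Return (used, limit) in seconds if the log reports a walltime kill, else None."""
    match = WALLTIME_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def count_walltime_failures(logs):
    """Count logs (files, not lines) that report a walltime kill.

    `logs` is a hypothetical mapping of filename -> log text.
    """
    return sum(1 for text in logs.values() if walltime_overrun(text) is not None)


example = "=>> PBS: job killed: walltime 7224 exceeded limit 7200\nNodes:        tscc-2-10\n"
used, limit = walltime_overrun(example)
print(used - limit)  # seconds over the limit: 24
```

In the log above the job ran only 24 seconds past its 7200-second limit.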

olgabot commented 9 years ago

Looking at the STAR logs, it's always a segmentation fault:

$ head /home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/*_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out | less -S

Example output:

==> /home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/M1_01_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out <==
discarding /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/bin from PATH
prepending /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/envs/olga/bin to PATH
discarding /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/bin from PATH
prepending /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/envs/olga/bin to PATH
/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/.queue/tmp/.exec8316565503078234537: line 2: 10692 Segmentation fault      STAR '--runMode' 'alignReads
Nodes:        tscc-2-52
discarding /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/bin from PATH
prepending /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/envs/olga/bin to PATH

==> /home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/M1_02_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out <==
discarding /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/bin from PATH
prepending /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/envs/olga/bin to PATH
discarding /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/bin from PATH
prepending /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/envs/olga/bin to PATH
/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/.queue/tmp/.exec5690637824564258731: line 2:  9985 Segmentation fault      STAR '--runMode' 'alignReads
Nodes:        tscc-2-52
discarding /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/bin from PATH
prepending /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/envs/olga/bin to PATH

==> /home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/M1_03_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out <==
discarding /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/bin from PATH
prepending /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/envs/olga/bin to PATH
discarding /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/bin from PATH
prepending /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/envs/olga/bin to PATH
/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/.queue/tmp/.exec4681241327761770130: line 2:  7400 Segmentation fault      STAR '--runMode' 'alignReads
Nodes:        tscc-2-12
discarding /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/bin from PATH
prepending /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/envs/olga/bin to PATH

olgabot commented 9 years ago

Hmm, but a segfault doesn't explain all of the errors:

$ grep Segmentation /home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/*_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out | wc -l
246

262-246 = 16 samples unexplained
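
One caveat with `grep ... | wc -l` is that it counts matching lines, so it only equals the number of files if each log contains at most one "Segmentation" line. A minimal sketch of the per-file version of this bookkeeping follows; the `logs` dict and the shortened filenames are made up for illustration, and only the log messages come from the real output.

```python
def split_by_segfault(logs):
    """Split log names into (segfault, other).

    `logs` is a hypothetical mapping of log filename -> full text.
    A file counts once, however many matching lines it contains.
    """
    segfault = {name for name, text in logs.items() if "Segmentation fault" in text}
    other = set(logs) - segfault
    return segfault, other


# Toy stand-ins for the real *.sam.out files; the filenames are shortened
# and the contents are illustrative only.
logs = {
    "M1_01.sam.out": "line 2: 10692 Segmentation fault      STAR '--runMode' 'alignReads\n",
    "M1_06.sam.out": "/usr/bin/ipcrm: invalid id (196608)\n",
    "MSA_16.sam.out": "Can't remove /etc/security/access.conf: No such file or directory, skipping file.\n",
}
segfault, other = split_by_segfault(logs)
print(len(segfault), sorted(other))
```

The `other` set is the list of samples that still need an explanation.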

olgabot commented 9 years ago

By searching for everything that's NOT a segmentation fault, via:

$ grep -v Segmentation /home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/*_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out | grep -v anaconda | grep -v Nodes > singlecell_pnms_pe_v4_error_star_not_segfault.txt

it looks like there are a few queue errors, and the rest are R1 and R2 not matching up properly. Here are some of the non-R1/R2 errors:

/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/M1_06_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out:/usr/bin/ipcrm: invalid id (196608)
/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/MSA_16_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out:Can't remove /etc/security/access.conf: No such file or directory, skipping file.
/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/MSA_18_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out:Can't remove /etc/security/access.conf: No such file or directory, skipping file.
/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/N4_11_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out:/usr/bin/ipcrm: invalid id (0)
/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/P2_03_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out:/usr/bin/ipcrm: invalid id (163840)
/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/P3_02_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam.out:/usr/bin/ipcrm: invalid id (262144)
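
For reference, the `grep -v` chain above can be sketched in Python so that each leftover line stays attached to its file. This is an approximation under stated assumptions: `logs` is a hypothetical filename-to-text dict, the marker strings are the three patterns from the command above, and unlike `grep -v` this version also drops blank lines.

```python
# The three patterns excluded by the grep -v chain above.
NOISE_MARKERS = ("Segmentation", "anaconda", "Nodes")


def residual_lines(logs):
    """Yield (filename, line) for every non-blank line with no noise marker.

    `logs` is a hypothetical mapping of log filename -> full text.
    """
    for name, text in sorted(logs.items()):
        for line in text.splitlines():
            if line.strip() and not any(marker in line for marker in NOISE_MARKERS):
                yield name, line


# Illustrative inputs with shortened filenames; the log lines are copied
# from the output shown in this thread.
logs = {
    "M1_01.sam.out": "line 2: 10692 Segmentation fault      STAR\n",
    "M1_06.sam.out": (
        "discarding /projects/ps-yeolab/software/anaconda-2.1.0_2015-01-20/bin from PATH\n"
        "/usr/bin/ipcrm: invalid id (196608)\n"
        "Nodes:        tscc-2-52\n"
    ),
}
for name, line in residual_lines(logs):
    print(name, line, sep=": ")
```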

olgabot commented 9 years ago

Here's the full STAR command for one of the files:

STAR  '--runMode' 'alignReads'  '--runThreadN' '16'  '--genomeDir' '/projects/ps-yeolab/genomes/hg19/star_sjdb'  '--genomeLoad' 'LoadAndRemove'  '--readFilesIn' '/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/P9_04_R1.fastq.gz.polyATrim.adapterTrim.rmRep.fastq'  '/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/P9_04_R2.fastq.gz.polyATrim.adapterTrim.rmRep.fastq'  '--outSAMunmapped' 'Within'  '--outFilterMultimapNmax' '10'  '--outFilterMultimapScoreRange' '1'  '--outFileNamePrefix' '/home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/P9_04_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam'  '--outSAMattributes' 'All'  '--outSAMstrandField intronMotif'  '--outStd' 'BAM_SortedByCoordinate'  '--outSAMtype' 'BAM' 'SortedByCoordinate'  '--outFilterType' 'BySJout'  '--outReadsUnmapped' 'Fastx'  '--outFilterScoreMin' '10' > /home/obotvinnik/projects/singlecell_pnms/analysis/singlecell_pnms_pe_v4/P9_04_R1.fastq.gz.polyATrim.adapterTrim.rmRep.sam 

olgabot commented 9 years ago

Aha! It turns out STAR.scala requests 8 cores from TSCC, but the STAR command says to use 16 cores. I've changed this now:

https://github.com/gpratt/gatk/pull/9/files#diff-89a3c229db8cb54aefacdeffca10598cL45
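
A mismatch like this could be caught before submission by comparing the `--runThreadN` value in the generated command with the cores requested from the scheduler. The sketch below is a hypothetical pre-flight check, not something in gscripts or STAR.scala: the function names are invented, the command is a shortened copy of the one quoted above, and the value 8 is taken from the comment above.

```python
import shlex


def star_threads(command):
    """Return the integer passed to --runThreadN, or None if the flag is absent."""
    tokens = shlex.split(command)  # also strips the single quotes around each argument
    if "--runThreadN" not in tokens:
        return None
    return int(tokens[tokens.index("--runThreadN") + 1])


def fits_allocation(command, requested_cores):
    """True if the command asks for no more threads than the job requested."""
    threads = star_threads(command)
    return threads is None or threads <= requested_cores


# Shortened version of the STAR command quoted earlier in this thread.
command = (
    "STAR '--runMode' 'alignReads' '--runThreadN' '16' "
    "'--genomeDir' '/projects/ps-yeolab/genomes/hg19/star_sjdb'"
)
print(star_threads(command), fits_allocation(command, requested_cores=8))
```

With the 8-core request this check fails, and it passes once the request and the thread count agree.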