BenLangmead / qtip

Qtip: a tandem simulation approach for accurately predicting read alignment mapping qualities
MIT License
25 stars 7 forks source link

Error: missing ZT tag when running with bowtie2 #2

Closed johanneskoester closed 6 years ago

johanneskoester commented 6 years ago

When running qtip 1.6.2 with bowtie2 2.3.3.1 (both from bioconda), I get the following error:

Qtip 1.6.2
10/26/17-11:26:23:INFO:Command for aligning input data: "bowtie2 --local -p 8 "
10/26/17-11:26:23:INFO:Bowtie 2 command: bowtie2 --local -p 8  -1 reads/simulated.normal.1.fastq -2 reads/simulated.normal.2.fastq --mapq-extra -S mapped-qtip/tmpityhvp3_/input_alignments/tmp -x index/hg18/genome
424270498 reads; of these:
  424270498 (100.00%) were paired; of these:
    1533396 (0.36%) aligned concordantly 0 times
    331853690 (78.22%) aligned concordantly exactly 1 time
    90883412 (21.42%) aligned concordantly >1 times
    ----
    1533396 pairs aligned concordantly 0 times; of these:
      7751 (0.51%) aligned discordantly 1 time
    ----
    1525645 pairs aligned 0 times concordantly or discordantly; of these:
      3051290 mates make up the pairs; of these:
        2872276 (94.13%) aligned 0 times
        32922 (1.08%) aligned exactly 1 time
        146092 (4.79%) aligned >1 times
99.66% overall alignment rate
10/28/17-19:22:19:INFO:  running "/export/scratch2/koster/prosic-evaluation/.snakemake/conda/1f16ee0e/opt/qtip-1.6.2/qtip-parse ifs --  -- mapped-qtip/tmpityhvp3_/input_alignments/tmp -- /export/scratch2/koster/data/ref/hg18.fasta -- mapped-qtip/tmpityhvp3_/input_intermediates/tmp -- mapped-qtip/tmpityhvp3_/tandem_intermediates/tmpinp"
Parsing SAM file "mapped-qtip/tmpityhvp3_/input_alignments/tmp" (seed=0)
Input SAM file did not have ZT:Z field.  Be sure to run a version of the aligner that produces the output needed for qtip.
terminate called after throwing an instance of 'int'
Traceback (most recent call last):
  File "/export/scratch2/koster/prosic-evaluation/.snakemake/conda/1f16ee0e/bin/qtip", line 1075, in <module>
    go_profile(vars(_args), _aligner_args, _aligner_unpaired_args, _aligner_paired_args)
  File "/export/scratch2/koster/prosic-evaluation/.snakemake/conda/1f16ee0e/bin/qtip", line 1026, in go_profile
    go(args, aligner_args, aligner_unpaired_args, aligner_paired_args)
  File "/export/scratch2/koster/prosic-evaluation/.snakemake/conda/1f16ee0e/bin/qtip", line 505, in go
    _do_parse_input_sam()
  File "/export/scratch2/koster/prosic-evaluation/.snakemake/conda/1f16ee0e/bin/qtip", line 471, in _do_parse_input_sam
    raise RuntimeError("qtip-parse returned %d" % ret)
RuntimeError: qtip-parse returned 6
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!

In other words, qtip-parse fails to find the ZT tag for at least one alignment. Further investigation reveals that, although the tag is present in the vast majority of the alignments, it is indeed missing for some. An random sample comes here:

@HD     VN:1.0  SO:unsorted
@SQ     SN:chr10        LN:135374737
@SQ     SN:chr11        LN:134452384
@SQ     SN:chr12        LN:132349534
@SQ     SN:chr13        LN:114142980
@SQ     SN:chr14        LN:106368585
@SQ     SN:chr15        LN:100338915
@SQ     SN:chr16        LN:88827254
@SQ     SN:chr17        LN:78774742
@SQ     SN:chr18        LN:76117153
@SQ     SN:chr19        LN:63811651
@SQ     SN:chr1 LN:247249719
@SQ     SN:chr20        LN:62435964
@SQ     SN:chr21        LN:46944323
@SQ     SN:chr22        LN:49691432
@SQ     SN:chr2 LN:242951149
@SQ     SN:chr3 LN:199501827
@SQ     SN:chr4 LN:191273063
@SQ     SN:chr5 LN:180857866
@SQ     SN:chr6 LN:170899992
@SQ     SN:chr7 LN:158821424
@SQ     SN:chr8 LN:146274826
@SQ     SN:chr9 LN:140273252
@SQ     SN:chrM LN:16571
@SQ     SN:chrX LN:154913754
@SQ     SN:chrY LN:57772954
@PG     ID:bowtie2      PN:bowtie2      VN:2.3.3.1      CL:"/export/scratch2/koster/prosic-evaluation/.snakemake/conda/1f16ee0e/bin/bowtie2-align-s --wrapper basic-0 --local -p 8 --mapq-extra -S mapped-qtip/tmpityhvp3_/input_alignments/tmp -x index/hg18/genome -1 reads/simulated.normal.1.fastq -2 reads/simulated.normal.2.fastq"
sim_control30x_Control_chr1_1_1a01ab    77      *       0       0       *       *       0       0       TACGACACAACGAGCCCAATAGTCCTTATTACGTAAACATGTACACTGCGAAGGACATCTTGATGAAGAGATGAACGTATAACATTTATTTTTGTACACG    FGD7FFGDDEF?=D?GEGGFD?@GF-EG:GAGGDADF-FFF?FEFB=AG:GF2DFG=FE?@GAAD=B==DF,AEFDB?>5G(D@##EFD#==F>C=EDA?    YT:Z:UP
sim_control30x_Control_chr1_1_1a01ab    141     *       0       0       *       *       0       0       GACGGAGAAGTGTTTATGGGACAGGTCGACCGTGGAGAAGTTATATGAAAATCGTTTGTTTTACTACTTATAGGTTAAATTGTTTTTGCTCCCTTTTCTT    GED9DGDGFGDGG?DEG:F:GAEG=D#?GC=DEG:1F0DF;F-=:=AG=EDGD?F=ECB@AG:F#D>@=F#5;?EG:6/3#BC##=##.6?4=:+#5@?#    YT:Z:UP
sim_control30x_Control_chr1_1_1a5d92    147     chr1    142192315       11      100M    =       142192098       -319    TTATATTCTACCCGCAGCTAGTTGAGGTAAAGAGAGGGCATTTAACTGTCAAGGGGACCTGTTAGGAAACTGTAGAAATCTGGAAAGTCTCTTGGGTCTT    ?6)<#=C#GB###B@#D53#C/B=EC:AEEBBF#:AEEGEDAGGC6BDG::E/FDBD??DF=F8GC.)AFBFGAG;?GGFEG=GFECE5EBGEAEGA-DG    AS:i:192        XS:i:178        XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:12T54G32   YS:i:196        YT:Z:CP Z:194,120,60,200,93,394,267,85,45,555,444,6,22,444,222,3
sim_control30x_Control_chr1_1_1a37b     101     chr1    246758348       0       *       =       246758348       0       ATTTTGTGGTTTTCGTTACCGTTGTTTTCGGTTTCAACTGTTTACCCTAGATTAATTTCTCGAAGACACGTCGTTTCCTTTGATAGTAGTATCACTTGTC    AEF3FC6GBF?FEBEF@G@FGEFF@@0@G@;:ACGGFAB:FGG?>FCGEDG#G0F@AAEC?>A5AEG=FBFC>A@C=BEGC>.8GED##>DE#5?#)???    YT:Z:UP
sim_control30x_Control_chr1_1_1a130a    69      chr1    111944164       0       *       =       111944164       0       TATGGCAAAATCCCATCCTACCAAGCCTTCTTGCTATATACATTTGAGTAGAAAATGACATGAGCCAGAAGCTCCAAAGCCTGAGATCCAGAGCATAAAC    DE?DEG=F<GFF=EGD>E5FEDGDFFDDDDDFFEDGGGCD9FE>AE@EEDA#DAGDCGEDA=A=?B?CEG+5E;DDEDAF#A?GDD#2D#C1C###=###    YT:Z:UP
sim_control30x_Control_chr1_1_1a542d    77      *       0       0       *       *       0       0       GCGGTGTAAAAGAATTAGGTGAAATAGTAACAACCTGTAAACCCAACCAAGGTCCAGAAACGATAACACTTATGACGACGTCATTTGTATGTACACGTAC    EGEG>GEEEG=DGEFB?GGGFEAGFFFBFAEFDGGDE?GDAEFFACCEFFDEECGCGD?=F0?;GCD>AD@=5#:AG3C>GGD#G#D>C=CC>=:#:A#A    YT:Z:UP

Any chance for a quick fix or workaround? Is bowtie2 or qtip-parse to blame?

BenLangmead commented 6 years ago

In 5 out of the 6 cases you paste above, the read failed to align. ZT:Z output is only printed for reads that align. But in one of the cases, the one with 147 in the FLAG field, the read did align. In that case the ZT:Z string is malformed. It says Z:194,120,... rather than ZT:Z:194,120,.... I'll ponder a bit, but can you confirm that the SAM lines you posted were unmodified? I.e. those are directly from bowtie2?

johanneskoester commented 6 years ago

Yes, exactly. I took them from the path that is printed in the qtip log. Note that I ran bowtie with 8 threads. This is my qtip invocation (the braces are snakemake wildcards that get replaced by the concrete values of the job): qtip --bt2-exe 'bowtie2 --local -p {threads}' --temp-directory {params.tmp} --m1 {input.m1} --m2 {input.m2} --index {params.index} --ref {input.ref}

BenLangmead commented 6 years ago

Any chance I could get your reads/simulated.normal.1.fastq and reads/simulated.normal.2.fastq? I can probably take it from there, but another valuable piece of info would be: does it also happen with -p 1?

johanneskoester commented 6 years ago

Currently, I don't have the resources to run it again with -p1 (I have moved to bwa, where this problem does not occur). I don't think it could be related to the real data, but it would in principle be possible to send them to you. However, they are quite big. If you really want them, do you have facility (FTP server?), where I could upload them?

dnbaker commented 6 years ago

I wonder if you could reproduce the problem with a small subset of the reads you were running it on previously, @johanneskoester.

johanneskoester commented 6 years ago

Sorry, I don't have the time and computational resources to do that right now.

BenLangmead commented 6 years ago

Update: this might be due to https://github.com/BenLangmead/bowtie2/issues/158, which we have a very preliminary fix for:

https://github.com/BenLangmead/bowtie2/releases/tag/v2.3.4-alpha

I'd be interested to know if you're able to reproduce with that Bowtie2 version.

johanneskoester commented 6 years ago

Sorry, I don't have enough time at the moment. But it sounds indeed like the same problem.

BenLangmead commented 6 years ago

If so, then Bowtie 2 v2.3.4 (now released) should fix it

johanneskoester commented 6 years ago

Great! Feel free to close the issue if you think it is fixed. I won't test this combination any time soon.