Closed donkirkby closed 10 years ago
No repro on my workstation.
The problem seems to stem from using different versions of bowtie2. Here are excerpts from *.prelim.csv, with seq and qual fields omitted:
Art (bowtie 2.0.6)
M01841:22:000000000-A3RCA:1:1110:16290:10899,81,INT,743,0,151S100M,=,1625,0
M01841:22:000000000-A3RCA:1:1110:16290:10899,161,INT,1625,44,97M154S,=,743,0
M01841:22:000000000-A3RCA:1:2103:12283:26425,97,INT,523,0,62M189S,=,1397,0
M01841:22:000000000-A3RCA:1:2103:12283:26425,145,INT,1397,44,183S5M1I62M,=,523,0
M01841:22:000000000-A3RCA:1:2110:27229:19935,99,INT,865,11,110S141M,=,865,361
M01841:22:000000000-A3RCA:1:2110:27229:19935,147,INT,865,11,33S218M,=,865,-361
M01841:22:000000000-A3RCA:1:2111:12490:3895,83,INT,1498,0,115S136M,=,1498,366
M01841:22:000000000-A3RCA:1:2111:12490:3895,163,INT,1498,0,136M115S,=,1498,-366
Don (bowtie 2.2.3)
M01841:22:000000000-A3RCA:1:1110:16290:10899,81,INT,1622,0,151S100M,=,746,0
M01841:22:000000000-A3RCA:1:1110:16290:10899,161,INT,746,0,97M154S,=,1622,0
M01841:22:000000000-A3RCA:1:2103:12283:26425,97,INT,1402,0,62M189S,=,1397,0
M01841:22:000000000-A3RCA:1:2103:12283:26425,145,INT,1397,0,183S5M1I62M,=,1402,0
M01841:22:000000000-A3RCA:1:2110:27229:19935,99,INT,865,11,110S141M,=,865,361
M01841:22:000000000-A3RCA:1:2110:27229:19935,147,INT,865,11,33S218M,=,865,-361
M01841:22:000000000-A3RCA:1:2111:12490:3895,83,INT,1498,0,115S136M,=,1498,366
M01841:22:000000000-A3RCA:1:2111:12490:3895,163,INT,1498,0,136M115S,=,1498,-366
Note that the first three reads are being mapped to different positions in the INT ref. What is also strange is that several positions exceed the length of INT - and the answer is that the reference sequence from the lab has dollar signs in the sequence, and in fact is a concatenation of two INT sequences..
We seem to have the same stack trace when processing sample 46892B-3515-HLA-B-E99605CLIMX-PR-RT_S71 on 22 May 2014, so I'm reopening the issue. When I look at the dumped query sequence and the reference sequence for HIV1B-gag, the query sequence has a lot of question marks in it, and is shifted all the way past the end of the reference sequence. If I try to manually align it with the reference sequence, it looks like it should start after the seventh amino acid in the reference sequence.
Fixed with commit 6849d56f09346fbc64e5f88a14b8b7324d8f292f
50791A-RT3-only-pos084-V3LOOP-2_S86 in run 130916. Here's the stack trace:
The error occurs when trying to process the INT region.
Update
Reopened because error still occurs when processing region HIV1B-gag in sample 46892B-3515-HLA-B-E99605CLIMX-PR-RT_S71 on 22 May 2014.