Closed twall2014 closed 9 years ago
Hi Tim,
Thanks for reporting this. It seems like the problem is the result of orphaned mappings (i.e. the un-mapped read from a paired-end read). Specifically, adjustOverhang should be called for both mates of a pair, even if one of them is orphaned. I'll ping back here once I've pushed the fix and you can tell me if it resolves the issue for you.
Hi Tim,
I just pushed an update to the code. Could you build the latest version and let me know if you still encounter this same issue?
Thanks, Rob
Still getting truncated files.
On Thu, Sep 17, 2015 at 11:21 AM, Rob Patro notifications@github.com wrote:
Hi Tim,
I just pushed an update to the code. Could you build the latest version and let me know if you still encounter this same issue?
Thanks, Rob
— Reply to this email directly or view it on GitHub https://github.com/COMBINE-lab/RapMap/issues/13#issuecomment-141120252.
Is this for the same reason, or is something different occurring? Do you have a small example dataset that reproduces this error?
Yes, still returns the same error about CIGAR length mismatch. I am currently trying on lower values of k to see what happens and I'll try to generate/send you a file where the error occurs early on.
On Thu, Sep 17, 2015 at 12:20 PM, Rob Patro notifications@github.com wrote:
Is this for the same reason, or is something different occurring? Do you have a small example dataset that reproduces this error?
— Reply to this email directly or view it on GitHub https://github.com/COMBINE-lab/RapMap/issues/13#issuecomment-141137999.
Here's a sliver of a file that failed two alignments in (k=15) due to the same problem. I also included the next two lines, because it seemed like the program would fail there too.
48506_49009_0:0:0_1:2:0_0 121 3 17641571 1 76M = 17641571 0 TGTCGTAAGATGATACTAGCTGGCTAAGAGGCCACCTGGTGCCACATCATAGGCACATTGCGAACGTAAATAATTA 2222222222222222222222222222222222222~2222222222222222222222222222222222222~ NH:i:2 Pt_48506_49009_0:0:0_1:2:0_0 2485 3 17641571 0 76M = 17641571 0 TTAGATACAATAGAAATAGAATTGTATCCCCCCCTTCATTTATTGCTTTCCGATCTTATTTATTCATTCC 2222222222222222222222222222222222222222222222222222222222222222222222 NH:i:2 Pt_48506_49009_0:0:0_1:2:0_0 2489 Pt 48505 1 70M = 48505 0 GAATGAATAAATAAGATCGGAAAGCAATAAATGAAGGGGGGGATACAATTCTATTTCTATTGTATCTAAA 2222222222222222222222222222222222~2222222222222222222222222222222222~ NH:i:2 Pt_48506_49009_0:0:0_1:2:0_0 2421 Pt 48505 0 70M = 48505 0 AATTATTTACGTTCGCAATGTGCCTATGATGTGGCACCAGGTGGCCTCTTAGCCAGCTAGTATCATCTTACGACAA 2222222222222222222222222222222222222222222222222222222222222222222222222222 NH:i:2
Should I run on quasiindex/quasimap to see if it has a similar issue? That might help if those functions aren't having problems.
On Thu, Sep 17, 2015 at 12:33 PM, Tim Wall twall2010@gmail.com wrote:
Yes, still returns the same error about CIGAR length mismatch. I am currently trying on lower values of k to see what happens and I'll try to generate/send you a file where the error occurs early on.
On Thu, Sep 17, 2015 at 12:20 PM, Rob Patro notifications@github.com wrote:
Is this for the same reason, or is something different occurring? Do you have a small example dataset that reproduces this error?
— Reply to this email directly or view it on GitHub https://github.com/COMBINE-lab/RapMap/issues/13#issuecomment-141137999.
Ok; strange --- that's the same bug I thought the previous commit fixed. Can you share the files (reference and reads) that produce this problem (via e-mail, dropbox, etc) so that I can test out the fix before I push it?
I used the files from here ( https://drive.google.com/file/d/0B-uC03eEUWJQMTROSXYxUkNmcFU/view?usp=sharing) but I imagine you could also reproduce the error with the much smaller files I sent you before.
On Thu, Sep 17, 2015 at 1:06 PM, Rob Patro notifications@github.com wrote:
Ok; strange --- that's the same bug I thought the previous commit fixed. Can you share the files (reference and reads) that produce this problem (via e-mail, dropbox, etc) so that I can test out the fix before I push it?
— Reply to this email directly or view it on GitHub https://github.com/COMBINE-lab/RapMap/issues/13#issuecomment-141152340.
There was another path through the code that I didn't account for in the previous commit. This bug should be fixed now. I was able to successfully map data that previously exhibited this issue. Let me know if this fixes it for you and I will close this issue.
Seems to work now, thanks!
On Thu, Sep 17, 2015 at 4:18 PM, Rob Patro notifications@github.com wrote:
There was another path through the code that I didn't account for in the previous commit. This bug should be fixed now. I was able to successfully map data that previously exhibited this issue. Let me know if this fixes it for you and I will close this issue.
— Reply to this email directly or view it on GitHub https://github.com/COMBINE-lab/RapMap/issues/13#issuecomment-141215275.
Great.it was good to squash this bug.
I am trying to test RapMap on higher values of k (>= 13) on wgsim-simulated Illumina paired reads, but when I try to convert the resulting SAM file to a BAM file with SAMtools I get a truncated file with this error:
[E::sam_parse1] CIGAR and query sequence are of different length [W::sam_read1] parse error at line 16 [main_samview] truncated file.
The line number varies. However, I looked at the file and it appears the file seems to truncate at lines like this: (2nd line in particular)
Pt_90296_90860_0:0:0_1:2:0_1 121 4 1159670 1 76M = 1159670 0 ACAACAGGCTATGGGGTGGTGATCCCGCTTATGGGGTCAAATCAATACGTTCTAAGAAGAAAGATTTGACAATAAA 2222222222222222222222222222222222222~2222222222222222222222222222222222222~ NH:i:6
Pt_90296_90860_0:0:0_1:2:0_1 2485 4 1159670 0 76M = 1159670 0 CTAAGTAGCAGTCAAAAATTTGTATCCATTTTTCATGATATTATGCATGGATTAGATATATCATGGCGAA 2222222222222222222222222222222222222222222222222222222222222222222222 NH:i:6
Pt_90296_90860_0:0:0_1:2:0_1 2409 2 1026497 1 76M = 1026497 0 TTATTGTCAAATCTTTCTTCTTAGAACGTATTGATTTGACCCCATAAGCGGGATCACCACCCCATAGCCTGTTGTC 2222222222222222222222222222222222222222222222222222222222222222222222222222 NH:i:6
For whatever reason the CIGAR string length isn't matched to a read of the same length. This is not an issue on smaller k values, but there are so many primary + secondary reads in those files that it's hard to parse through them (e.g. 1M reads could have as many as 13M hits).