COMBINE-lab / RapMap

Rapid sensitive and accurate read mapping via quasi-mapping
GNU General Public License v3.0
89 stars 23 forks source link

files generated by RapMap produce inaccurate CIGAR strings #13

Closed twall2014 closed 9 years ago

twall2014 commented 9 years ago

I am trying to test RapMap on higher values of k (>= 13) on wgsim-simulated Illumina paired reads, but when I try to convert the resulting SAM file to a BAM file with SAMtools I get a truncated file with this error:

[E::sam_parse1] CIGAR and query sequence are of different length [W::sam_read1] parse error at line 16 [main_samview] truncated file.

The line number varies. However, I looked at the file and it appears the file seems to truncate at lines like this: (2nd line in particular)

Pt_90296_90860_0:0:0_1:2:0_1 121 4 1159670 1 76M = 1159670 0 ACAACAGGCTATGGGGTGGTGATCCCGCTTATGGGGTCAAATCAATACGTTCTAAGAAGAAAGATTTGACAATAAA 2222222222222222222222222222222222222~2222222222222222222222222222222222222~ NH:i:6

Pt_90296_90860_0:0:0_1:2:0_1 2485 4 1159670 0 76M = 1159670 0 CTAAGTAGCAGTCAAAAATTTGTATCCATTTTTCATGATATTATGCATGGATTAGATATATCATGGCGAA 2222222222222222222222222222222222222222222222222222222222222222222222 NH:i:6

Pt_90296_90860_0:0:0_1:2:0_1 2409 2 1026497 1 76M = 1026497 0 TTATTGTCAAATCTTTCTTCTTAGAACGTATTGATTTGACCCCATAAGCGGGATCACCACCCCATAGCCTGTTGTC 2222222222222222222222222222222222222222222222222222222222222222222222222222 NH:i:6

For whatever reason the CIGAR string length isn't matched to a read of the same length. This is not an issue on smaller k values, but there are so many primary + secondary reads in those files that it's hard to parse through them (e.g. 1M reads could have as many as 13M hits).

rob-p commented 9 years ago

Hi Tim,

Thanks for reporting this. It seems like the problem is the result of orphaned mappings (i.e. the un-mapped read from a paired-end read). Specifically, adjustOverhang should be called for both mates of a pair, even if one of them is orphaned. I'll ping back here once I've pushed the fix and you can tell me if it resolves the issue for you.

rob-p commented 9 years ago

Hi Tim,

I just pushed an update to the code. Could you build the latest version and let me know if you still encounter this same issue?

Thanks, Rob

twall2014 commented 9 years ago

Still getting truncated files.

On Thu, Sep 17, 2015 at 11:21 AM, Rob Patro notifications@github.com wrote:

Hi Tim,

I just pushed an update to the code. Could you build the latest version and let me know if you still encounter this same issue?

Thanks, Rob

— Reply to this email directly or view it on GitHub https://github.com/COMBINE-lab/RapMap/issues/13#issuecomment-141120252.

rob-p commented 9 years ago

Is this for the same reason, or is something different occurring? Do you have a small example dataset that reproduces this error?

twall2014 commented 9 years ago

Yes, still returns the same error about CIGAR length mismatch. I am currently trying on lower values of k to see what happens and I'll try to generate/send you a file where the error occurs early on.

On Thu, Sep 17, 2015 at 12:20 PM, Rob Patro notifications@github.com wrote:

Is this for the same reason, or is something different occurring? Do you have a small example dataset that reproduces this error?

— Reply to this email directly or view it on GitHub https://github.com/COMBINE-lab/RapMap/issues/13#issuecomment-141137999.

twall2014 commented 9 years ago

Here's a sliver of a file that failed two alignments in (k=15) due to the same problem. I also included the next two lines, because it seemed like the program would fail there too.

48506_49009_0:0:0_1:2:0_0 121 3 17641571 1 76M = 17641571 0 TGTCGTAAGATGATACTAGCTGGCTAAGAGGCCACCTGGTGCCACATCATAGGCACATTGCGAACGTAAATAATTA 2222222222222222222222222222222222222~2222222222222222222222222222222222222~ NH:i:2 Pt_48506_49009_0:0:0_1:2:0_0 2485 3 17641571 0 76M = 17641571 0 TTAGATACAATAGAAATAGAATTGTATCCCCCCCTTCATTTATTGCTTTCCGATCTTATTTATTCATTCC 2222222222222222222222222222222222222222222222222222222222222222222222 NH:i:2 Pt_48506_49009_0:0:0_1:2:0_0 2489 Pt 48505 1 70M = 48505 0 GAATGAATAAATAAGATCGGAAAGCAATAAATGAAGGGGGGGATACAATTCTATTTCTATTGTATCTAAA 2222222222222222222222222222222222~2222222222222222222222222222222222~ NH:i:2 Pt_48506_49009_0:0:0_1:2:0_0 2421 Pt 48505 0 70M = 48505 0 AATTATTTACGTTCGCAATGTGCCTATGATGTGGCACCAGGTGGCCTCTTAGCCAGCTAGTATCATCTTACGACAA 2222222222222222222222222222222222222222222222222222222222222222222222222222 NH:i:2

Should I run on quasiindex/quasimap to see if it has a similar issue? That might help if those functions aren't having problems.

On Thu, Sep 17, 2015 at 12:33 PM, Tim Wall twall2010@gmail.com wrote:

Yes, still returns the same error about CIGAR length mismatch. I am currently trying on lower values of k to see what happens and I'll try to generate/send you a file where the error occurs early on.

On Thu, Sep 17, 2015 at 12:20 PM, Rob Patro notifications@github.com wrote:

Is this for the same reason, or is something different occurring? Do you have a small example dataset that reproduces this error?

— Reply to this email directly or view it on GitHub https://github.com/COMBINE-lab/RapMap/issues/13#issuecomment-141137999.

rob-p commented 9 years ago

Ok; strange --- that's the same bug I thought the previous commit fixed. Can you share the files (reference and reads) that produce this problem (via e-mail, dropbox, etc) so that I can test out the fix before I push it?

twall2014 commented 9 years ago

I used the files from here ( https://drive.google.com/file/d/0B-uC03eEUWJQMTROSXYxUkNmcFU/view?usp=sharing) but I imagine you could also reproduce the error with the much smaller files I sent you before.

On Thu, Sep 17, 2015 at 1:06 PM, Rob Patro notifications@github.com wrote:

Ok; strange --- that's the same bug I thought the previous commit fixed. Can you share the files (reference and reads) that produce this problem (via e-mail, dropbox, etc) so that I can test out the fix before I push it?

— Reply to this email directly or view it on GitHub https://github.com/COMBINE-lab/RapMap/issues/13#issuecomment-141152340.

rob-p commented 9 years ago

There was another path through the code that I didn't account for in the previous commit. This bug should be fixed now. I was able to successfully map data that previously exhibited this issue. Let me know if this fixes it for you and I will close this issue.

twall2014 commented 9 years ago

Seems to work now, thanks!

On Thu, Sep 17, 2015 at 4:18 PM, Rob Patro notifications@github.com wrote:

There was another path through the code that I didn't account for in the previous commit. This bug should be fixed now. I was able to successfully map data that previously exhibited this issue. Let me know if this fixes it for you and I will close this issue.

— Reply to this email directly or view it on GitHub https://github.com/COMBINE-lab/RapMap/issues/13#issuecomment-141215275.

rob-p commented 9 years ago

Great.it was good to squash this bug.