biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

sambamba depth works incorrectly #193

Closed timnat closed 8 years ago

timnat commented 8 years ago

I have a 12 GB BAM file and ran the command

    sambamba_v0.5.9 depth base -t 32 LG2_to_WGS_merged.mq5.bam > LG2_to_WGS_merged_a_mq5_reads.base.samb_5.9.depth

For some contigs the output has more bases than the actual length of the contig, and the depth report marks the bases from the actual alignment but also some junk. Here is a particular example: the SAM file shows just 1 read aligned to this particular contig. Part of the sambamba report:

    SuperContig_95000193 50 13 0 0 1 12 0 0
    SuperContig_95000193 51 13 12 0 1 0 0 0
    SuperContig_95000193 52 13 12 0 0 1 0 0
    SuperContig_95000193 53 13 12 1 0 0 0 0
    SuperContig_95000193 54 13 13 0 0 0 0 0
    SuperContig_95000193 55 13 0 0 13 0 0 0
    SuperContig_95000193 56 13 0 1 12 0 0 0
    SuperContig_95000193 57 13 1 0 0 12 0 0
    SuperContig_95000193 58 13 1 0 0 12 0 0
    SuperContig_95000193 59 13 12 0 1 0 0 0
    SuperContig_95000193 60 13 0 1 0 0 12 0
    SuperContig_95000193 61 13 1 0 0 0 12 0
    SuperContig_95000193 62 12 0 0 0 0 12 0
    SuperContig_95000193 63 12 0 0 0 0 12 0

Bases marked with 1 are actual; those marked with 12 or 13 are foreign. I also noticed that an insertion is usually present in such erroneous reports. bedtools genomecov produces a correct report for the same test. If I align all reads to only this particular contig, the depth is counted correctly. It looks like sambamba depth mixes up some contigs (maybe the names are too long?). Let me know if you need more details on the issue. Thank you
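The symptom described above (positions reported beyond the end of a contig) can be checked mechanically. The following is a minimal sketch, not part of sambamba: the function name `find_out_of_range` is hypothetical, the column layout (contig name first, position second) is assumed from the rows quoted above, and whether positions are 0-based or 1-based is left as a parameter because the thread does not say. The contig name and length in the example are made up for illustration.

```python
def find_out_of_range(depth_lines, contig_lengths, one_based=False):
    """Return (contig, pos) pairs whose position lies beyond the contig end.

    depth_lines    -- iterable of whitespace-separated report rows;
                      assumed layout: contig name, position, then counts.
    contig_lengths -- dict mapping contig name to its length in bases.
    one_based      -- set True if positions in the report start at 1.
    """
    bad = []
    for line in depth_lines:
        fields = line.split()
        # Skip header lines or rows that do not carry a numeric position.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        ref, pos = fields[0], int(fields[1])
        length = contig_lengths.get(ref)
        if length is None:
            continue  # contig not listed; nothing to compare against
        last_valid = length if one_based else length - 1
        if pos > last_valid:
            bad.append((ref, pos))
    return bad


# Hypothetical data: a 10-base contig with one row past its end.
example_rows = [
    "demo_contig 5 1 1 0 0 0 0 0",
    "demo_contig 10 1 0 0 1 0 0 0",
]
out_of_range = find_out_of_range(example_rows, {"demo_contig": 10})
```

With 0-based positions, the row at position 10 is flagged for a 10-base contig; with `one_based=True` it is accepted. Contig lengths could be taken from the `@SQ` lines of the BAM header.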

lomereiter commented 8 years ago

Hi, could you share the files? It's also worthwhile to extract the reads aligned to both the previous contig and this one using the view tool and see if the problem persists.

timnat commented 8 years ago

Hi Artem,

In the attachment there are three files:

  1. the *.bam file
  2. an extract of the sambamba output for one particular contig, SuperContig_95000193, where the error occurred
  3. an extract of the bedtools genomecov output for the same contig

The alignment extracted from the BAM file looks like this:

    HWI-ST1309F:134:C4UNWACXX:4:1110:11212:40875 81 SuperContig_95000193 111 2S62M5S SuperContig_3089407 1 0 AGGGGAGAAGAGAGGAAAGGAGGAGAAAGAGGGAGGGGGGTAGCTGAAGTAGGGTCAGCAAGCATAGAT DDDDDDDDDDDDDDDDDDDDDDDDDDD;7;5E8EGHFB)?IHDB?BIGHDGGHEIHDDDGDDFDDFDD@ NM:i:3 MD:Z:32C3A10G14 AS:i:47 XS:i:41

Note that this is not the only contig where the error happens; there are others.

Let me know if you need any other information. Sambamba is a great tool, so I will be happy to assist.

Thank you, Nataliya


lomereiter commented 8 years ago

@timnat Sorry, but your e-mail didn't have any files attached; the files are also likely to exceed e-mail server size limits. Could you upload them to an FTP server or similar and provide a link? Thanks.

timnat commented 8 years ago

Yes, I didn't pay attention to the fact that the file is huge. Please try to download it from here: https://drive.google.com/a/g.uky.edu/file/d/0B-j17DxjqV0xalFIYnlPMFdUU2s/view?usp=sharing


timnat commented 8 years ago

Were you able to download the file?


lomereiter commented 8 years ago

No, I've just resent a request for access, please approve it.

timnat commented 8 years ago

I just shared the file with you from Google Drive.

And here is a shareable link: https://drive.google.com/open?id=0B-j17DxjqV0xalFIYnlPMFdUU2s

Please let me know whether you were able to get it.


lomereiter commented 8 years ago

I'm downloading it now, thanks!

lomereiter commented 8 years ago

Thanks again, I simplified the test case down to just three contigs, and the issue turned out to be that the read following the single one mapped to 95000193 has the same position but different reference, and the corresponding check was missing.
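To illustrate the kind of bug described above, here is a minimal sketch in Python. It is not sambamba's actual code (sambamba is written in D), and the function names, read names, and reference IDs are hypothetical. It assumes reads arrive sorted by reference and then position, as in a coordinate-sorted BAM, and shows how grouping consecutive reads by position alone merges reads from different references, while also comparing the reference keeps them apart.

```python
from itertools import groupby

# Each read is (ref_id, start_pos, name). The list is sorted by
# (ref_id, start_pos), mimicking a coordinate-sorted BAM file.
# All values here are made up for illustration.
reads = [
    (0, 111, "read_A"),   # the single read on the first contig
    (1, 111, "read_B1"),  # next contig, same start position
    (1, 111, "read_B2"),
]


def group_by_position_only(sorted_reads):
    """Buggy grouping: the reference is never compared, so consecutive
    reads with the same start position land in one group even when
    they belong to different references."""
    return [
        (pos, [r[2] for r in grp])
        for pos, grp in groupby(sorted_reads, key=lambda r: r[1])
    ]


def group_by_reference_and_position(sorted_reads):
    """Grouping with the reference check: a new group starts whenever
    either the reference or the position changes."""
    return [
        (key, [r[2] for r in grp])
        for key, grp in groupby(sorted_reads, key=lambda r: (r[0], r[1]))
    ]


buggy = group_by_position_only(reads)
fixed = group_by_reference_and_position(reads)
```

Here `buggy` contains a single group holding all three reads, so coverage from the second contig would be attributed to the first, while `fixed` contains two groups, one per reference.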

timnat commented 8 years ago

Good, quick job. So does this mean it will be fixed in the next version of sambamba?


lomereiter commented 8 years ago

Yes, that's correct. I'm planning to make a new release this or next week.