duncanca / mosaik-aligner

Automatically exported from code.google.com/p/mosaik-aligner

Problem aligning 454 reads to short (~8kb or less) assembly #67

Closed. GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. I align 454 reads to a partial viral genome consisting of a single contig of 
around 8 kb.
2. Most of the reads get filtered out, and only a very small fraction actually 
align (for example, 32 reads out of 11000).
3. If I pad the 8 kb contig with 2000 X characters at the end (so that I have my 
contig followed by 2000 X), suddenly 6000 reads align to it, all in the correct 
region: nothing aligns against the X's, and every read aligns where it should 
within the contig that was used in the first alignment.
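
The padding workaround in step 3 can be sketched roughly as follows. This is a minimal illustration, not part of MOSAIK: the function names `pad_sequence` and `pad_fasta`, the 60-column line width, and the idea of padding every record to one target length are all assumptions made for the example.

```python
def pad_sequence(seq: str, target_len: int, pad_char: str = "X") -> str:
    """Append pad_char to seq until it is target_len long; never truncates."""
    if len(pad_char) != 1:
        raise ValueError("pad_char must be a single character")
    return seq + pad_char * max(0, target_len - len(seq))


def pad_fasta(text: str, target_len: int, pad_char: str = "X", width: int = 60) -> str:
    """Pad every record in a FASTA-formatted string and re-wrap lines at `width`."""
    out = []
    header, chunks = None, []

    def flush():
        # Emit the record collected so far, padded to the target length.
        if header is not None:
            seq = pad_sequence("".join(chunks), target_len, pad_char)
            out.append(header)
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()                      # finish the previous record, if any
            header, chunks = line, []
        else:
            chunks.append(line)
    flush()                              # finish the last record
    return "\n".join(out) + "\n"


# Hypothetical usage: pad a tiny contig to 10 bases, wrapping at 4 columns.
padded = pad_fasta(">contig1\nACGT\nAC\n", 10, width=4)
```

For the case described here, the target length would be the original contig length plus 2000.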

I see this in many samples with short contigs of 2-8 kb, all with the same 
result. When I align to my reference, to a full viral genome of around 10 kb, or 
to the contig padded with X, the alignment is fine. When I align to the contig 
alone, it fails and aligns only a very small percentage of the reads.

I expect that it can be reproduced by taking a set of 454 reads and aligning 
them against a short (~5 kb) assembly of those reads. If not, I could likely 
provide files.

What version of the product are you using? On what operating system?

Mosaik-1.0.1388, on Unix

Please provide any additional information below.

Original issue reported on code.google.com by patcc...@hotmail.com on 19 Aug 2010 at 4:30

GoogleCodeExporter commented 8 years ago
After further testing, there seems to be a hard sequence length threshold where 
the alignment goes from good to bad.

It does not seem to be the same for all samples, but in my tests it is around 
8500 (8477 in one case, 8541 in another). With a base sequence length of 7880, 
if I added enough X to reach 8477, the alignment worked. If I removed one X so 
that the sequence dropped to 8476, the alignment failed. Changing the hash 
size/act numbers does not seem to move the threshold.
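
A threshold like this can be located with a binary search over the padded length. The sketch below is only an illustration: `find_length_threshold` and the `aligns_ok` callback are hypothetical names, and the callback is assumed to pad the reference to the given length, run the aligner, and report whether a reasonable fraction of reads aligned. It also assumes the behavior is monotone (fails below the threshold, works at or above it), which matches what I observed.

```python
from typing import Callable


def find_length_threshold(lo: int, hi: int, aligns_ok: Callable[[int], bool]) -> int:
    """Return the smallest length in (lo, hi] for which aligns_ok(length) is True.

    Assumes aligns_ok(lo) is False, aligns_ok(hi) is True, and that the
    result flips from False to True exactly once between them.
    """
    if aligns_ok(lo) or not aligns_ok(hi):
        raise ValueError("need a failing length at lo and a working length at hi")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if aligns_ok(mid):
            hi = mid      # mid works, so the threshold is at or below mid
        else:
            lo = mid      # mid fails, so the threshold is above mid
    return hi


# Stand-in predicate mimicking the first sample above (fails at 8476, works at 8477).
# A real callback would run the aligner instead.
threshold = find_length_threshold(7880, 10000, lambda n: n >= 8477)
```

Each probe costs one full alignment run, so the search needs about 11 runs for the 7880 to 10000 range rather than one run per candidate length.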

Original comment by patcc...@hotmail.com on 19 Aug 2010 at 6:03

GoogleCodeExporter commented 8 years ago
Hi Patrick,

Thank you so much for helping us find this bug.

We found that MOSAIK was not loading references correctly.
This has been fixed in version 1.1.0014 and later.

Best,
Wan-Ping

Original comment by WanPing....@gmail.com on 17 Nov 2010 at 7:20