dotnetbio / bio

Bioinformatics library for .NET
Apache License 2.0

Order & Orientation of Sequences matters for DeNovo Assembler #42

Open robinemig opened 4 years ago

robinemig commented 4 years ago

I've repeatedly seen the length of the final consensus sequence change when the code below is run on the same reads in a different order (longest to shortest versus shortest to longest), or when the orientation of the reads changes:

    Bio.Algorithms.Assembly.OverlapDeNovoAssembler assem = new Bio.Algorithms.Assembly.OverlapDeNovoAssembler();
    assem.OverlapAlgorithm.GapOpenCost = -10;
    assem.OverlapAlgorithm.GapExtensionCost = -2;
    assem.OverlapAlgorithm.SimilarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.AmbiguousDna);
    var assembly = assem.Assemble(reads) as Bio.Algorithms.Assembly.OverlapDeNovoAssembly;

Compare the runs by assembly.Contigs.First().Consensus.Count. I've tried to build simulated data to provide a test case, but haven't found a set that reproduces it. I can confirm, though, that it happens with as few as two sequences.
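
For anyone trying to reproduce this, here is a rough sketch of a comparison harness. It assembles the same reads in several orders and orientations with the settings above and prints the consensus length for each run. The class name, the `ConsensusLength` helper, and the two read strings are placeholders I made up for illustration; those reads are not known to trigger the problem and should be swapped for real data. I'm also assuming `Sequence`, `Alphabets.DNA`, and `GetReverseComplementedSequence()` behave as they do in the .NET Bio version I have.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Bio;
using Bio.Algorithms.Assembly;
using Bio.SimilarityMatrices;

static class OrderOrientationCheck
{
    // Hypothetical helper: assembles the reads with the settings from the
    // report and returns the consensus length of the first contig,
    // or -1 when no contig was produced.
    static long ConsensusLength(IEnumerable<ISequence> reads)
    {
        var assem = new OverlapDeNovoAssembler();
        assem.OverlapAlgorithm.GapOpenCost = -10;
        assem.OverlapAlgorithm.GapExtensionCost = -2;
        assem.OverlapAlgorithm.SimilarityMatrix =
            new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.AmbiguousDna);

        var assembly = assem.Assemble(reads) as OverlapDeNovoAssembly;
        var firstContig = assembly?.Contigs.FirstOrDefault();
        return firstContig == null ? -1 : firstContig.Consensus.Count;
    }

    static void Main()
    {
        // Placeholder reads (assumption): replace with the real input.
        var reads = new List<ISequence>
        {
            new Sequence(Alphabets.DNA, "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"),
            new Sequence(Alphabets.DNA, "CCGGTTAGCATCGATTGACC"),
        };

        // Same reads, different input order.
        var longestFirst = reads.OrderByDescending(r => r.Count).ToList();
        var shortestFirst = reads.OrderBy(r => r.Count).ToList();

        // Same reads, every read flipped to its reverse complement.
        var flipped = longestFirst
            .Select(r => r.GetReverseComplementedSequence())
            .ToList();

        Console.WriteLine("longest first:  " + ConsensusLength(longestFirst));
        Console.WriteLine("shortest first: " + ConsensusLength(shortestFirst));
        Console.WriteLine("reverse compl.: " + ConsensusLength(flipped));
    }
}
```

If the assembler were insensitive to order and orientation, all three lines would print the same number; any difference is the behavior described in this issue. A fresh assembler is created for each run so that no state carries over between runs.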