cbg-ethz / haploclique

Viral quasispecies assembly via maximal clique finding. A method to reconstruct viral haplotypes and detect large insertions and deletions from NGS data.
GNU General Public License v3.0

Doubt about an implementation detail in Haploclique #46

Open shounak1990 opened 7 years ago

shounak1990 commented 7 years ago

Hi, I am working on my master's thesis, in which I am modifying the edge criteria in Haploclique to identify nucleosome patterns. I am not explaining the complete edge criteria here, since my question is not directly related to them.

However, I have a question about the part where the reads from a clique are merged to form super-reads. Does the algorithm use a consensus for each position while merging?

I am referring to the file AlignmentRecord.cpp and the constructor AlignmentRecord::AlignmentRecord(...){....}

There is a for loop, and it looks like the reads are merged sequentially, two at a time, rather than by taking a consensus of all the reads present in the clique.

I would appreciate it if you could point out where the consensus merging takes place.

Thanks and regards, Shounak

MaryamZaheri commented 7 years ago

Hi Shounak,

Yes, the alignments in a clique are merged incrementally, and at the end the final super-read is reconstructed. This is a modification of the original method in the paper. This way, it is guaranteed that compatible deletions and insertions can be found in the final super-read alignment.

shounak1990 commented 7 years ago

Hi Maryam,

Thank you for your reply. Could you also tell me whether this has a big impact on individual nucleotides in the super-reads? For example, is there not a chance of an erroneous nucleotide showing up in the final merged read just because the original read it came from was merged earlier than the rest?
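To make the concern concrete, here is a minimal sketch of how a sequential pairwise merge can depend on read order. The merge rule is a hypothetical one that I made up for illustration (matching bases add their qualities, and on a mismatch the higher-quality base wins). It is not taken from AlignmentRecord.cpp, and the names `Base`, `Read`, `mergePair`, `mergeSequentially` and `bases` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One aligned position: nucleotide plus a quality-like score.
struct Base {
    char nt;
    int qual;
};
using Read = std::vector<Base>;

// Hypothetical pairwise rule (NOT Haploclique's actual logic):
// equal bases add their qualities; on a mismatch the base with the
// strictly higher quality wins, and ties keep the accumulated base.
Read mergePair(const Read& acc, const Read& next) {
    assert(acc.size() == next.size());
    Read out(acc.size());
    for (std::size_t i = 0; i < acc.size(); ++i) {
        if (acc[i].nt == next[i].nt) {
            out[i] = Base{acc[i].nt, acc[i].qual + next[i].qual};
        } else {
            out[i] = (next[i].qual > acc[i].qual) ? next[i] : acc[i];
        }
    }
    return out;
}

// Fold the reads of a clique into one super-read, left to right.
Read mergeSequentially(const std::vector<Read>& reads) {
    assert(!reads.empty());
    Read acc = reads.front();
    for (std::size_t i = 1; i < reads.size(); ++i) {
        acc = mergePair(acc, reads[i]);
    }
    return acc;
}

// Extract only the nucleotides, for easy comparison.
std::string bases(const Read& r) {
    std::string s;
    for (const Base& b : r) s.push_back(b.nt);
    return s;
}
```

With two reads carrying `A` at quality 20 and one read carrying `C` at quality 30, merging in the order A, A, C yields `A` (the two matching reads accumulate to 40 first), while the order C, A, A yields `C` (the single higher-quality read beats each `A` individually). Under this toy rule, the result is therefore order dependent.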

I would also like to know whether there is a way to revert to the consensus method without changing the current merging process.

Thanks, Shounak

tobiasmarschall commented 7 years ago

Hi Maryam,

My understanding was that reads with different indels are never part of the same clique. So, for all non-indel columns in the multiple alignment, computing the consensus should be the right thing to do. I think Shounak's question is relevant: Is the merging implemented in such a way that the result is independent of the order in which the reads are processed?
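For reference, a column-wise majority consensus is order independent by construction, because each column only counts votes. The sketch below assumes the reads are already aligned, have equal length and contain no indels. `columnConsensus` is a hypothetical helper for illustration, not a function from the Haploclique code base.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Majority vote per column over equal-length, indel-free reads.
// Ties are broken by the fixed alphabet order A < C < G < T, so the
// result never depends on the order of the reads. Columns without
// any A/C/G/T vote are reported as 'N'.
std::string columnConsensus(const std::vector<std::string>& reads) {
    assert(!reads.empty());
    const std::string alphabet = "ACGT";
    const std::size_t len = reads.front().size();
    std::string out(len, 'N');
    for (std::size_t col = 0; col < len; ++col) {
        std::array<int, 4> counts{};  // zero-initialised vote counts
        for (const std::string& r : reads) {
            assert(r.size() == len);
            const std::size_t idx = alphabet.find(r[col]);
            if (idx != std::string::npos) ++counts[idx];
        }
        int best = 0;
        for (std::size_t k = 0; k < counts.size(); ++k) {
            if (counts[k] > best) {  // strict '>' keeps the earliest letter on ties
                best = counts[k];
                out[col] = alphabet[k];
            }
        }
    }
    return out;
}
```

Any permutation of the same reads gives the same consensus, so a single erroneous base cannot win merely because its read was processed first. This sketch ignores base qualities. A quality-weighted vote would keep the same property as long as the weights are summed per column.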

Best, Tobias