epi2me-labs / wf-amplicon


Understanding read pre-processing prior to SPOA #6

Closed GeorgiaBreckell closed 8 months ago

GeorgiaBreckell commented 10 months ago

Ask away!

Hi, hoping this is a simple question and I have just missed something in my understanding of SPOA, but I noticed you mention that read length and read order are important for SPOA, whereas this isn't covered on the SPOA GitHub. Was this something you found during the development of this tool? Additionally, I noticed that you interleave the reads prior to assembly; could you please explain the reasoning behind this? Finally, SPOA is run twice, and based on how the first consensus is used, I assume the first read is given the most weight in consensus building. Was iteratively running SPOA more beneficial than polishing, and did a third or subsequent rounds of SPOA offer any additional benefit?

Thanks for a great tool! Georgia

cjw85 commented 10 months ago

For a primer on partial order alignment I would suggest reading https://simpsonlab.github.io/2015/05/01/understanding-poa/; it's more approachable than the original papers. With a bit of thought I think you'll get a feel for how the algorithm can suffer from being order-dependent.

All the tricks around how reads are ordered and how POA is performed multiple times flow from this observation.
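
For anyone landing here later, below is a minimal sketch of the kind of order-aware preprocessing being discussed. It is an illustration only, not the actual wf-amplicon implementation: it assumes the pyspoa bindings (`from spoa import poa`, which returns a consensus and an MSA), orders reads by length and interleaves them from both ends of the sorted list, and then runs POA a second time with the first-round consensus placed first so it seeds the alignment graph. The exact ordering and reweighting used by the workflow may differ.

```python
# Illustrative sketch only -- not the wf-amplicon implementation.
# Assumes the pyspoa bindings are installed (`pip install pyspoa`);
# the exact poa() signature may vary between versions.
from spoa import poa


def interleave_by_length(reads):
    """Sort reads by length, then alternate between the longest and
    shortest remaining reads so neighbouring sequences differ in length."""
    ordered = sorted(reads, key=len, reverse=True)
    interleaved = []
    lo, hi = 0, len(ordered) - 1
    while lo <= hi:
        interleaved.append(ordered[lo])      # next-longest read
        if lo != hi:
            interleaved.append(ordered[hi])  # next-shortest read
        lo += 1
        hi -= 1
    return interleaved


def two_round_consensus(reads):
    """Run POA twice: the first-round consensus is put at the front of the
    second round so it anchors (and therefore dominates) the graph."""
    ordered = interleave_by_length(reads)
    consensus, _msa = poa(ordered)
    consensus, _msa = poa([consensus] + ordered)
    return consensus


if __name__ == "__main__":
    reads = [
        "ACGTACGTACGTT",
        "ACGTACGTACG",
        "ACGTTACGTACGTT",
        "ACGTACGTAGGTT",
    ]
    print(two_round_consensus(reads))
```

The point of both tricks is the same: POA adds sequences to the graph greedily in the order they are given, so you choose that order (and a seed consensus) deliberately rather than leaving it to whatever order the reads arrive in.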