donkirkby opened 4 years ago
First impressions of minimap2:
None return value instead of raising an exception.
For local alignment of two long consensus sequences, assuming one of them spans a range of the other (amplicon vs. reference), Smith-Waterman or Gotoh are sound choices. I have never used minimap2, but maybe its PacBio or Nanopore features for handling long reads would approximate this use case. I don't know what else those features would do, though.
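For reference, Gotoh's algorithm is Smith-Waterman with affine gap penalties, where opening a gap costs more than extending it. A minimal Python sketch of the local-alignment score recurrence follows. The function name and default scores are illustrative assumptions, not the parameters of our existing Gotoh implementation. It keeps only the previous row, so it uses O(M) memory but returns only the score; recovering the alignment itself is what needs a backtracking matrix.

```python
def gotoh_local_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                      gap_open: int = -3, gap_extend: int = -1) -> int:
    """Best local alignment score of a and b with affine gaps (Gotoh's recurrences).

    A gap of length k scores gap_open + (k - 1) * gap_extend. Only the previous
    row is kept, so memory is O(len(b)), but the alignment itself is not recovered.
    """
    neg_inf = float("-inf")
    m = len(b)
    prev_h = [0] * (m + 1)        # H: best score of a local alignment ending at (i, j)
    prev_f = [neg_inf] * (m + 1)  # F: best score ending with a gap against a[i - 1]
    best = 0
    for i in range(1, len(a) + 1):
        h = [0] * (m + 1)
        f = [neg_inf] * (m + 1)
        e = neg_inf  # E: best score ending with a gap against b[j - 1] in this row
        for j in range(1, m + 1):
            e = max(e + gap_extend, h[j - 1] + gap_open)              # horizontal move
            f[j] = max(prev_f[j] + gap_extend, prev_h[j] + gap_open)  # vertical move
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[j] = max(0, diag, e, f[j])  # 0 lets a local alignment start anywhere
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return int(best)


# prints 11: two 4-base matches (8 each) around one gap of length 3 (-3 + 2 * -1)
print(gotoh_local_score("AAAACCCAAAA", "AAAAAAAA"))
```

With a linear gap penalty the same three-base gap would cost three times the opening cost, which is why Gotoh's extra E and F states matter for indel-rich alignments.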
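At about 30k bp, SARS-CoV-2 is going to stress any Smith-Waterman (SW) implementation that builds the entire N x M backtracking matrix. Banding is a useful optimization for SW space complexity: computing only the cells within w of a chosen diagonal shrinks the matrix from N x M cells to roughly N x (2w + 1).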
The rust-bio library has a well-designed banded alignment API, https://docs.rs/bio/0.20.3/bio/alignment/pairwise/banded/index.html, which would make a good test bed for working out the alignment parameters.
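To put numbers on the banding argument: a full 30,000 x 30,000 backtracking matrix has 9 x 10^8 cells, about 900 MB even at one byte per cell, while a band with w = 100 needs only about 30,000 x 201 ≈ 6 x 10^6 cells. The sketch below is a plain Python banded Smith-Waterman with a linear gap penalty, not rust-bio's algorithm. As we understand its documentation, rust-bio finds the band from exact k-mer matches; here the band's centre diagonal `diag` (the value of j - i) is a hypothetical parameter you would supply, for example from a seed hit. It stores the banded score matrix that a traceback would walk and returns the best score together with the number of cells stored.

```python
def banded_local_score(a: str, b: str, diag: int = 0, w: int = 8,
                       match: int = 2, mismatch: int = -1, gap: int = -2):
    """Smith-Waterman score restricted to the band |(j - i) - diag| <= w.

    Returns (best_score, cells_stored). The banded matrix has (len(a) + 1) rows of
    2w + 1 cells instead of (len(a) + 1) x (len(b) + 1) cells.
    """
    width = 2 * w + 1
    m = len(b)
    # rows[i][k] holds H(i, j) for j = i + diag - w + k; row 0 is the zero boundary.
    rows = [[0] * width for _ in range(len(a) + 1)]

    def h(i: int, j: int) -> int:
        """Stored score, or 0 for boundary cells and cells outside the band."""
        k = j - i - diag + w
        if i <= 0 or j <= 0 or j > m or not 0 <= k < width:
            return 0
        return rows[i][k]

    best = 0
    for i in range(1, len(a) + 1):
        for k in range(width):
            j = i + diag - w + k
            if not 1 <= j <= m:
                continue  # this band slot falls outside the matrix
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(0, h(i - 1, j - 1) + s, h(i - 1, j) + gap, h(i, j - 1) + gap)
            rows[i][k] = score
            best = max(best, score)
    return best, len(rows) * width


reference = "TTTTTACGTACGTTTTT"
amplicon = "ACGTACGT"  # starts at reference index 5, so j - i = 5 along the match
print(banded_local_score(amplicon, reference, diag=5, w=2))  # prints (16, 45)
```

Cells outside the band are treated as score 0, so an alignment that drifts outside the band gets cut off; choosing w trades memory against tolerance for large indels or a poorly placed diagonal.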
As I've been working on #549 to add support for SARS-CoV-2 references, I've had some trouble with running out of memory. I think it's partly because I'm running on equipment with less memory than I usually use, and partly because the SARS-CoV-2 genome is longer than the HIV or HCV genomes. The step I've had the most trouble with is aligning two consensus sequences using our Gotoh algorithm, so maybe it's time to look at alternatives.
@jeff-k had suggested we move from Gotoh to BWA, and that project seems to have been superseded by minimap2. Experiment with these tools for aligning the SARS-CoV-2 consensus sequences, and then decide whether they are worth switching to.
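A first minimap2 experiment could go through mappy, the Python binding that ships in the minimap2 repository. The sketch below assumes mappy is installed (`pip install mappy`) and uses the `asm5` preset, which is an untested guess for consensus-vs-reference alignment; `align_consensus` and `pick_primary_hit` are hypothetical helper names. Following the example in mappy's README, it checks the aligner's truthiness after construction, because a failed index build does not raise.

```python
def pick_primary_hit(hits):
    """Return the primary hit with the most matching bases, or None if there is none."""
    primary = [hit for hit in hits if hit.is_primary]
    return max(primary, key=lambda hit: hit.mlen, default=None)


def align_consensus(reference: str, consensus: str, preset: str = "asm5"):
    """Map one consensus sequence to one reference sequence with mappy (minimap2)."""
    import mappy  # optional dependency, imported here so the helper above needs nothing

    # seq= builds an in-memory index from a single sequence instead of a FASTA file.
    aligner = mappy.Aligner(seq=reference, preset=preset)
    if not aligner:
        # Index failures show up as a falsy aligner, not as an exception.
        raise RuntimeError("minimap2 could not build an index for the reference")
    return pick_primary_hit(aligner.map(consensus))
```

Each mappy `Hit` exposes `r_st`/`r_en`, `q_st`/`q_en`, `strand` and `cigar`, which is what a later step such as clipping out gene regions would need.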
Tasks
Use the same minimap2 alignment to clip out gene regions. Tracked in issue #479.
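One way to clip a gene out of a single minimap2 hit is to walk its CIGAR and translate the gene's reference coordinates into query coordinates. The sketch below assumes mappy's conventions: 0-based, half-open `r_st`/`q_st` coordinates and `cigar` as (length, op) pairs with BAM op codes (0 = M, 1 = I, 2 = D, 3 = N, 7 = `=`, 8 = X). It also assumes a forward-strand hit. `clip_to_reference_range` is a hypothetical name, and the gene coordinates would come from the reference's gene annotations.

```python
def clip_to_reference_range(query: str, r_st: int, q_st: int, cigar,
                            ref_start: int, ref_end: int):
    """Return the part of query aligned to reference positions [ref_start, ref_end).

    Coordinates are 0-based and half-open, like mappy's Hit.r_st and Hit.q_st.
    cigar is a sequence of (length, op) pairs with BAM op codes, like Hit.cigar.
    Assumes a forward-strand hit. Returns None if the range is not covered.
    """
    r, q = r_st, q_st  # current reference and query positions
    q_first = q_last = None
    for length, op in cigar:
        if op in (0, 7, 8):  # M, =, X: consume reference and query together
            lo, hi = max(r, ref_start), min(r + length, ref_end)
            if lo < hi:  # this block overlaps the requested reference range
                if q_first is None:
                    q_first = q + (lo - r)
                q_last = q + (hi - r)
            r += length
            q += length
        elif op == 1:  # I: query bases with no reference position
            q += length
        elif op in (2, 3):  # D, N: reference bases with no query base
            r += length
        else:
            raise ValueError(f"unexpected CIGAR op {op}")
    if q_first is None:
        return None
    return query[q_first:q_last]


# Reference AAACCCGGG; the query has an extra X inside the CCC "gene" at [3, 6).
print(clip_to_reference_range("AAACCXCGGG", 0, 0, [(5, 0), (1, 1), (4, 0)], 3, 6))  # prints CCXC
```

Insertions inside the gene stay in the clipped sequence, deletions simply shorten it, and bases inserted exactly at a gene boundary are left out.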