Make graphalign return all possible paths, in order of score.

ctb commented 9 years ago

For both debugging and Science, it would be interesting if the read aligner could yield multiple paths, starting with the best possible and then going down from there. I would imagine that some kind of generator approach would be a natural fit, since you clearly don't want to return a set of all possible paths :)

Not having fully internalized the read aligner code, I don't have a sense for what the right approach is. One option might be to feed it a set of paths that it should avoid in alignment, which can be iteratively extended as new "good" paths are found. An alternative might be to keep an internal list of good paths tried, sorted by score. I can imagine both approaches would be useful in different circumstances.

The uses are manifold:

SNP/variant calling (right now the read aligner just picks one of multiple alleles)
debugging ("hey, why didn't this path get picked? why does it score lower than this other... oh.")
if "avoid this path" used, further tailoring of paths to pick paths through specific tags.
probably more.

It just seems like nice flexibility.

@camillescott @luizirber @mr-c @fishjord any thoughts?

mr-c commented 9 years ago

A generator would be tough as you don't know the best path until the end. Would require a bit of new plumbing as 'not-best' paths are discarded as we go.

ctb commented 9 years ago

On Thu, Jul 30, 2015 at 11:08:17AM -0700, Michael R. Crusoe wrote:

A generator would be tough as you don't know the best path until the end. Would require a bit of new plumbing as 'not-best' paths are discarded as we go.

Random thought - what if we started with the best path and looked for next-worst path?

mr-c commented 9 years ago

As in: do the alignment, record the best path; redo the alignment and stop just prior to that outcome?

On Fri, Jul 31, 2015 at 7:29 AM C. Titus Brown notifications@github.com wrote:

On Thu, Jul 30, 2015 at 11:08:17AM -0700, Michael R. Crusoe wrote:

A generator would be tough as you don't know the best path until the end. Would require a bit of new plumbing as 'not-best' paths are discarded as we go.

Random thought - what if we started with the best path and looked for next-worst path?

— Reply to this email directly or view it on GitHub https://github.com/dib-lab/khmer/issues/1114#issuecomment-126708806.

Michael R. Crusoe: Programmer & Bioinformatician crusoe@ucdavis.edu The lab for Data Intensive Biology; University of California, Davis https://impactstory.org/MichaelRCrusoe http://twitter.com/biocrusoe

ctb commented 9 years ago

On Fri, Jul 31, 2015 at 07:46:20AM -0700, Michael R. Crusoe wrote:

As in: do the alignment, record the best path; redo the alignment and stop just prior to that outcome?

That's one way - I was thinking that if we started with best path, we might be able to find close but not as good paths easily. But ... maybe not.

dib-lab / khmer

Make graphalign return all possible paths, in order of score. #1114