ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

[experimental] FastGA support #1459

Open glennhickey opened 3 months ago

glennhickey commented 3 months ago

FastGA is a new pairwise genome aligner that seems like a good candidate to help speed up Cactus. This PR adds an option to drop it into progressive cactus as a lastz replacement.

It's got a ways to go before merging though, as it doesn't yet pass the evolver mammals test. Issues so far:

Without having spent much time on this, it looks like FastGA does not work well with small contigs, at least with its default parameters. This leads to trouble with trimmed and ancestral sequences in Cactus.

This branch should still be runnable on pairwise alignments in Cactus, and pairwise tests are probably the next step before seeing how much it's worth pursuing the above issues.

glennhickey commented 3 months ago

make evolver_test_poa_local (primates star tree) fails with

Comparing mafcomp accuracy 0.980491,0.980145 to baseline accuracy 0.998757,0.985563 with threshold (0.0025, 0.0075)

make evolver_test_local (mammals progressive) fails with

Comparing mafcomp accuracy 0.749021,0.290327 to baseline accuracy 0.894622,0.706771 with threshold (0.05, 0.13)

When I switch to a star tree for the mammals

Comparing mafcomp accuracy 0.840356,0.336892 to baseline accuracy 0.894622,0.706771 with threshold (0.05, 0.13)

which means the divergence, rather than the ancestor alignments, seems to be the driving force for most of the recall drop. So there could be hope for improvement via parameter tuning.
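
To make the three comparisons easier to read side by side, here is a minimal Python sketch that recomputes the gap to baseline for each run from the numbers quoted above. It is not the Cactus test code: the names `RUNS`, `accuracy_drops` and `within_threshold` are made up for illustration, and the pass rule (each value may fall at most its threshold below the baseline) is an assumption about how the check works. The sketch also does not assume which of the two values is precision and which is recall.

```python
# Numbers copied from the test output quoted in this comment.
# Each entry: (measured accuracy pair, baseline accuracy pair, threshold pair).
RUNS = {
    "primates star tree": ((0.980491, 0.980145), (0.998757, 0.985563), (0.0025, 0.0075)),
    "mammals progressive": ((0.749021, 0.290327), (0.894622, 0.706771), (0.05, 0.13)),
    "mammals star tree": ((0.840356, 0.336892), (0.894622, 0.706771), (0.05, 0.13)),
}


def accuracy_drops(measured, baseline):
    """Return how far each measured value falls below its baseline value."""
    return tuple(b - m for m, b in zip(measured, baseline))


def within_threshold(measured, baseline, threshold):
    """ASSUMPTION: a run passes when no value drops more than its threshold.

    This one-sided rule is a guess at the test's logic, not taken from Cactus.
    """
    drops = accuracy_drops(measured, baseline)
    return all(d <= t for d, t in zip(drops, threshold))


if __name__ == "__main__":
    for name, (measured, baseline, threshold) in RUNS.items():
        drops = accuracy_drops(measured, baseline)
        status = "ok" if within_threshold(measured, baseline, threshold) else "FAIL"
        print(f"{name}: drops=({drops[0]:.6f}, {drops[1]:.6f}) "
              f"threshold={threshold} -> {status}")
```

Under that assumed rule all three runs fail. For the mammals, switching to a star tree raises the second value by only about 0.047 (0.290327 to 0.336892), while the gap to baseline is still about 0.370, which is consistent with the reading that removing the ancestor alignments accounts for a small share of the drop.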

glennhickey commented 3 months ago

@benedictpaten super low priority, but paffy chain always gives 0 scores to fastga alignments for reasons I don't quite see (but are probably pretty obvious). File to reproduce here: http://public.gi.ucsc.edu/~hickey/debug/fastga-chaining/