ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Sort flowers by total-base-length instead of cap-number for bar/ref #1514

Closed glennhickey closed 2 weeks ago

glennhickey commented 3 weeks ago

Flowers are sorted by decreasing size before the parallel loops in the BAR and Reference phases, so that longer jobs get queued first. Size is currently measured by flower_getCapNumber(), but in a small test on chrI of the yeast pangenome, the 3 jobs that stand out as longest-running don't have the most caps. They do have the largest flower_getTotalBaseLength().

Granted, this isn't a ton to go on, but this PR tries switching the sort to use base length instead of cap number. And because base length is more expensive to compute, it's computed in parallel.
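As a rough illustration of the change described above, here is a minimal sketch of "compute the key in parallel, then sort descending". It does not use the real Cactus types: ToyFlower, SizedFlower, toy_total_base_length() and sort_flowers_by_base_length() are all hypothetical stand-ins, and the OpenMP pragma is only an assumption about how the parallel loop might be written (without OpenMP enabled it is ignored and the loop runs serially).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a flower: only the two size measures matter here. */
typedef struct {
    int64_t cap_number;          /* roughly what flower_getCapNumber() reports */
    const int64_t *seq_lengths;  /* lengths of the sequence fragments it holds */
    size_t seq_count;
} ToyFlower;

/* A flower's original index paired with its cached sort key. */
typedef struct {
    size_t index;
    int64_t base_length;
} SizedFlower;

/* Stand-in for flower_getTotalBaseLength(): has to walk every fragment,
 * which is why it costs more than reading a cap count. */
static int64_t toy_total_base_length(const ToyFlower *f) {
    int64_t total = 0;
    for (size_t i = 0; i < f->seq_count; i++) {
        total += f->seq_lengths[i];
    }
    return total;
}

/* Descending by base length; ties broken by original index so the
 * resulting order is deterministic. */
static int cmp_base_length_desc(const void *a, const void *b) {
    const SizedFlower *x = (const SizedFlower *)a;
    const SizedFlower *y = (const SizedFlower *)b;
    if (x->base_length != y->base_length) {
        return x->base_length < y->base_length ? 1 : -1;
    }
    return (x->index > y->index) - (x->index < y->index);
}

/* Fills out[0..n) with (index, base_length) pairs, largest first. */
void sort_flowers_by_base_length(const ToyFlower *flowers, size_t n,
                                 SizedFlower *out) {
    assert(flowers != NULL && out != NULL);
    /* Each key is independent of the others, so they can be computed
     * concurrently and cached before the (serial) sort. */
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < (long)n; i++) {
        out[i].index = (size_t)i;
        out[i].base_length = toy_total_base_length(&flowers[i]);
    }
    qsort(out, n, sizeof *out, cmp_base_length_desc);
}
```

Caching the key before sorting means the expensive length computation runs once per flower rather than once per comparison.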

The idea is to use this branch to run a couple of big tests and see if it helps wall time for some big, parallel jobs. It's kind of a long shot, but in many cases runtime does seem to be bounded by a single long job, so anything that makes sure that job gets queued right away should lead to a measurable gain.
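To make the queue-order argument concrete, here is a toy simulation (not Cactus code) of a dynamically scheduled loop: each job, in queue order, goes to whichever worker frees up first. The function name simulate_makespan() and the durations used with it are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the wall time needed to run the jobs in the given queue order
 * on `workers` workers, handing each job to the earliest-free worker. */
int64_t simulate_makespan(const int64_t *durations, size_t n, size_t workers) {
    assert(workers > 0);
    int64_t *busy_until = calloc(workers, sizeof *busy_until);
    assert(busy_until != NULL);

    for (size_t j = 0; j < n; j++) {
        /* Pick the worker that becomes idle first (lowest index on ties). */
        size_t best = 0;
        for (size_t w = 1; w < workers; w++) {
            if (busy_until[w] < busy_until[best]) {
                best = w;
            }
        }
        busy_until[best] += durations[j];
    }

    /* Wall time is when the last worker finishes. */
    int64_t makespan = 0;
    for (size_t w = 0; w < workers; w++) {
        if (busy_until[w] > makespan) {
            makespan = busy_until[w];
        }
    }
    free(busy_until);
    return makespan;
}
```

With two workers and job durations {1, 1, 1, 1, 8}, queuing the long job last gives a wall time of 10, while queuing it first ({8, 1, 1, 1, 1}) gives 8, which is the length of the long job itself. The gain only materializes if the sort key actually tracks running time, which is what the tests on this branch are meant to check.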