This PR turns abPOA seeding off by default in the configuration file. I ran two tests with Cactus v.2.8.0 with seeding on and off: chr10 from the chm13-based v1.1 HPRC pangenome, and Anc08 (mouse/rat + outgroups) from the Zoonomia 10-way test.
The cactus_consolidated running times are:

Test   Seed  Cons time (s)  BAR time (s)  RAM (GB)
--------------------------------------------------
Anc08  Yes   36,543         20,478        406
Anc08  No    45,153         26,025        406
chr10  Yes   14,413         12,871        88
chr10  No    13,588         12,109        87
Coverage stats were unaffected for chr10, but for Anc08 they are a bit different -- turning seeding off increases the coverage by 650 kb (though rat self-coverage goes down).
Saving about 100 minutes on the mouse/rat alignment doesn't seem worth keeping seeding on. I think I'd initially enabled it to keep cloud costs down on large pangenome alignments even if accuracy was a bit lower, but seeding is only slowing things down on the chr10 test.
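As a quick cross-check of the timings, here is a minimal Python sketch that computes how many minutes seeding saves in each test. The numbers are copied from the table above; the `times` dictionary and the `saving_minutes` helper are hypothetical names used only for illustration and are not part of Cactus.

```python
# Running times in seconds, copied from the table above.
# Keys are (test name, seeding enabled?).
times = {
    ("Anc08", True): {"cons": 36_543, "bar": 20_478},
    ("Anc08", False): {"cons": 45_153, "bar": 26_025},
    ("chr10", True): {"cons": 14_413, "bar": 12_871},
    ("chr10", False): {"cons": 13_588, "bar": 12_109},
}


def saving_minutes(test: str, phase: str) -> float:
    """Minutes saved by enabling seeding; negative means seeding is slower."""
    without_seeding = times[(test, False)][phase]
    with_seeding = times[(test, True)][phase]
    return (without_seeding - with_seeding) / 60


for test in ("Anc08", "chr10"):
    for phase in ("cons", "bar"):
        print(f"{test} {phase}: {saving_minutes(test, phase):+.2f} min")
```

Under these numbers, seeding saves time only on Anc08 (both in the total cactus_consolidated time and in the BAR phase), while both values come out negative for chr10, which matches the observation that seeding slows the chr10 test down.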
abPOA has an option to speed up alignment by using a minimizer-based seeding strategy to find anchors. However, in rare cases it can crash or, more worryingly, output a completely incorrect result -- which of the two happens seems to be system-dependent.