ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Refactor VCF export #1491

Closed glennhickey closed 9 hours ago

glennhickey commented 1 month ago

This switches VCF export from using the whole-genome GBZ to using per-chromosome VG files. This is necessary to plug in the nested VCF support, which, at least as currently implemented, requires all paths to be indexed.

I don't think it'll affect runtime very much -- maybe on a cluster it could go faster because VCF export is now distributed by chromosome. It could potentially use a lot more memory with --noSplit, because you'd be working with a whole-genome vg instead of a gbz, but I think this use case remains pretty obscure.
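
To illustrate what "distributed by chromosome" means here, below is a minimal Python sketch of the fan-out/merge pattern. It is not the code in this PR: `export_chrom_vcf`, `merge_vcfs`, and `export_vcf_by_chromosome` are hypothetical names, the per-chromosome "graph" is just a name plus a list of positions, and a thread pool stands in for whatever job scheduling the real workflow uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the real per-chromosome export step.
# A "chromosome graph" here is just (name, [variant positions]).
def export_chrom_vcf(chrom_graph):
    chrom, positions = chrom_graph
    header = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    )
    body = "".join(f"{chrom}\t{pos}\t.\tA\tT\t.\t.\t.\n" for pos in positions)
    return header + body

# Concatenate per-chromosome VCFs, keeping header lines from the first only.
def merge_vcfs(vcf_texts):
    merged = []
    for i, text in enumerate(vcf_texts):
        for line in text.splitlines():
            if line.startswith("#") and i > 0:
                continue
            merged.append(line)
    return "\n".join(merged) + "\n"

# Fan out one export per chromosome, then merge in input order.
# Each worker only holds its own chromosome, not the whole genome.
def export_vcf_by_chromosome(chrom_graphs, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_chrom = list(pool.map(export_chrom_vcf, chrom_graphs))
    return merge_vcfs(per_chrom)

if __name__ == "__main__":
    print(export_vcf_by_chromosome([("chr1", [10, 20]), ("chr2", [5])]), end="")
```

With a single whole-genome input (the --noSplit case), the same pattern degenerates to one export over everything, which is where the extra memory use would come from.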

Right now the interface still requires that VCF reference samples be selected as references via the --reference option, but this is no longer necessary since we're not going through GBZ. So it would be a small future change to lift this requirement, enabling VCF creation on any sample in the graph.