Previously, you needed to run `--hapl --giraffe clip` to make the new haplotype subsampling index. This is because the `.dist` index generated by `--giraffe` is required to make the `.hapl` index.

But making the `.dist` index of the clip graph this way (as opposed to the filter graph) can take loads of time and memory. And, as I just found out, `vg haplotypes` doesn't actually need a full distance index: it can get by with a top-level index constructed with `vg index --snarl-limit 1`.

For `hprc-v1.1-mc-chm13.dist`, the savings from this option are substantial, roughly half the wall-clock time and under a third of the peak memory:

```
vg index hprc-v1.1-mc-chm13.xg -j top.dist --snarl-limit 1
Elapsed (wall clock) time (h:mm:ss or m:ss): 57:22.61
Maximum resident set size (kbytes): 79057736

vg index hprc-v1.1-mc-chm13.xg -j default.dist
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:54:50
Maximum resident set size (kbytes): 258998352
```

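To put numbers on the savings, here is a small hypothetical shell helper (not part of this PR) that converts the GNU `time` figures from the two runs into seconds and GiB and prints the ratios. The function names are made up for illustration; the input values are copied from the benchmark.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): compare the two vg index runs
# by wall-clock time and peak memory (maximum resident set size).

# Convert GNU time's "h:mm:ss" or "m:ss(.ss)" elapsed string to seconds.
elapsed_to_seconds() {
  printf '%s\n' "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

# Convert kilobytes (KiB, as reported by GNU time) to GiB, one decimal place.
kb_to_gib() {
  printf '%s\n' "$1" | awk '{ printf "%.1f\n", $1 / 1048576 }'
}

# Ratio a/b to one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

top_time=$(elapsed_to_seconds "57:22.61")   # --snarl-limit 1
full_time=$(elapsed_to_seconds "1:54:50")   # default
top_mem=79057736
full_mem=258998352

echo "wall clock: ${top_time} s vs ${full_time} s ($(ratio "$full_time" "$top_time")x)"
echo "peak RSS:   $(kb_to_gib "$top_mem") GiB vs $(kb_to_gib "$full_mem") GiB ($(ratio "$full_mem" "$top_mem")x)"
# wall clock: 3442.61 s vs 6890.00 s (2.0x)
# peak RSS:   75.4 GiB vs 247.0 GiB (3.3x)
```

In other words, the top-level index takes about 75 GiB instead of about 247 GiB for this graph.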

This PR allows you to run `--hapl` without `--giraffe`. In this case, only the top-level distance index is created; it is used to make the `.hapl` index and then thrown away. This removes a major memory bottleneck, especially on large, diverse graphs.
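The same idea can be reproduced by hand outside this pipeline. Below is a minimal standalone sketch, not the PR's implementation. It assumes that `PREFIX.xg`, `PREFIX.gbz` and `PREFIX.ri` already exist for the same graph, and that `vg haplotypes` takes `-d` (distance index), `-r` (r-index) and `-H` (haplotype output) as described in the vg documentation. The function name `build_hapl` and the `VG` override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the PR's code): build a .hapl index from a
# throwaway top-level distance index instead of a full one.
# Assumes PREFIX.xg, PREFIX.gbz and PREFIX.ri exist for the same graph and
# that `vg haplotypes` accepts -d / -r / -H (per the vg docs; verify locally).

# The body runs in a subshell so the EXIT trap fires when the function returns.
build_hapl() (
  set -euo pipefail
  prefix=$1
  vg=${VG:-vg}                      # set VG to use a specific vg binary
  top_dist="${prefix}.top.dist"

  # Delete the temporary top-level index whether the build succeeds or fails.
  trap 'rm -f "$top_dist"' EXIT

  # Top-level index only, as in the benchmark: much cheaper than the default .dist.
  "$vg" index "${prefix}.xg" -j "$top_dist" --snarl-limit 1

  # Build the haplotype subsampling index from the top-level index.
  "$vg" haplotypes -d "$top_dist" -r "${prefix}.ri" -H "${prefix}.hapl" "${prefix}.gbz"
)
```

For example, `build_hapl hprc-v1.1-mc-chm13` would leave `hprc-v1.1-mc-chm13.hapl` behind and no `.dist` file. Running the body in a subshell with an `EXIT` trap keeps cleanup in one place, so the large temporary index never outlives the step that needs it.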