Generating a Clipped/Filtered Graph from a {graph_name}.full.gbz file

EEEdyeah commented 23 hours ago

Hello,

I have {graph_name}.full.gbz files generated with Minigraph-Cactus, and I want to map short-read data to them. According to the tutorial, a filtered graph is more efficient for vg giraffe. I also noticed that haplotype sampling is a better option for vg giraffe, but it requires a clipped .gbz graph and a .hapl file.

How can I generate the filtered/clipped version of the graph and haplotypes without having access to the original input files?

I tried the following steps but am not sure if they are correct (I only have the full.hal, full.gbz, and full.gfa files):

    vg convert -g <(gunzip -c {graph_name}.full.gfa.gz) > {graph_name}.full.vg
    cactus-graphmap-join --vg {graph_name}.full.vg --hal {graph_name}.full.hal --giraffe clip ...
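Before re-running the join step on existing outputs, it may help to check which samples the graph actually contains. As far as I understand, Minigraph-Cactus clips and filters relative to a reference sample (given with --reference), so you need to know its exact name. The sketch below reads sample names straight out of the GFA. It assumes GFA 1.1 W-lines (sample name in column 2) and PanSN-style P-line names such as GRCh38#0#chr1; list_gfa_samples is a hypothetical helper, not part of vg or Cactus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct sample names found in a GFA file.
# Assumes GFA 1.1 W-lines (sample name in column 2) and, for P-lines,
# PanSN-style names like "GRCh38#0#chr1" (sample = text before the first '#').
list_gfa_samples() {
  awk -F '\t' '
    $1 == "W" { print $2 }
    $1 == "P" { split($2, parts, "#"); print parts[1] }
  ' "$1" | sort -u
}

# Usage with the compressed graph from this thread (decompressed on the fly):
#   list_gfa_samples <(gunzip -c {graph_name}.full.gfa.gz)
```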

Any suggestions or corrections to this approach would be greatly appreciated. :)

zhangyixing3 commented 14 hours ago

Hi, I think this will help you a little: https://github.com/ComparativeGenomicsToolkit/cactus/issues/1469. I generated filtered graphs from the clipped graph, since I wanted to run vg giraffe on graphs with different filtering thresholds.

EEEdyeah commented 13 hours ago

@zhangyixing3 Thanks! I was wondering, why do you run vg clip multiple times instead of putting all the parameters into a single command? Also, I noticed the first vg clip used -d 2, while the second one used -d 1. What would happen if I used -d 10 instead?

zhangyixing3 commented 9 hours ago

As a matter of fact, I learned these parameters from the test log file. The first vg clip removes the regions you don't want, while the second vg clip removes tips from the graph. If you use -d n instead of -d 2, nodes covered by fewer than n haplotypes (paths) will be removed, so a larger n simplifies the graph further.
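To make the -d n idea concrete, the sketch below counts, for each segment of a GFA, how many distinct haplotype walks (W-lines) pass through it, and prints the segments that fall below a chosen threshold. It is only a rough approximation of what vg clip -d measures: vg computes path depth on the graph itself and, as far as I know, protects reference paths, so its results can differ. nodes_below_depth is a hypothetical helper, not a vg command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list GFA segments covered by fewer than MIN distinct
# haplotype walks (W-lines). A rough stand-in for the idea behind
# "vg clip -d MIN"; it is NOT how vg computes depth internally.
# Usage: nodes_below_depth GRAPH.gfa MIN
nodes_below_depth() {
  awk -F '\t' -v min="$2" '
    $1 == "S" { seen[$2] = 1 }                  # remember every segment ID
    $1 == "W" {
      walk++                                    # unique index for this walk
      n = split($7, ids, /[<>]/)                # ">1>2<3" -> "", "1", "2", "3"
      for (i = 1; i <= n; i++) {
        if (ids[i] == "") continue
        key = ids[i] SUBSEP walk
        if (!(key in visited)) {                # count each walk once per node
          visited[key] = 1
          depth[ids[i]]++
        }
      }
    }
    END {
      for (id in seen)
        if (depth[id] + 0 < min + 0) print id   # segments below the threshold
    }
  ' "$1" | sort -n
}
```

For the graph in this thread, something like nodes_below_depth <(gunzip -c {graph_name}.full.gfa.gz) 10 | wc -l would give a rough idea of how many nodes a -d 10 filter could remove, keeping in mind that reference paths stored as P-lines are not counted here.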

ComparativeGenomicsToolkit/cactus issue #1540