Open maxgmarin opened 3 years ago
Additionally, I noticed that for the SV genotyping paper (https://github.com/vgteam/sv-genotyping-paper), hal2vg was used to output a .vg format.
The rule used for that paper can be found in its Snakemake file:
rule hal2vg:
    input:
        "cactusoutput.hal"
    output:
        "yeast.vg"
    shell:
        "~/bin/hal2vg_fork/hal2vg --noAncestors --refGenome S288C {input} > {output}"
hal2vg output works fine with vg. Perhaps you have an outdated version of one of them?
example:
wget https://github.com/ComparativeGenomicsToolkit/cactus/releases/download/v1.2.3/cactus-bin-v1.2.3.tar.gz
tar xf cactus-bin-v1.2.3.tar.gz
wget https://github.com/vgteam/vg/releases/download/v1.30.0/vg
chmod +x vg
cactus-bin-v1.2.3/bin/halRandGen rand.hal
cactus-bin-v1.2.3/bin/hal2vg rand.hal > rand.vg
./vg paths -Ev rand.vg
Genome_16.Genome_16_seq 436644
Genome_13.Genome_13_seq 223992
Genome_14.Genome_14_seq 112716
Genome_3.Genome_3_seq 150518
Genome_9.Genome_9_seq 161504
Genome_17.Genome_17_seq 242916
Genome_2.Genome_2_seq 284130
Genome_19.Genome_19_seq 154714
Genome_18.Genome_18_seq 585488
Genome_8.Genome_8_seq 720948
Genome_0.Genome_0_seq 141933
Genome_6.Genome_6_seq 196470
Genome_4.Genome_4_seq 139629
Genome_15.Genome_15_seq 572700
Genome_10.Genome_10_seq 771630
Genome_12.Genome_12_seq 219286
Genome_11.Genome_11_seq 752640
Genome_7.Genome_7_seq 853905
Genome_1.Genome_1_seq 828696
Genome_5.Genome_5_seq 476136
halStats rand.hal
hal v2.1
(((Genome_14:0,Genome_15:0)Genome_9:0)Genome_1:0,((Genome_16:0)Genome_10:0)Genome_2:0,Genome_3:0,Genome_4:0,(Genome_11:0)Genome_5:0,Genome_6:0,((Genome_17:0,Genome_18:0)Genome_12:0)Genome_7:0,((Genome_19:0)Genome_13:0)Genome_8:0)Genome_0;
GenomeName, NumChildren, Length, NumSequences, NumTopSegments, NumBottomSegments
Genome_0, 8, 141933, 1, 0, 253
Genome_1, 1, 828696, 1, 1478, 473
Genome_9, 2, 161504, 1, 93, 196
Genome_14, 0, 112716, 1, 137, 0
Genome_15, 0, 572700, 1, 696, 0
Genome_2, 1, 284130, 1, 507, 210
Genome_10, 1, 771630, 1, 571, 445
Genome_16, 0, 436644, 1, 252, 0
Genome_3, 0, 150518, 1, 269, 0
Genome_4, 0, 139629, 1, 249, 0
Genome_5, 1, 476136, 1, 849, 389
Genome_11, 0, 752640, 1, 615, 0
Genome_6, 0, 196470, 1, 351, 0
Genome_7, 1, 853905, 1, 1523, 435
Genome_12, 2, 219286, 1, 112, 166
Genome_17, 0, 242916, 1, 184, 0
Genome_18, 0, 585488, 1, 444, 0
Genome_8, 1, 720948, 1, 1286, 438
Genome_13, 1, 223992, 1, 137, 183
Genome_19, 0, 154714, 1, 127, 0
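The two listings above agree: each path length reported by vg paths equals the Length column that halStats reports for the same genome. A minimal Python sketch of that cross-check follows. The helper names (parse_vg_paths, parse_halstats, mismatches) are hypothetical and not part of vg or hal, and the sketch assumes path names have the form <genome>.<sequence> with no dot inside the genome name, as in the random example.

```python
def parse_vg_paths(text):
    """Parse `vg paths -Ev` style lines ('<genome>.<sequence> <length>')
    into {genome: total path length}. Assumes no '.' in genome names."""
    lengths = {}
    for line in text.strip().splitlines():
        name, length = line.split()
        genome = name.split(".", 1)[0]
        # Sum over sequences in case a genome has more than one path.
        lengths[genome] = lengths.get(genome, 0) + int(length)
    return lengths


def parse_halstats(text):
    """Parse the per-genome table of halStats output into {genome: Length}."""
    lines = text.strip().splitlines()
    # Skip the version and tree lines; the table starts after this header.
    start = next(i for i, l in enumerate(lines) if l.startswith("GenomeName"))
    lengths = {}
    for line in lines[start + 1:]:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            break
        lengths[fields[0]] = int(fields[2])  # third column is Length
    return lengths


def mismatches(vg_lengths, hal_lengths):
    """Return {genome: (vg_length, hal_length)} wherever the two disagree."""
    genomes = set(vg_lengths) | set(hal_lengths)
    return {
        g: (vg_lengths.get(g), hal_lengths.get(g))
        for g in genomes
        if vg_lengths.get(g) != hal_lengths.get(g)
    }


# Two rows copied from the output above.
vg_out = """Genome_14.Genome_14_seq 112716
Genome_3.Genome_3_seq 150518"""
hal_out = """hal v2.1
GenomeName, NumChildren, Length, NumSequences, NumTopSegments, NumBottomSegments
Genome_14, 0, 112716, 1, 137, 0
Genome_3, 0, 150518, 1, 269, 0"""

print(mismatches(parse_vg_paths(vg_out), parse_halstats(hal_out)))  # prints {}
```

An empty result means every genome has the same length in both tools. Note that with --noAncestors (as in the Snakemake rule above) ancestral genomes would be absent from the vg side and would show up as mismatches.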
Hello,
I have successfully run the Cactus pipeline both on a set of ~30 bacterial genomes and on the mammals example data. Both times Cactus output a .hal file that I can inspect and validate.
I was able to use hal2vg to convert the .hal alignment to the .pg format.
The issue I have run into is that none of the vg subcommands (latest version, 1.30) seem to accept the .pg format output by hal2vg.
Is there a straightforward way to convert .pg to .vg?
Alternatively, should I look into outputting a .odgi with hal2vg, then converting the .odgi to .gfa (and then .gfa to .vg)?
Thank you, Max