Open wangnan9394 opened 1 year ago
I'm glad people are using panacus! I intend on eventually including it in the Cactus release and maybe running some of the growth curves automatically -- they are much nicer than the plots I've been making.
Anyway... the --reference genome is always included in its entirety in the output. Here's an example of how to verify this using vg and samtools on the S288C reference in the yeast example, which you can try on your data:
vg paths -Ex yeast.gbz | grep S288C | awk '{sum += $2} END {print sum}'
12157149
samtools faidx S288C.fa.gz
cat S288C.fa.gz.fai | awk '{sum += $2} END {print sum}'
12157149
Perhaps you are misinterpreting the panacus output? I agree that 620Mb with coverage >= 1 would not make sense with your reference size. But with coverage == 1, it does hold up (i.e. 620Mb present in only one sample, which is a number that is not bounded by the reference length in any way).
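To make the coverage distinction concrete, here is a toy sketch with made-up numbers. It is not panacus output, and the three-column table (node length, number of samples covering the node, whether the reference traverses it) is a hypothetical format invented for this illustration. It only shows that the coverage == 1 total can be smaller than the reference length while the coverage >= 1 total is never below it.

```shell
# Hypothetical toy table, one graph node per line:
#   column 1 = node length in bp
#   column 2 = number of samples whose paths cover the node
#   column 3 = 1 if the reference path traverses the node, else 0
toy=$(mktemp)
cat > "$toy" <<'EOF'
600 5 1
200 1 1
410 1 0
EOF

# Sum node lengths three ways: along the reference, covered by at least
# one sample, and covered by exactly one sample.
summary=$(awk '
  $3 == 1 { ref += $1 }
  $2 >= 1 { ge1 += $1 }
  $2 == 1 { eq1 += $1 }
  END { printf "reference=%d coverage>=1=%d coverage==1=%d", ref, ge1, eq1 }
' "$toy")
echo "$summary"
rm -f "$toy"
```

In this toy case the reference is 800 bp, yet only 610 bp sit at coverage == 1, because most of the reference is shared with other samples and so falls into higher coverage classes.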
Hi,
I'm using cactus-minigraph following the pangenome workflow: https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/pangenome.md The single reference haplotype is about 800Mb and I have generated a pangenome graph (GFA). When I tested the pangenome growth using the panacus program with an input sample list (the first sample is the single reference haplotype), I found that coverage = 1 corresponds to 610 Mb of sequence. I am not sure why the length of the first underlying graph is not equal to the length of the single reference haplotype (800 Mb). Were 190 Mb of sequence clipped from the graph?
Thank you so much.
Nan