The graph length of pangenome growth, especially when coverage = 1 in the panacus program.

ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

I'm glad people are using panacus! I intend to eventually include it in the Cactus release and maybe run some of the growth curves automatically -- they are much nicer than the plots I've been making.

Anyway... the --reference genome is always included in its entirety in the output. Here's an example of how to verify this using vg and samtools on the S288C reference in the yeast example, which you can try on your data:

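```shell
# Total length of the S288C paths in the graph (column 2 of `vg paths -E` is the path length)
vg paths -Ex yeast.gbz | grep S288C | awk '{sum += $2} END {print sum}'
# 12157149

# Index the reference (samtools needs bgzip, not plain gzip, for a compressed FASTA);
# this writes S288C.fa.gz.fai, whose column 2 is each sequence's length
samtools faidx S288C.fa.gz
awk '{sum += $2} END {print sum}' S288C.fa.gz.fai
# 12157149
```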
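If you would rather not build a `.fai` index, a rough cross-check is to sum the sequence lengths straight from the FASTA. This is a minimal sketch and not part of cactus, vg or panacus; the function name `fasta_total_length` is hypothetical, and it assumes the file is plain or gzip/bgzip-compressed FASTA with `>` header lines.

```shell
# Hypothetical helper: total sequence length of a FASTA file, no index needed.
# gzip -dcf decompresses gzip/bgzip input and passes plain text through unchanged.
fasta_total_length() {
  gzip -dcf "$1" | awk '
    !/^>/ {                  # skip header lines
      sub(/\r$/, "")         # ignore Windows line endings
      sum += length($0)      # count sequence characters
    }
    END { print sum + 0 }'
}
```

For example, `fasta_total_length S288C.fa.gz` should agree with the `.fai` total above; if the graph total from `vg paths -E` also matches, the whole reference made it into the graph.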

Perhaps you are misinterpreting the panacus output? I agree that 620 Mb with coverage >= 1 would not make sense given your reference size. But with coverage == 1 it does hold up: that is 620 Mb present in only one sample, a number that is not bounded by the reference length in any way.
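To make the distinction concrete, here is a toy sketch of the two quantities. It assumes a hypothetical tab-separated table with one line per graph node (node length in bp, then the number of samples whose paths cover that node); this is not a panacus output format, just an illustration of the arithmetic. Assuming the reference counts as one of the samples, every reference node has coverage of at least 1, so the coverage >= 1 total can never fall below the reference length, while the coverage == 1 total only counts sequence private to a single sample and can land anywhere.

```shell
# Hypothetical input: one line per node, "<length_bp>\t<coverage>",
# where coverage = number of samples traversing the node.
coverage_bp() {
  awk -F'\t' '
    $2 == 1 { exactly_one  += $1 }   # sequence private to one sample
    $2 >= 1 { at_least_one += $1 }   # all sequence in the graph
    END { printf "coverage==1\t%d\ncoverage>=1\t%d\n", exactly_one, at_least_one }
  ' "$1"
}
```

On a toy graph with a 10 bp reference (a 5 bp node shared by three samples plus a 5 bp node found only in the reference) and two private insertions of 20 bp and 30 bp, this reports 55 bp at coverage == 1 and 60 bp at coverage >= 1: the first number exceeds the reference length, and the second is at least the reference length, as it must be.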

The graph length of pangenome growth, especially when coverage = 1 in the panacus program. #1113