ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

How to compare vg giraffe mapping performance between a pangenome and the genomes used to construct it #1431

Open isaacambrogetti opened 2 months ago

isaacambrogetti commented 2 months ago

Hello,
I am struggling to replicate what you did in the Cactus paper, where the GRCh38 and CHM13 genomes were used to create two "linear pangenomes" (that is, pangenomes containing only the reference genome) in order to compare their mapping performance against that of the pangenomes. From what I understood, in the HPRC example the two linear genomes were extracted from the pangenome itself, and they are linear because they had been set as references while constructing the pangenome. Am I correct?
Without this extraction step, it seems it would never be possible to map with vg giraffe to the original genome FASTA files (I mean the ones used to create the pangenome).

So far I have created my pangenome from 3 genomes of Biscutella laevigata, using one of them as the reference, and mapped reads to it with vg giraffe. Now I want to map the same reads to the genomes I used to create the pangenome, still using vg giraffe, to measure how mapping quality differs between the pangenome and the single reference genomes. In my situation, would it only be possible to extract the one genome I used as the reference and map to it with vg giraffe? If I want to map to all 3 genomes used to construct the pangenome and compare mapping performance across all 4 targets (3 genomes and 1 pangenome), which tools perform similarly to vg giraffe but map against FASTA files, so that the choice of tool influences the comparison outcome as little as possible?

Thank you for the good work and the support you're giving. Best, Isaac

glennhickey commented 1 month ago

To make a linear pangenome, your best bet is to use vg autoindex. For example:

vg autoindex -p linear_pangenome -w giraffe -r REF.fa

where REF.fa is the fasta file for the genome you want to index...
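As a minimal sketch (not part of the original reply), the same autoindex approach can be repeated for each of the three Biscutella laevigata assemblies: build one reference-only giraffe index per genome, then map the same reads to each with vg giraffe, so all four targets (3 linear genomes plus the pangenome) are compared with the same mapper. The FASTA and read file names are hypothetical placeholders, and the `.min`/`.dist` index names are the ones `vg autoindex -w giraffe` writes in the vg versions we are aware of; newer releases may name or build indexes differently, so check what autoindex actually produced. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Minimal sketch: one reference-only (linear) giraffe index per genome, then
# the same paired-end reads mapped to each with vg giraffe.
# ASSUMPTION: genome/read file names passed in are hypothetical placeholders.
set -euo pipefail

THREADS="${THREADS:-8}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Derive an index prefix from a FASTA path, e.g. genomeA.fa -> genomeA_linear.
linear_prefix() {
  echo "$(basename "$1" .fa)_linear"
}

# Build a linear "pangenome" from a single FASTA (no VCF or GFA input).
build_linear_index() {
  local fasta="$1" prefix
  prefix="$(linear_prefix "$fasta")"
  run vg autoindex -p "$prefix" -w giraffe -r "$fasta" -t "$THREADS"
}

# Map paired FASTQ reads against the linear index; GAM is written to stdout.
# ASSUMPTION: these index names match what your vg version's autoindex wrote.
map_reads() {
  local fasta="$1" reads1="$2" reads2="$3" prefix
  prefix="$(linear_prefix "$fasta")"
  run vg giraffe -Z "$prefix.giraffe.gbz" -m "$prefix.min" -d "$prefix.dist" \
    -f "$reads1" -f "$reads2" -t "$THREADS"
}
```

For example, `for fa in genomeA.fa genomeB.fa genomeC.fa; do build_linear_index "$fa"; map_reads "$fa" reads_1.fq.gz reads_2.fq.gz > "$(basename "$fa" .fa).gam"; done` would produce one GAM per genome. These, together with the GAM from mapping to the pangenome, can then be summarized side by side with `vg stats -a FILE.gam`.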