ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Question about final gfa files from Minigraph-Cactus #1397

Open SimonaSecomandi opened 5 months ago

SimonaSecomandi commented 5 months ago

Hi all, I generated a pangenome with Minigraph-Cactus with this command:

 cactus-pangenome 3_MC_all/js-3_MC_all ./scripts/3_MC_all.seqfile \
--outName 3_MC_all \
--outDir 3_MC_all \
--reference hap2 hap1 \
--refContigs $(for i in $(seq 39); do printf "chr$i "; done ; echo "chrZ chrW") \
--filter 3 \
--giraffe clip filter \
--vcf --vcfReference hap2 hap1 \
--viz --odgi --chrom-vg clip filter --chrom-og \
--gbz clip filter full \
--gfa clip full \
--logFile 3_MC_all/3_MC_all.log
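
The `--refContigs` value in the command above is built by shell command substitution. As a quick sanity check, the same expansion can be run on its own to see which contig names are passed. This is only a sketch: it assumes the final `echo` is meant to be `echo "chrZ chrW"`, and the variable name `contigs` is introduced here purely for illustration.

```shell
# Build the contig list the same way as in the cactus-pangenome call.
# Assumption: the last echo prints the two sex chromosomes, chrZ and chrW.
contigs=$(for i in $(seq 39); do printf "chr$i "; done; echo "chrZ chrW")

# Print the names one per line to inspect them.
printf '%s\n' $contigs

# Count the names: chr1..chr39 plus chrZ and chrW gives 41.
echo "$contigs" | wc -w
```

Left unquoted, as in the original command, the substitution is split on whitespace, so each name reaches `--refContigs` as a separate argument.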

I have these graphs in the main folder:

3_MC_all.full.gfa.gz  
3_MC_all.gfa.gz  
3_MC_all.sv.gfa.fa.gz  #1359 
3_MC_all.sv.gfa.gz

The full graph is the one you can use for visualization with odgi, but what is the 3_MC_all.gfa.gz graph? Is this the clipped one?

I'm also wondering what to do if I want to analyze each chr separately. I have the chromosome graphs in the 3_MC_all.chroms subfolder, but they are only in full.og, .vg and .d3.vg formats. However, in the MC paper all the analyses (except short-read mapping) were performed on the clipped graph. So should I split the clipped 3_MC_all.gfa.gz into chrs? Or should I convert the 3_MC_all.chroms/chr.vg chromosome graph to gfa (assuming it is clipped)? And why is this not done automatically?

And moreover, if I want to run a vg Giraffe test on one of the filtered chromosomes, should I generate indexes for 3_MC_all.chroms/chr.d3.vg, or is there another way?

Many thanks!! Simona

glennhickey commented 5 months ago

Yeah, there are 4 types of graphs. The naming conventions apply to all files, but here's an example with .gfa.

So in your .chroms output, because you ran with --chrom-vg clip filter and --odgi (which is the same as --odgi full) you will get