ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Minigraph-Cactus: Details on the .gfa.fa.gz output file #1357

Closed: VLoegler closed this issue 2 months ago

VLoegler commented 2 months ago

Hi,

I'm using the Minigraph-Cactus pipeline, and I'm wondering what the file ending in .gfa.fa.gz corresponds to. It contains sequences of graph segments, but these segments are much longer, and far fewer, than the ones in the final GFA file. The .gfa.fa.gz file is also generated earlier in the MC pipeline than the final GFA file.

Could you tell me what this file corresponds to? Is it the segment sequences of the initial Minigraph pangenome?

glennhickey commented 2 months ago

Is it the segment sequences of the initial Minigraph pangenome?

Yes, that's exactly what it is. It's an intermediate file whose only use, really, is to make it a bit easier to rerun pieces of the pipeline. I'll rename it to .sv.gfa.fa.gz, which should make it a bit clearer.
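
For anyone who wants to check the difference described above on their own output, here is a minimal Python sketch (not part of Cactus) that counts the records in the .gfa.fa.gz file and the S-lines in the final GFA, and summarizes their lengths. The function names and the file names in the usage comment are hypothetical. It assumes the final GFA follows the GFA1 layout, with the segment sequence in the third column of each S-line, or `*` plus an `LN:i:` tag.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_lengths(path):
    """Return {record name: sequence length} for a FASTA file."""
    lengths = {}
    name = None
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the record name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def gfa_segment_lengths(path):
    """Return {segment name: length} from the S-lines of a GFA1 file."""
    lengths = {}
    with open_text(path) as handle:
        for line in handle:
            if not line.startswith("S\t"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue
            seq = fields[2]
            if seq != "*":
                lengths[fields[1]] = len(seq)
                continue
            # Sequence omitted: fall back to the optional LN:i: tag, else 0.
            length = 0
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    length = int(tag[5:])
            lengths[fields[1]] = length
    return lengths


def summarize(lengths):
    """Return count, total, mean and max of a {name: length} mapping."""
    values = list(lengths.values())
    if not values:
        return {"count": 0, "total": 0, "mean": 0.0, "max": 0}
    total = sum(values)
    return {
        "count": len(values),
        "total": total,
        "mean": total / len(values),
        "max": max(values),
    }


# Example usage (hypothetical file names):
# print(summarize(fasta_lengths("mypan.gfa.fa.gz")))
# print(summarize(gfa_segment_lengths("mypan.gfa.gz")))
```

If the .gfa.fa.gz file really holds the initial Minigraph segments, the first summary should show a much smaller count and a much larger mean length than the second.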