ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Preparing genome fasta headers before running Cactus #296

Open brettChapman opened 4 years ago

brettChapman commented 4 years ago

Hi

Before running Cactus on my 20 genomes, should I rename the FASTA headers so that they are unique, say by prefixing the variety name? All 20 genomes use the same headers: chr1H...chr7H and chrUn.

The resulting HAL file will then be processed by Seqwish and then into VG.

Thanks.

glennhickey commented 4 years ago

When exporting from HAL, you can choose whether or not to prepend genome names to the FASTA sequence names, so there shouldn't be any need to change your input.

With hal2vg and hal2paf prepending is on by default and disabled with --onlySequenceNames. But in hal2fasta, it is off by default and turned on with --ucscSequenceNames. A bit more info here: https://github.com/ComparativeGenomicsToolkit/hal#pangenome-graph-export-gfa-and-vg
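
For anyone who still wants unique headers in the input files themselves, here is a minimal Python sketch of the prefixing described in the question. It is not part of Cactus or the HAL tools. The function name `prefix_fasta_headers`, the genome name `varietyA`, and the `.` separator are all assumptions for illustration; the separator is chosen to resemble a `genome.sequence` style name and may not match exactly what the export tools emit.

```python
import io


def prefix_fasta_headers(src, dst, genome, sep="."):
    """Copy FASTA text from src to dst, prefixing every header with the genome name.

    src and dst are text file objects. Sequence lines are copied unchanged.
    `genome` and `sep` are illustrative choices, not values required by any tool.
    """
    for line in src:
        if line.startswith(">"):
            # Keep everything after '>' (name plus any description) as-is.
            dst.write(f">{genome}{sep}{line[1:]}")
        else:
            dst.write(line)


# Small in-memory demonstration with made-up sequence data.
example = io.StringIO(">chr1H\nACGT\n>chrUn\nGGCC\n")
out = io.StringIO()
prefix_fasta_headers(example, out, "varietyA")
print(out.getvalue(), end="")
```

With real files, the same function could be called with two handles from `open(...)`, one per genome, so that `chr1H` from each variety ends up with a distinct name.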