Format of assembly files

chadlaing / Panseq

Pan-genomic sequence analysis

http://lfz.corefacility.ca/panseq

GNU General Public License v3.0


Format of assembly files #5

Closed: Sandramses closed this issue 10 years ago.

Sandramses commented 10 years ago

I am trying to run Panseq on 30 genome assemblies. Each isolate is a multi-FASTA file containing the scaffolds of that assembly (with unique names). When I run Panseq, it treats each scaffold as a separate genome and tells me I have more than 4,000 genomes instead of 30. Does the multi-FASTA file need to be formatted in a particular way for the program to recognize that each file is one genome? Have I missed something?

chadlaing commented 10 years ago

Hi, the multi-FASTA files do need to be formatted so that each genome has a unique identifier shared by all of its contigs; otherwise you get exactly what you are seeing (each contig treated as its own genome).

This is detailed here: https://lfz.corefacility.ca/panseq/faq/

This will also be added to the GitHub README soon.

Thanks,

Chad
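The fix described in the reply amounts to making every contig header in a given assembly carry the same genome identifier. Below is a minimal Python sketch of that preprocessing step. It assumes one assembly per file and uses the file name (without extension) as the genome identifier. The `lcl|{genome}|{contig}` header template and the helper names (`tag_headers`, `tag_assembly_file`, `tag_directory`) are hypothetical, not part of Panseq. Check the Panseq FAQ linked above for the exact header format Panseq recognizes, and adjust `template` to match, before relying on this.

```python
from pathlib import Path
from typing import Iterable, Iterator


def tag_headers(lines: Iterable[str], genome_id: str,
                template: str = "lcl|{genome}|{contig}") -> Iterator[str]:
    """Rewrite every FASTA header so all contigs share one genome identifier.

    ASSUMPTION: the default template is illustrative only; replace it with
    the header format documented in the Panseq FAQ.
    """
    for line in lines:
        if line.startswith(">"):
            # Keep the original contig name (text up to the first whitespace)
            # so each record header stays unique within its genome.
            parts = line[1:].split()
            contig = parts[0] if parts else "contig"
            yield ">" + template.format(genome=genome_id, contig=contig) + "\n"
        else:
            # Sequence lines are passed through unchanged.
            yield line


def tag_assembly_file(src: Path, dst: Path, genome_id: str) -> None:
    """Write a copy of one multi-FASTA assembly with tagged headers."""
    with src.open() as fin, dst.open("w") as fout:
        fout.writelines(tag_headers(fin, genome_id))


def tag_directory(src_dir: Path, dst_dir: Path, pattern: str = "*.fasta") -> int:
    """Tag every assembly in src_dir, using each file's stem as its genome ID.

    Returns the number of files written to dst_dir.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for fasta in sorted(src_dir.glob(pattern)):
        tag_assembly_file(fasta, dst_dir / fasta.name, genome_id=fasta.stem)
        count += 1
    return count
```

For the 30 assemblies in the question, a call such as `tag_directory(Path("assemblies"), Path("tagged"))` (hypothetical directory names) would write 30 tagged copies, each of whose scaffolds share one identifier; the tagged directory is then what you would point Panseq at. Writing copies rather than editing in place keeps the original assemblies untouched if the header template needs changing.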