SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

Output Files #66

Closed by JChristopherEllis 3 years ago

JChristopherEllis commented 3 years ago

Is there an output file that contains only the core sequences in fasta file format?

SionBayliss commented 3 years ago

Hi Chris,

PIRATE creates a core alignment (-a option) in multifasta format. Core is defined as gene families present in >95% of isolates. It also produces a representative sequences multifasta, which you can subset if you wish to generate one sequence per core gene.

All the best, Sion

JChristopherEllis commented 3 years ago

Thank you, Sion, for such a quick response. The problem with the alignment fasta file is that it is an alignment, so it does not lend itself to downstream processing. I was hoping there was a core pangenome fasta file output as well.

SionBayliss commented 3 years ago

Hi Chris,

You could easily subset the representative_sequences.fasta using the approach detailed here: https://www.biostars.org/p/319099/. This would also allow you to set your own core threshold, should the 95% value be unsuitable for your application.

All the best, Sion
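
For readers who land here, the subsetting idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not PIRATE's own code or the method from the linked post: the function names (`parse_fasta`, `core_families`, `subset_fasta`), the family IDs, and the per-family genome counts are all hypothetical. It assumes each FASTA header starts with the gene family ID and that you have already extracted, for each family, the number of genomes it occurs in from PIRATE's output tables.

```python
from typing import Dict, Iterator, List, Set, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def core_families(genome_counts: Dict[str, int],
                  total_genomes: int,
                  threshold: float = 0.95) -> Set[str]:
    """Return family IDs present in more than `threshold` of genomes.

    The strict '>' follows the ">95%" wording in this thread; switch to
    '>=' if your definition of core includes the boundary.
    """
    return {fam for fam, n in genome_counts.items()
            if n / total_genomes > threshold}


def subset_fasta(text: str, keep_ids: Set[str]) -> str:
    """Keep records whose header's first whitespace-delimited token is in keep_ids."""
    out = []
    for header, seq in parse_fasta(text):
        fields = header.split()
        if fields and fields[0] in keep_ids:
            out.append(f">{header}\n{seq}\n")
    return "".join(out)


# Hypothetical data: family ID -> number of genomes containing it (of 100).
counts = {"g001": 100, "g002": 96, "g003": 95, "g004": 40}
example_fasta = (
    ">g001 desc\nATGAAA\nTTT\n"
    ">g002\nATGCCC\n"
    ">g003\nATGGGG\n"
    ">g004\nATGTTT\n"
)

core = core_families(counts, total_genomes=100, threshold=0.95)
result = subset_fasta(example_fasta, core)
print(result, end="")
```

Changing `threshold` is how you would relax or tighten the core definition. For real data you would read the representative sequences file from disk and check how its headers map to family IDs before relying on the header-matching assumption made here.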