SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.

GNU General Public License v3.0

Representative pan genes FASTA ? #41

Status: Closed (tseemann closed this issue 4 years ago)

tseemann commented 4 years ago

Sion,

I was hoping to get a FASTA file out with one representative sequence per cluster, but I can't seem to find one in the output or in the Wiki's list of output files.

i.e. if there were 3000 clusters, a FASTA file with the "best/longest" representative for each cluster.

Ideally a .ffn (DNA) and a .faa (AA) version.

Also, a pan version and a core version (only clusters present in all genomes).

Have I missed something?

SionBayliss commented 4 years ago

The script you are looking for is in /PIRATE/tools/subsetting/select_representative. It is a sensible thing to have as part of the core functionality so I will add a few options and make it part of the default PIRATE run. There isn't anything on it in the README and I will update that accordingly. I will let you know when it is done. Thanks :)

tseemann commented 4 years ago

@SionBayliss ah ok... I see it now! /home/linuxbrew/.linuxbrew/Cellar/pirate/1.0.3/libexec/tools/subsetting/select_representative

SionBayliss commented 4 years ago

@tseemann I have moved select_representative from tools to scripts and made it part of the default pipeline as of v1.0.4 (just released). It has been updated slightly, so you should be able to generate all of the outputs you were interested in from old datasets by rerunning select_representative. There are a few other bug fixes and quality-of-life improvements in there as well, so it was a good incentive to release it.
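
For readers who want to see what "one longest representative per cluster, in pan and core versions" amounts to, here is a minimal Python sketch. It is not PIRATE's select_representative script and does not read PIRATE's output files. The in-memory cluster layout and the names Member, longest_representatives, core_cluster_ids and to_fasta are all hypothetical, chosen only for illustration. It assumes "best" simply means longest, with ties going to the first member seen, and it handles nucleotide sequences only.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Set


class Member(NamedTuple):
    """One gene in a cluster (hypothetical layout, not PIRATE's format)."""
    genome: str   # genome the gene came from
    gene_id: str  # gene/locus identifier
    seq: str      # nucleotide sequence


def longest_representatives(clusters: Dict[str, List[Member]]) -> Dict[str, Member]:
    """Pick the longest member of each cluster; the first one wins on ties."""
    return {
        cid: max(members, key=lambda m: len(m.seq))
        for cid, members in clusters.items()
        if members  # skip empty clusters
    }


def core_cluster_ids(clusters: Dict[str, List[Member]],
                     genomes: Iterable[str]) -> Set[str]:
    """Return IDs of clusters that contain a gene from every genome."""
    wanted = set(genomes)
    return {
        cid for cid, members in clusters.items()
        if wanted <= {m.genome for m in members}
    }


def to_fasta(reps: Dict[str, Member],
             keep: Optional[Set[str]] = None,
             width: int = 60) -> str:
    """Format representatives as FASTA text, optionally restricted to `keep`."""
    lines: List[str] = []
    for cid in sorted(reps):
        if keep is not None and cid not in keep:
            continue
        rep = reps[cid]
        lines.append(f">{cid} {rep.gene_id}")
        # Wrap the sequence at `width` characters per line.
        lines.extend(rep.seq[i:i + width] for i in range(0, len(rep.seq), width))
    return "".join(line + "\n" for line in lines)


# Toy data: two genomes (A, B) and two clusters.
clusters = {
    "g001": [Member("A", "A_1", "ATGAAATAA"), Member("B", "B_1", "ATGAAACCCTAA")],
    "g002": [Member("A", "A_2", "ATGTTTTAA")],
}
reps = longest_representatives(clusters)
core = core_cluster_ids(clusters, ["A", "B"])
pan_fasta = to_fasta(reps)              # every cluster
core_fasta = to_fasta(reps, keep=core)  # only clusters found in all genomes
```

In this toy data, g001 is the only cluster present in both genomes, so it is the only record in core_fasta. An amino acid (.faa) version would need a translation step, which is left out here.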