Closed tseemann closed 4 years ago
The script you are looking for is in /PIRATE/tools/subsetting/select_representative. It is a sensible thing to have as part of the core functionality so I will add a few options and make it part of the default PIRATE run. There isn't anything on it in the README and I will update that accordingly. I will let you know when it is done. Thanks :)
@SionBayliss Ah OK... I see it now!
/home/linuxbrew/.linuxbrew/Cellar/pirate/1.0.3/libexec/tools/subsetting/select_representative
@tseemann I have moved select_representative from tools to scripts and made it part of the default pipeline as of v1.0.4 (just released). It has been updated slightly, so you should be able to generate all of the outputs you were interested in from old datasets by rerunning select_representative. There are a few other bug fixes and quality-of-life improvements in there as well, so it was a good incentive to release it.
Sion,
I was hoping to get a FASTA file out with one representative sequence per cluster, but I can't seem to find one in the output or in the Wiki's list of output files.
i.e. if there were 3000 clusters, a FASTA file with the "best/longest" representative for each cluster.
Ideally a .ffn (DNA) and a .faa (AA) version.
Also, a pan version and a core version (only clusters present in all genomes).
Have I missed something?
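
For illustration only, the selection being asked for could be sketched roughly as below. This is not PIRATE's select_representative script or its actual logic. The record layout (cluster id, genome id, sequence id, DNA sequence), the function names, and the tie-break on sequence id are all assumptions made up for the example. It only covers the DNA case; an .faa version would additionally need a translation step.

```python
from collections import defaultdict


def pick_representatives(records, n_genomes):
    """Return (pan, core) dicts mapping cluster id -> (seq id, sequence).

    records: iterable of (cluster_id, genome_id, seq_id, sequence) tuples.
    This layout is hypothetical and is not a PIRATE file format.
    """
    by_cluster = defaultdict(list)
    for cluster_id, genome_id, seq_id, seq in records:
        by_cluster[cluster_id].append((genome_id, seq_id, seq))

    pan, core = {}, {}
    for cluster_id, members in sorted(by_cluster.items()):
        # "Best" is taken to mean longest; ties are broken by sequence id
        # so that the choice is deterministic (an assumption).
        _, seq_id, seq = min(members, key=lambda m: (-len(m[2]), m[1]))
        pan[cluster_id] = (seq_id, seq)
        # Core: the cluster has at least one member from every genome.
        if len({m[0] for m in members}) == n_genomes:
            core[cluster_id] = (seq_id, seq)
    return pan, core


def to_fasta(reps):
    """Format representatives as FASTA text, one record per cluster."""
    return "".join(f">{cid} {sid}\n{seq}\n" for cid, (sid, seq) in reps.items())


# Toy input: two genomes (gA, gB), two clusters.
records = [
    ("c1", "gA", "gA_1", "ATGAAATAA"),
    ("c1", "gB", "gB_1", "ATGAAACCCTAA"),
    ("c2", "gA", "gA_2", "ATGTAA"),
]
pan, core = pick_representatives(records, n_genomes=2)
pan_fasta = to_fasta(pan)
core_fasta = to_fasta(core)
```

With this toy input, the pan set contains both clusters, while the core set contains only c1, since c2 has no member from gB.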