SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

[Feature] Make output file with original gene designations from GFF files #5

Closed: dutchscientist closed this issue 5 years ago

dutchscientist commented 5 years ago

I see the genes have been renumbered by PIRATE. I use Prokka to number the genes in steps of 10 (_00010, _00020, etc.), but PIRATE renumbers them and uses the full genome name as the prefix.

Example: genome Ec2456_phyloC, gene designations Ec2456_00010 etc

Renamed to: genome Ec2456_phyloC, gene designations Ec2456_phyloC_00001 etc

Is it possible to instruct PIRATE not to renumber or rename, and work with the original codes from the GFF files?

dutchscientist commented 5 years ago

Or is this what subsample_outputs.pl does?

SionBayliss commented 5 years ago

Apologies for the late reply. Yes, this is what subsample_outputs.pl is intended for. You can pass the --field flag with the "prev_locus" argument to retrieve the original locus tags from the input files. I have been toying with making this part of the final steps in the pipeline, but it has some unintended consequences for some of the support scripts.
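
For readers who want to see the idea behind the lookup, here is a minimal Python sketch of mapping renamed locus tags back to the originals. It is not PIRATE code and not how subsample_outputs.pl is implemented. It assumes each feature line in a GFF carries the new tag in a `locus_tag` attribute and the original tag in a `prev_locus` attribute; the attribute layout, the function names `parse_attributes` and `locus_map`, and the example records are all hypothetical.

```python
def parse_attributes(column):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for item in column.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def locus_map(gff_lines, new_key="locus_tag", old_key="prev_locus"):
    """Map renamed locus tags to original ones.

    Assumption: both attributes are present on the same feature line.
    Features missing either attribute are skipped.
    """
    mapping = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section starts here; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column feature line
        attrs = parse_attributes(fields[8])
        if new_key in attrs and old_key in attrs:
            mapping[attrs[new_key]] = attrs[old_key]
    return mapping


# Hypothetical records modelled on the example in this thread.
example = [
    "##gff-version 3",
    "contig1\tProkka\tCDS\t1\t300\t.\t+\t0\t"
    "ID=Ec2456_phyloC_00001;locus_tag=Ec2456_phyloC_00001;prev_locus=Ec2456_00010",
    "contig1\tProkka\tCDS\t400\t900\t.\t-\t0\t"
    "ID=Ec2456_phyloC_00002;locus_tag=Ec2456_phyloC_00002;prev_locus=Ec2456_00020",
    "##FASTA",
    ">contig1",
]

original_tags = locus_map(example)
print(original_tags["Ec2456_phyloC_00001"])
```

For real PIRATE output, the supported route is still subsample_outputs.pl with --field "prev_locus" as described above; check the attribute names in your own GFF files before relying on a sketch like this.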

dutchscientist commented 5 years ago

Thanks!