MigleSur / GenAPI

Gene Absence Presence Identification tool.
GNU General Public License v3.0

Reference proteome GFF #14

Closed by ireneortega 2 years ago

ireneortega commented 2 years ago

I would appreciate it if you could add an option to use a proteome .gff file as the reference for the gene_presenceabsence[name].txt file, so that the locus tags from that proteome are listed first. This would work the same way as the -r option in get_homologues.

MigleSur commented 2 years ago

Dear Irene,

Thank you for this suggestion. Since GenAPI is based on nucleotide sequences, it would require translating amino acid sequences back into nucleotide sequences. That cannot be done unambiguously, because several triplets can code for the same amino acid. Therefore, there are no plans to include such an option in GenAPI for now. Maybe in the future GenAPI could be tested and run on protein sequences instead of nucleotide sequences. Then both nucleotide and protein input files could be accepted.
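To illustrate the ambiguity, here is a minimal Python sketch that counts how many nucleotide sequences translate to a given peptide. It assumes the standard genetic code; the names `CODONS_FOR` and `count_back_translations` are made up for this example and are not part of GenAPI.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each amino acid to every codon that encodes it.
CODONS_FOR = {}
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(amino_acid, []).append("".join(codon))


def count_back_translations(protein):
    """Return how many nucleotide sequences translate to `protein`.

    Hypothetical helper for illustration only; expects one-letter
    amino acid codes from the standard code.
    """
    return prod(len(CODONS_FOR[amino_acid]) for amino_acid in protein)


if __name__ == "__main__":
    # Met has 1 codon, Leu has 6, Ser has 6.
    print(count_back_translations("MLS"))
```

Even this three-residue peptide has 36 possible coding sequences, and the count grows multiplicatively with protein length.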

ireneortega commented 2 years ago

Sorry, I didn't mean performing an amino acid alignment. I was referring to using one of the genomes being analysed as the "reference" when constructing the pangenome, so that the locus tags in the gene_presenceabsence[name].txt file are taken first from that "reference" genome.
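A rough Python sketch of the ordering I have in mind, in case it helps. The data layout here is an assumption for illustration only (one dict per gene cluster, mapping genome name to locus tag or None) and is not GenAPI's actual internal or output format; the function names are made up as well.

```python
def reference_first(clusters, reference):
    """Put clusters that contain a gene from `reference` before all others.

    `clusters` is a hypothetical list of dicts mapping genome name to a
    locus tag, or None when the gene is absent from that genome.
    The original relative order is preserved within each group.
    """
    in_reference = [c for c in clusters if c.get(reference)]
    not_in_reference = [c for c in clusters if not c.get(reference)]
    return in_reference + not_in_reference


def cluster_label(cluster, reference):
    """Name a cluster by the reference locus tag when one exists,
    otherwise by the first locus tag found in any other genome."""
    reference_tag = cluster.get(reference)
    if reference_tag:
        return reference_tag
    return next(tag for tag in cluster.values() if tag)


if __name__ == "__main__":
    # Made-up example data: genome names and locus tags are placeholders.
    clusters = [
        {"genomeA": "A_0001", "ref": None},
        {"genomeA": "A_0002", "ref": "REF_0010"},
        {"genomeA": None, "ref": "REF_0011"},
    ]
    for cluster in reference_first(clusters, "ref"):
        print(cluster_label(cluster, "ref"))
```

With this example data, the two clusters that have a gene in "ref" are listed first and labelled by their reference locus tags, followed by the cluster found only in genomeA.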