davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Duplicates of the same gene in Orthogroup sequence files #901

Open niconps14 opened 2 months ago

niconps14 commented 2 months ago

In many of my Orthogroup sequence files, I am getting duplicate copies of the exact same gene within a single orthogroup, with identical identifiers and amino acid sequences. The original FASTA files that I input to the program do not have these duplicates, nor do the trees corresponding to the orthogroups. Does anyone know what could be causing this?

lauriebelch commented 2 months ago

Hi niconps14, are you able to provide a sequence file and corresponding gene tree file where this is an issue? Could you also share the command and options that you used to run OrthoFinder? Laurie

niconps14 commented 2 months ago

Hi Laurie, here is a sequences file and corresponding tree file: Summer Work.zip

And here is the command I used: orthofinder -f /ihome/biosc1542_2024s/nps42/OrthoFinder/RealMycoProteomes

For reference, one of the sequences that shows up twice in that file is this:
>WP_007172343.1
MFVEALPAGIDASASGLAGIGAAVAAGNAAGAAPTLGVVPPAPEPASVLLAAAFGTHAGVYQATQAIGEVVHQMFVSTMG
ISSADYAATEVLNTAAMV

lauriebelch commented 2 months ago

For that example, the same protein ID appears twice because it occurs in two of the input species (in this case Mycobacterium), so it is also in the gene tree twice, once for each species: https://www.uniprot.org/uniprotkb?query=WP_007172343.1
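
To check whether the same explanation holds for other duplicated IDs, one option is to scan the input proteome directory for identifiers that occur in more than one species file. The following is a minimal sketch, not part of OrthoFinder: the function names and the list of file extensions are assumptions made for illustration, and it treats the first whitespace-delimited token of each FASTA header as the protein ID.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta_ids(lines):
    """Yield the ID (first whitespace-delimited header token) of each FASTA record."""
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                yield header.split()[0]


def ids_shared_across_species(proteome_dir, suffixes=(".fa", ".faa", ".fasta")):
    """Return {protein_id: [file names]} for IDs found in more than one proteome file.

    The suffix list is an assumption; adjust it to match your input files.
    """
    seen = defaultdict(set)
    for path in sorted(Path(proteome_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            with open(path) as handle:
                for protein_id in read_fasta_ids(handle):
                    seen[protein_id].add(path.name)
    # Keep only IDs that appear in two or more species files.
    return {pid: sorted(files) for pid, files in seen.items() if len(files) > 1}
```

Calling `ids_shared_across_species(...)` on the directory passed to `orthofinder -f` returns a dictionary of shared IDs. If an ID such as WP_007172343.1 is listed with two file names, the two copies in the orthogroup sequence file come from two different species rather than from a duplication within one proteome.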