Open niconps14 opened 2 months ago
Hi niconps14, are you able to provide a sequence file and the corresponding gene tree file where this is an issue, along with the command and options that you used to run OrthoFinder? Laurie
Hi Laurie, here are a sequence file and the corresponding tree file: Summer Work.zip
And here is the command I used: orthofinder -f /ihome/biosc1542_2024s/nps42/OrthoFinder/RealMycoProteomes
For reference, one of the sequences that shows up twice in that file is this:
>WP_007172343.1
MFVEALPAGIDASASGLAGIGAAVAAGNAAGAAPTLGVVPPAPEPASVLLAAAFGTHAGVYQATQAIGEVVHQMFVSTMG
ISSADYAATEVLNTAAMV
In that example, the protein ID appears twice because the same ID is present in two species (in this case Mycobacterium), so it is also in the gene tree twice, once for each species: https://www.uniprot.org/uniprotkb?query=WP_007172343.1
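If it helps to check this for the other duplicated IDs, here is a minimal sketch (not part of OrthoFinder) that scans a directory of input proteomes and reports every sequence ID that occurs in more than one file. The function name `ids_shared_across_files`, the accepted file suffixes, and the rule that the ID is the header text up to the first whitespace are all assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path


def ids_shared_across_files(fasta_dir, suffixes=(".fa", ".faa", ".fasta")):
    """Return {sequence_id: [file names]} for IDs found in more than one FASTA file.

    Assumptions (hypothetical, adjust to your data): one FASTA file per species,
    files end in one of `suffixes`, and the ID is the first whitespace-delimited
    token of each header line.
    """
    files_by_id = defaultdict(set)
    for path in sorted(Path(fasta_dir).iterdir()):
        if path.suffix.lower() not in suffixes:
            continue
        with path.open() as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    if fields:  # skip empty header lines
                        files_by_id[fields[0]].add(path.name)
    # Keep only the IDs that occur in two or more input files.
    return {
        seq_id: sorted(names)
        for seq_id, names in files_by_id.items()
        if len(names) > 1
    }


# Example call, using the input directory from the command above:
# shared = ids_shared_across_files(
#     "/ihome/biosc1542_2024s/nps42/OrthoFinder/RealMycoProteomes")
# for seq_id, names in sorted(shared.items()):
#     print(seq_id, ", ".join(names))
```

Any ID listed in the result is present in several input proteomes, so seeing it more than once in an orthogroup would match the explanation above rather than indicate duplicated output.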
In many of my orthogroup sequence files, I am getting duplicate copies of the exact same gene within a single orthogroup, with identical identifiers and amino acid sequences. The original FASTA files that I gave the program as input do not have these duplicates, nor do the trees corresponding to the orthogroups. Does anyone know what could be causing this?