davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Single copy orthologs (SCO) are not being generated #908

Closed by timurrxxcd 3 months ago

timurrxxcd commented 4 months ago

Hi! I have a protein file for the whole genome of a perennial grass. I downloaded data for 10 plant species from NCBI and used OrthoFinder to extract single-copy orthologs (SCO). OrthoFinder generates SCO when I leave out my species' protein file, but when I add that file and run OrthoFinder again, no SCO are found. I then extracted proteins from the haploid genome assembly, using a GFF3 file containing the longest mRNA per gene and the gffread tool (gffread -g genome.fasta -y ProteinsOutput.faa final.gene.mRNA.longest.gff3). When I ran OrthoFinder with this new file for my species, it found 303 SCO (interesting). My question is: why did it find no SCO with my species' initial protein file, but find them when I extracted proteins from the haploid genome? Is this approach correct?

davidemms commented 3 months ago

Hi

I think it likely that your original proteome file contained multiple transcripts per gene. OrthoFinder expects a single representative sequence per gene in order to carry out this single-copy analysis.
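For readers hitting the same problem, here is a rough sketch of one way to reduce a proteome to a single sequence per gene by keeping the longest isoform. It is not part of OrthoFinder. The `gene=` tag in the FASTA header is a hypothetical convention chosen for illustration, and real annotation files will need their own rule for mapping transcripts to genes. With several isoforms per gene, each gene looks multi-copy in that species, which would explain why no orthogroup qualified as single-copy.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_id_from_header(header):
    """Extract a gene ID from a header.

    ASSUMPTION: headers look like 'transcript_id gene=GENE_ID'. This format
    is hypothetical; adapt it to whatever your annotation actually emits.
    """
    parts = header.split()
    for field in parts:
        if field.startswith("gene="):
            return field[len("gene="):]
    # No gene tag found: treat the sequence as its own gene.
    return parts[0] if parts else ""


def longest_isoform_per_gene(text):
    """Return {gene_id: (header, sequence)} keeping the longest protein per gene."""
    best = {}
    for header, seq in parse_fasta(text):
        gene = gene_id_from_header(header)
        # Ignore a trailing stop-codon '*' when comparing lengths.
        length = len(seq.rstrip("*"))
        if gene not in best or length > len(best[gene][1].rstrip("*")):
            best[gene] = (header, seq)
    return best


def to_fasta(records):
    """Serialise the selected records back to FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records.values())


if __name__ == "__main__":
    example = ">t1 gene=g1\nMKV\n>t2 gene=g1\nMKVLLA\n>t3 gene=g2\nMAA\n"
    print(to_fasta(longest_isoform_per_gene(example)), end="")
```

Filtering the GFF3 to the longest mRNA per gene before running gffread, as described in the question, achieves the same goal at the annotation level.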

All the best, David