davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

diamond makedb error #603

Open wthomas14 opened 3 years ago

wthomas14 commented 3 years ago

Hi David,

Just leaving a minor issue here in case anyone runs into it in the future. When running orthofinder -f primary_transcripts/ I get the error:

ERROR: external program called by OrthoFinder returned an error code: 1
Command: diamond makedb --in /gpfs/scratch/withomas/primary_transcripts/OrthoFinder/Results_Aug13/WorkingDirectory/Species5.fa -d /gpfs/scratch/withomas/primary_transcripts/OrthoFinder/Results_Aug13/WorkingDirectory/diamondDBSpecies5
b'diamond v2.0.11.149 (C) Max Planck Society for the Advancement of Science\nDocumentation, support and updates available at http://www.diamondsearch.org\nPlease cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)\n\n#CPU threads: 144\nScoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)\nDatabase input file: /gpfs/scratch/withomas/primary_transcripts/OrthoFinder/Results_Aug13/WorkingDirectory/Species5.fa\nOpening the database file... [0.001s]\nLoading sequences... [0.003s]\nError: The sequences are expected to be proteins but only contain DNA letters. Use the option --ignore-warnings to proceed.\n

This error seems to come from a handful of proteins in the pruned ENSEMBL proteomes that consist exclusively of the amino acids Thr, Ala, Cys and Gly (T, A, C, G), and are therefore being mistaken for DNA (ATCG). An example from the human proteome, left behind by primary_transcript.py:

>ENSG00000282431.1
GTGG
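To see which records in a proteome would look like DNA, something along these lines might help. It is only a rough sketch: the letter set (A, C, G, T) is taken from the description above, and whether DIAMOND's own check also counts other letters such as N is not something this thread establishes. The helper names (read_fasta, looks_like_dna, find_dna_like) and the second example record are made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: "DNA letters" means A, C, G, T, as described in the post above.
DNA_LETTERS = set("ACGT")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def looks_like_dna(seq: str) -> bool:
    """True if a non-empty sequence contains only A, C, G and T."""
    residues = seq.upper().rstrip("*")  # ignore a trailing stop symbol
    return bool(residues) and set(residues) <= DNA_LETTERS


def find_dna_like(lines: Iterable[str]) -> List[str]:
    """Return the headers of all records that look like DNA."""
    return [h for h, s in read_fasta(lines) if looks_like_dna(s)]


# The first record is the one from the post; the second is a made-up protein.
example = [">ENSG00000282431.1", "GTGG", ">hypothetical_protein", "MKTAYIAKQR"]
print(find_dna_like(example))
```

On a real file, passing an open file handle (for example open("Species5.fa")) to find_dna_like should work the same way, since it only iterates over lines.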

It seems this error is due to an update in diamond=2.0.11 (downloaded with OrthoFinder v2.5.4): "Added error message when reading protein sequences from FASTA files that only contain DNA letters (can be disabled using --ignore-warnings)".

I was not able to disable this error from within my OrthoFinder workflow. I could have pruned each transcript file to remove these problematic sequences, but instead I just reverted diamond to 2.0.9 in my environment:

conda install diamond=2.0.9
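For anyone who would rather prune the files than downgrade, a minimal sketch of that alternative could look like the following. It assumes that "problematic" means a sequence made up only of the letters A, C, G and T, as described in this post; the function name prune_fasta and the file names in the usage note are hypothetical, and this has not been checked against DIAMOND's actual test.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: a record is "problematic" if it contains only these letters.
DNA_LETTERS = set("ACGT")


def prune_fasta(src: Path, dst: Path) -> List[str]:
    """Copy src to dst without DNA-like records; return the removed headers."""
    records: List[Tuple[str, List[str]]] = []  # (header line, sequence lines)
    with open(src) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                records.append((line, []))
            elif records and line:
                records[-1][1].append(line)

    removed = []
    with open(dst, "w") as out:
        for header, seq_lines in records:
            residues = "".join(seq_lines).upper().rstrip("*")
            if residues and set(residues) <= DNA_LETTERS:
                removed.append(header[1:])  # drop this record
                continue
            out.write(header + "\n")
            for seq_line in seq_lines:
                out.write(seq_line + "\n")
    return removed
```

It could be run once per proteome before starting OrthoFinder, for example prune_fasta(Path("human.fa"), Path("pruned/human.fa")) with the pruned directory already created, and the returned list shows which records were dropped.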

Just figured I would post in case anyone else runs into this issue in the future! Thanks for all you do with this program, it is great!

Regards, Bill

iliapopov17 commented 2 months ago

The same fix works for Proteinortho!

With the latest version of DIAMOND it just crashes with exactly the same error:

Error: The sequences are expected to be proteins but only contain DNA letters. Use the option --ignore-warnings to proceed

But with DIAMOND v2.0.9 it works just fine. Thanks for the tip!