Open wthomas14 opened 3 years ago
The same works for Proteinortho! With the latest version of DIAMOND it just crashes with exactly the same error:
Error: The sequences are expected to be proteins but only contain DNA letters. Use the option --ignore-warnings to proceed
But with DIAMOND v. 2.0.9 it works just fine. Thanks for the tip!
Hi David,
Just leaving a note about a minor issue here in case anyone runs into it in the future. When running
orthofinder -f primary_transcripts/
I get the error
ERROR: external program called by OrthoFinder returned an error code: 1
Command: diamond makedb --in /gpfs/scratch/withomas/primary_transcripts/OrthoFinder/Results_Aug13/WorkingDirectory/Species5.fa -d /gpfs/scratch/withomas/primary_transcripts/OrthoFinder/Results_Aug13/WorkingDirectory/diamondDBSpecies5
b'diamond v2.0.11.149 (C) Max Planck Society for the Advancement of Science\nDocumentation, support and updates available at http://www.diamondsearch.org\nPlease cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)\n\n#CPU threads: 144\nScoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)\nDatabase input file: /gpfs/scratch/withomas/primary_transcripts/OrthoFinder/Results_Aug13/WorkingDirectory/Species5.fa\nOpening the database file... [0.001s]\nLoading sequences... [0.003s]\nError: The sequences are expected to be proteins but only contain DNA letters. Use the option --ignore-warnings to proceed.\n
This error seems to come from a handful of proteins in the pruned ENSEMBL proteomes that consist exclusively of the amino acids Thr, Ala, Cys and Gly (T, A, C, G) and are therefore taken as DNA (ATCG). An example from the human proteome, left behind by primary_transcript.py:
>ENSG00000282431.1 GTGG
It seems like this error is occurring due to an update in diamond=2.0.11 (downloaded with OrthoFinder v2.5.4):
- Added error message when reading protein sequences from FASTA files that only contain DNA letters (can be disabled using --ignore-warnings)
I was not able to disable this error from within my OrthoFinder workflow. I could have pruned each transcript file to remove these problematic sequences, but instead I just reverted diamond to 2.0.9 in my environment:
conda install diamond=2.0.9
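For anyone who would rather take the pruning route than downgrade, here is a minimal sketch of how it might look. It follows the diagnosis above and separates out records whose sequence consists only of DNA-like letters. The letter set (A, C, G, T, N) is my assumption, not necessarily the set DIAMOND checks, and the function names are hypothetical helpers, not part of OrthoFinder or DIAMOND.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: the letters treated as "DNA-like". The exact set that
# DIAMOND tests against is not confirmed here.
DNA_LIKE = frozenset("ACGTN")

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_dna_like(seq: str) -> bool:
    """True if a non-empty sequence uses only DNA-like letters.

    A trailing stop symbol (*) is ignored before the comparison.
    """
    residues = seq.upper().rstrip("*")
    return bool(residues) and set(residues) <= DNA_LIKE


def split_records(lines: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Split FASTA records into (kept, dropped) by the DNA-like test."""
    kept: List[Record] = []
    dropped: List[Record] = []
    for header, seq in read_fasta(lines):
        (dropped if is_dna_like(seq) else kept).append((header, seq))
    return kept, dropped
```

To use it, open each proteome file, pass the file object to split_records, and write the kept records back out as ">header" followed by the sequence. It is worth looking over the dropped list before deleting anything, since a short but genuine peptide such as GTGG would be removed as well.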
Just figured I would post in case anyone else runs into this issue in the future! Thanks for all you do with this program, it is great!
Regards, Bill