davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Using DNA as input sequences #508

Open adr14 opened 3 years ago

adr14 commented 3 years ago

Hi there,

I have installed OrthoFinder v2.5.2 via conda. I can run it using protein sequences as input, but I have problems when I try to use CDS sequences. When I ran the command using "-d -S mmseqs" it failed with the error

ERROR: An error occurred
ERROR: diamond makedb failed

I tried to use a local installation of mmseqs (and renamed both diamond and mmseqs within OrthoFinder) but I get the same error. What am I doing wrong? Many thanks for your help.

Adriana

adr14 commented 3 years ago

I have actually found more information in the log file of the job submission, and it looks as if the problem is in creating the database. I have 3 species and OrthoFinder fails after creating the first database.

I have tested each FASTA file independently and they all work (mmseqs can create the database for each of them as long as it is Species0.fa).

Do you know what might cause this error?

Below is the full log file which also shows a requirement for setting up mmseqs search parameters (i.e. --search-type). How do you incorporate mmseqs specific parameters within the orthofinder command line?

Thanks

(P.S. I have no problem in running the program using aa sequences or running orthofinder using precomputed mmseqs searches)

OrthoFinder version 2.5.2 Copyright (C) 2014 David Emms

2021-02-18 18:38:08 : Starting OrthoFinder 2.5.2
64 thread(s) for highly parallel tasks (BLAST searches etc.)
8 thread(s) for OrthoFinder algorithm

Checking required programs are installed

Test can run "mcl -h" - ok
Test can run "fastme -i /ibers/ernie/scratch/adr/orthofinder/cds-fasta/OrthoFinder/Results_Feb18_2/WorkingDirectory/SimpleTest.phy -o /ibers/ernie/scratch/adr/orthofinder/cds-fasta/OrthoFinder/Results_Feb18_2/WorkingDirectory/SimpleTest.tre" - ok

WARNING: Files have been ignored as they don't appear to be FASTA files:
list-SC-orthologues.txt
list-aln.txt
OrthoFinder expects FASTA files to have one of the following extensions: fas, pep, fasta, faa, fa

Dividing up work for BLAST for parallel processing

2021-02-18 18:38:15 : Creating mmseqs database 1 of 3

ERROR: external program called by OrthoFinder returned an error code: 1

Command: mmseqs createdb /ibers/ernie/scratch/adr/orthofinder/cds-fasta/OrthoFinder/Results_Feb18_2/WorkingDirectory/Species0.fa /ibers/ernie/scratch/adr/orthofinder/cds-fasta/OrthoFinder/Results_Feb18_2/WorkingDirectory/mmseqsDBSpecies0.fa ; mmseqs createindex /ibers/ernie/scratch/adr/orthofinder/cds-fasta/OrthoFinder/Results_Feb18_2/WorkingDirectory/mmseqsDBSpecies0.fa /tmp

stdout

b'createdb /ibers/ernie/scratch/adr/orthofinder/cds-fasta/OrthoFinder/Results_Feb18_2/WorkingDirectory/Species0.fa /ibers/ernie/scratch/adr/orthofinder/cds-fasta/OrthoFinder/Results_Feb18_2/WorkingDirectory/mmseqsDBSpecies0.fa \n\nMMseqs Version: \t12.113e3\nDatabase type \t0\nShuffle input database\ttrue\nCreatedb mode \t0\nWrite lookup file \t1\nOffset of numeric ids \t0\nCompressed \t0\nVerbosity \t3\n\nConverting sequences\n[=====\nTime for merging to mmseqsDBSpecies0.fa_h: 0h 0m 0s 52ms\nTime for merging to mmseqsDBSpecies0.fa: 0h 0m 0s 204ms\nDatabase type: Nucleotide\nTime for processing: 0h 0m 0s 771ms\ncreateindex /ibers/ernie/scratch/adr/orthofinder/cds-fasta/OrthoFinder/Results_Feb18_2/WorkingDirectory/mmseqsDBSpecies0.fa /tmp \n\nMMseqs Version: \t12.113e3\nSeed substitution matrix \tnucl:nucleotide.out,aa:VTML80.out\nk-mer length \t0\nAlphabet size \tnucl:5,aa:21\nCompositional bias \t1\nMax sequence length \t65535\nMax results per query \t300\nMask residues \t1\nMask lower case residues \t0\nSpaced k-mers \t1\nSpaced k-mer pattern \t\nSensitivity \t7.5\nk-score \t0\nCheck compatible \t0\nSearch type \t0\nSplit database \t0\nSplit memory limit \t0\nVerbosity \t3\nThreads \t64\nMin codons in orf \t30\nMax codons in length \t32734\nMax orf gaps \t2147483647\nContig start mode \t2\nContig end mode \t2\nOrf start mode \t1\nForward frames \t1,2,3\nReverse frames \t1,2,3\nTranslation table \t1\nTranslate orf \t0\nUse all table starts \tfalse\nOffset of numeric ids \t0\nCreate lookup \t0\nCompressed \t0\nAdd orf stop \tfalse\nOverlap between sequences\t0\nSequence split mode \t1\nStrand selection \t1\nRemove temporary files \tfalse\n\ncreateindex /ibers/ernie/scratch/adr/orthofinder/cds-fasta/OrthoFinder/Results_Feb18_2/WorkingDirectory/mmseqsDBSpecies0.fa /tmp \n\nMMseqs Version: \t12.113e3\nSeed substitution matrix \tnucl:nucleotide.out,aa:VTML80.out\nk-mer length \t0\nAlphabet size \tnucl:5,aa:21\nCompositional bias \t1\nMax sequence length \t65535\nMax results per query \t300\nMask residues \t1\nMask lower case residues \t0\nSpaced k-mers \t1\nSpaced k-mer pattern \t\nSensitivity \t7.5\nk-score \t0\nCheck compatible \t0\nSearch type \t0\nSplit database \t0\nSplit memory limit \t0\nVerbosity \t3\nThreads \t64\nMin codons in orf \t30\nMax codons in length \t32734\nMax orf gaps \t2147483647\nContig start mode \t2\nContig end mode \t2\nOrf start mode \t1\nForward frames \t1,2,3\nReverse frames \t1,2,3\nTranslation table \t1\nTranslate orf \t0\nUse all table starts \tfalse\nOffset of numeric ids \t0\nCreate lookup \t0\nCompressed \t0\nAdd orf stop \tfalse\nOverlap between sequences\t0\nSequence split mode \t1\nStrand selection \t1\nRemove temporary files \tfalse\n\nDatabase /ibers/ernie/scratch/adr/orthofinder/cds-fasta/OrthoFinder/Results_Feb18_2/WorkingDirectory/mmseqsDBSpecies0.fa is a nucleotide database. \nPlease provide the parameter --search-type 2 (translated) or 3 (nucleotide)\nTime for processing: 0h 0m 0s 1ms\n'

stderr

b''

ERROR: diamond makedb failed
ERROR: An error occurred, please review the error messages they may contain useful information about the problem.

orthofinder.sge.o468967.docx
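The captured stdout in the log above is printed as a Python bytes literal, so the escaped `\n` and `\t` make it hard to spot the one line that matters: `createdb` succeeds ("Database type: Nucleotide"), and `createindex` then stops with the request for `--search-type`. Below is a minimal sketch for decoding such a string and pulling out the informative lines. The helper names, the keyword list, and the shortened sample are assumptions for illustration, not part of OrthoFinder.

```python
import ast


def decode_captured_output(text: str) -> str:
    """Turn a printed bytes literal such as b'a\\nb' into plain text."""
    value = ast.literal_eval(text.strip())  # safely parses the b'...' literal
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)


def find_hints(decoded: str,
               keywords=("Please provide", "Database type:", "Error")) -> list:
    """Return the lines that contain any of the given keywords."""
    return [line.strip()
            for line in decoded.splitlines()
            if any(k in line for k in keywords)]


# Shortened sample built from lines that appear in the log above.
sample = (r"b'Database type: Nucleotide\nTime for processing: 0h 0m 0s 771ms\n"
          r"Please provide the parameter --search-type 2 (translated) "
          r"or 3 (nucleotide)\n'")

for line in find_hints(decode_captured_output(sample)):
    print(line)
```

To use it on a real log, paste the whole `b'...'` string (on a single line) into `sample`.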

davidemms commented 3 years ago

Hi Adriana

The CDS functionality was designed to use BLAST. If you remove the "-S mmseqs" option it should work. I will look into adding the ability to use MMSeqs for this too.

All the best David
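As a rough illustration of the advice above, here is a small sketch that assembles the OrthoFinder command line and leaves out a non-BLAST `-S` choice whenever `-d` is set. The function name and the directory `cds-fasta` are hypothetical. `-d` and `-S` are the options quoted in this thread; `-f` (input directory) and `-t` (search threads) are standard OrthoFinder options, but check `orthofinder -h` for your version.

```python
import warnings


def build_orthofinder_cmd(fasta_dir, dna=False, search=None, threads=None):
    """Assemble an OrthoFinder argument list (hypothetical helper)."""
    cmd = ["orthofinder", "-f", fasta_dir]
    if dna:
        cmd.append("-d")
        # Per the reply above, DNA input was designed around BLAST, so a
        # non-BLAST search program is dropped and the default is used.
        if search is not None and "blast" not in search.lower():
            warnings.warn(
                f"'-S {search}' was reported to fail together with '-d' "
                "in OrthoFinder 2.5.2; omitting -S")
            search = None
    if search is not None:
        cmd += ["-S", search]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


print(" ".join(build_orthofinder_cmd("cds-fasta", dna=True,
                                     search="mmseqs", threads=64)))
```

The list form can be passed straight to `subprocess.run(cmd, check=True)` if you want to launch the run from Python.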

adr14 commented 3 years ago

Hi David,

That would explain it! I got around the problem by using the results of mmseqs searches run outside OrthoFinder. A bit of faffing around, but still quicker than running BLAST. It would be good if mmseqs could be used with DNA sequences. Many thanks, Adriana
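The thread does not say which commands were used for the external searches, so the following is only a sketch of what such a workaround could look like. It builds `mmseqs createdb`, `mmseqs search` and `mmseqs convertalis` commands for one species pair, passing the `--search-type 3` (nucleotide) value that the log asks for. The helper names, the result database name, and the `Blast{i}_{j}.txt` output name are assumptions; check the naming OrthoFinder expects for precomputed searches, and note that `convertalis` may need extra options for nucleotide databases.

```python
import shlex


def mmseqs_createdb_command(workdir, i):
    """createdb only; the createindex step is what failed in the log above."""
    return ["mmseqs", "createdb",
            f"{workdir}/Species{i}.fa",
            f"{workdir}/mmseqsDBSpecies{i}.fa"]


def mmseqs_pair_commands(workdir, i, j, search_type=3, tmp="/tmp"):
    """Commands to search species i against species j (hypothetical helper)."""
    query = f"{workdir}/mmseqsDBSpecies{i}.fa"
    target = f"{workdir}/mmseqsDBSpecies{j}.fa"
    result = f"{workdir}/mmseqsRes{i}_{j}"   # hypothetical intermediate name
    output = f"{workdir}/Blast{i}_{j}.txt"   # assumed OrthoFinder-style name
    return [
        ["mmseqs", "search", query, target, result, tmp,
         "--search-type", str(search_type)],
        # convertalis writes BLAST-style tabular output by default
        ["mmseqs", "convertalis", query, target, result, output],
    ]


# Print the commands for a single pair instead of running them.
print(shlex.join(mmseqs_createdb_command("WorkingDirectory", 0)))
for cmd in mmseqs_pair_commands("WorkingDirectory", 0, 1):
    print(shlex.join(cmd))
```

Looping `i` and `j` over all species indices gives the full all-versus-all set of searches.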

davidemms commented 3 years ago

Hi Adriana

Thanks for posting this. I'm going to reopen this issue so I don't forget to make the changes for MMSeqs.

Best wishes David