Closed Wanli-HE closed 3 years ago
also with "-m diamond"
command line:
/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/lib/python3.9/site-packages/eggnogmapper/bin/diamond blastx -d /home/projects/ku_00041/data/test-pplaa/Plaspline/db/EggNOGV2/eggnog_proteins.dmnd -q /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/linear_non_redundant_genes.fa --threads 35 -o /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/emappertmp_dmdn_i5tdga_a/528f134e919d4ee9b9b0bd7f9950e557 --sensitive -e 0.001 --max-target-seqs 0 --max-hsps 0 --outfmt 6
result:
diamond v2.0.4.142 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: No such file or directory
Error: Error opening temporary file /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/emappertmp_dmdn_i5tdga_a/diamond-tmp-h2Ox1T
Hi @Wanli-HE ,
you should send the output to a directory which actually exists. I am not sure whether those "emappertmp_" directories exist. They are usually created by emapper and removed afterwards, although they sometimes remain in place when emapper crashes.
What error do you get when running diamond and/or mmseqs from emapper.py?
Best, Carlos
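As a minimal sketch of the point above, the following Python snippet checks that the directory that would contain an output file exists and is writable before diamond is launched. The function name `check_output_dir` and the example path are hypothetical and are not part of emapper or diamond.

```python
import os

def check_output_dir(output_path):
    """Return True if the directory that would hold output_path exists and is writable."""
    parent = os.path.dirname(os.path.abspath(output_path))
    return os.path.isdir(parent) and os.access(parent, os.W_OK)

if __name__ == "__main__":
    # Hypothetical path; replace it with the value passed to diamond's -o option.
    out = "emappertmp_dmdn_example/diamond_result"
    if not check_output_dir(out):
        print(f"Output directory for {out!r} is missing or not writable")
```

Running such a check first separates a missing-directory problem from other causes of a diamond failure.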
Hi! Thanks for your answer.
Here is the error:
/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/bin/diamond blastx -d /home/projects/ku_00041/data/test-pplaa/Plaspline/db/EggNOGV2/eggnog_proteins.dmnd -q /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/linear_non_redundant_genes.fa --threads 35 -o /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/emappertmp_dmdn_0qin8h8k/adbd0fc5432342659f1d09d8ada3197f --sensitive -e 0.001 --max-target-seqs 0 --max-hsps 0 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovhsp scovhsp
Error running diamond: Computing alignments...
Deallocating buffers... [0.435s] Clearing query masking... [0.47s] Opening temporary output file... [0.006s] Computing alignments... /var/spool/torque/mom_priv/jobs/30997971.SC: line 33: 10280 Killed diamond blastp -d /home/projects/ku_00041/data/test-pplaa/Plaspline/db/EggNOGV2/eggnog_proteins.dmnd -q linear_gene_prodigal_protein_seq.faa --threads 35 -o diamondres --sensitive -e 0.001 --max-target-seqs 0 --max-hsps 0 --outfmt 6 --no-unlink
This is the problem I get when running diamond.
Honestly, I have no idea what is going on. Could there be some memory limit on your computer/nodes?
You could try running:
/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/bin/diamond blastx -d /home/projects/ku_00041/data/test-pplaa/Plaspline/db/EggNOGV2/eggnog_proteins.dmnd -q /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/linear_non_redundant_genes.fa --threads 35 -o /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/test_diamond_out_dir --sensitive -e 0.001 --max-target-seqs 0 --max-hsps 0 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovhsp scovhsp
and report what happens.
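A "Killed" message such as the one in the log above usually means the process received SIGKILL, which on a cluster is often caused by a memory limit. The sketch below is one way to make that visible when launching a command from Python: `subprocess.run` reports a negative return code when the child was terminated by a signal (POSIX only, and only when the command is run without a shell). The helper names `describe_exit` and `run_and_describe` are hypothetical.

```python
import signal
import subprocess

def describe_exit(returncode):
    """Describe a subprocess return code; a negative value -N means 'terminated by signal N'."""
    if returncode == 0:
        return "finished successfully"
    if returncode < 0:
        name = signal.Signals(-returncode).name
        hint = ""
        if name == "SIGKILL":
            # Assumption: on shared nodes, SIGKILL is frequently an out-of-memory kill.
            hint = " (often an out-of-memory kill or a scheduler memory limit)"
        return f"terminated by {name}{hint}"
    return f"exited with status {returncode}"

def run_and_describe(cmd):
    """Run cmd (a list of arguments, no shell) and describe how it ended."""
    completed = subprocess.run(cmd)
    return describe_exit(completed.returncode)
```

For example, `run_and_describe(["diamond", "blastx", ...])` with the full argument list would return a one-line description of how the run ended.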
Hi! It was a memory problem! I have solved it.
Thanks!
By the way, do you have any idea how long diamond annotation normally takes for a file with 250 Mb of nucleotide sequence?
Hi @Wanli-HE ,
glad that you solved it!
I see that you are using --itype metagenome with diamond. As you can read here: https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.0.2-v2.0.8#Gene_Prediction_Options I would not recommend using diamond blastx for large assembled contigs or for genomes, since it could take a long time to complete.
A possibly better, and likely much faster, approach would be to use -m diamond --itype metagenome --genepred prodigal. This way, diamond would perform the search using the proteins predicted by prodigal as queries.
Another approach would be using MMseqs2 instead of diamond when using --itype metagenome.
Diamond blastx could be good to search CDS on small contigs (expected to bear a single CDS, for example), out of frame CDS, or a few contigs for which you wish to confirm the CDS detected by prodigal, for instance.
I hope this makes sense.
Best, Carlos
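As a minimal sketch of the suggested invocation, the snippet below assembles an emapper.py command line that combines -m diamond, --itype metagenome and --genepred prodigal with the -i, -o and --cpu options used elsewhere in this thread. The function name `build_emapper_cmd` and the file names are hypothetical, and options such as --data_dir are left out and would need to be added for a real run.

```python
import shlex

def build_emapper_cmd(fasta, out_prefix, cpus):
    """Build an emapper.py argument list that searches prodigal-predicted proteins with diamond."""
    return [
        "emapper.py",
        "-m", "diamond",
        "--itype", "metagenome",
        "--genepred", "prodigal",
        "-i", fasta,
        "-o", out_prefix,
        "--cpu", str(cpus),
    ]

# Hypothetical input and output names, for illustration only.
cmd = build_emapper_cmd("contigs.fa", "contigs_annot", 35)
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` without shell quoting issues.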
ok! thanks! i will try to do that!
Hi Carlos!
When I use mmseqs to annotate genes, the command line looks like this:
emapper.py -m mmseqs -i linear_non_redundant_genes.part-001.fa --itype CDS --translate -o genes.part-001 --cpu 35 --data_dir /home/projects/ku_00041/data/test-pplaa/Plaspline/db/EggNOGV2 --mmseqs_db /home/projects/ku_00041/data/test-pplaa/Plaspline/db/EggNOGV2/mmseqs --temp_dir .
But it raises an error:
OSError: [Errno 39] Directory not empty: '/home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/split-96/emappertmp_mmseqs_6_zhui6x'
What happened?
Hi @Wanli-HE ,
could you paste the whole output from emapper, please? To try to understand in which step the error is produced.
Thank you.
Best, Carlos
here is the output:
Working directory is /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/split-96
/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/lib/python3.9/site-packages/eggnogmapper/bin/mmseqs createdb /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/split-96/emappertmp_mmseqs_2kng9e8y/tmpish70mtq /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/split-96/emappertmp_mmseqs_2kng9e8y/24b1df109a5f4d28b0cae926745f1559 --dbtype 1
/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/lib/python3.9/site-packages/eggnogmapper/bin/mmseqs search -a true /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/split-96/emappertmp_mmseqs_2kng9e8y/24b1df109a5f4d28b0cae926745f1559 /home/projects/ku_00041/data/test-pplaa/Plaspline/db/EggNOGV2/mmseqs /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/split-96/emappertmp_mmseqs_2kng9e8y/d67f1a13dc4f419cb1eb66e75cc22616 /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/split-96/emappertmp_mmseqs_2kng9e8y --start-sens 3 --sens-steps 3 -s 7 --threads 35
here is the error:
Traceback (most recent call last):
  File "/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/lib/python3.9/site-packages/eggnogmapper/search/mmseqs/mmseqs.py", line 207, in search_step
    completed_process = subprocess.run(cmd, capture_output=True, check=True, shell=True)
  File "/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/lib/python3.9/site-packages/eggnogmapper/bin/mmseqs search -a true /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/split-96/emappertmp_mmseqs_2kng9e8y/24b1df109a5f4d28b0cae926745f1559 /home/projects/ku_00041/data/test-pplaa/Plaspline/db/EggNOGV2/mmseqs /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/split-96/emappertmp_mmseqs_2kng9e8y/d67f1a13dc4f419cb1eb66e75cc22616 /home/projects/ku_00041/archive/gut_sample_result/bacteri-gene/linear_non_redundant_gene/split-96/emappertmp_mmseqs_2kng9e8y --start-sens 3 --sens-steps 3 -s 7 --threads 35' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/lib/python3.9/site-packages/eggnogmapper/search/mmseqs/mmseqs.py", line 145, in _search
    raise e
  File "/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/lib/python3.9/site-packages/eggnogmapper/search/mmseqs/mmseqs.py", line 140, in _search
    alignmentsdb, cmds = self.run_mmseqs(in_file, tempdir, querydb, self.targetdb, resultdb, bestresultdb)
  File "/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/lib/python3.9/site-packages/eggnogmapper/search/mmseqs/mmseqs.py", line 163, in run_mmseqs
    cmd = self.search_step(querydb, targetdb, resultdb, tempdir)
  File "/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/lib/python3.9/site-packages/eggnogmapper/search/mmseqs/mmseqs.py", line 209, in search_step
    raise EmapperException("Error running 'mmseqs search': "+cpe.stderr.decode("utf-8").strip().split("\n")[-1])
eggnogmapper.emapperException.EmapperException: Error running 'mmseqs search': Current input: Generic. Allowed input: Index, Nucleotide, Profile, Aminoacid
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/bin/emapper.py", line 664, in
I searched the web, and it may be a problem with the "rm -rf " command.
There is also another problem: I split my CDS genes.fa file, originally about 6 GB, into 400 sub-files of about 16 MB each. I tried one of them with diamond blastp, but it still takes a long time: with 35 CPUs, the run has used over 200 CPU hours and is still not finished. What is happening behind this command? Is this normal?
Hi @Wanli-HE ,
regarding the MMseqs2 error, it is showing this:
raise EmapperException("Error running 'mmseqs search': "+cpe.stderr.decode("utf-8").strip().split("\n")[-1])
eggnogmapper.emapperException.EmapperException: Error running 'mmseqs search': Current input: Generic. Allowed input: Index, Nucleotide, Profile, Aminoacid
It is detecting the input as "Generic". One reason could be that your FASTA files are not correctly formatted. Or there could be some bug affecting your input when it is translated by emapper. Please check that your files are correct so that we can rule that out.
Regarding the timings using diamond blastp, how many sequences do you have in each sub-file?
Closing this issue. Feel free to re-open or re-issue.
Best, Carlos
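Both questions above (whether the FASTA files are well formed, and how many sequences each sub-file contains) can be checked with a few lines of Python. The following is only a rough sketch with a hypothetical function name, `summarize_fasta`; it counts header lines and flags a few simple formatting problems, and it does not validate the sequence alphabet.

```python
def summarize_fasta(path):
    """Count records and collect simple formatting problems in a FASTA file."""
    n_records = 0
    problems = []
    has_sequence = True  # nothing is pending before the first header
    with open(path) as handle:
        for lineno, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if not has_sequence:
                    problems.append(f"line {lineno}: previous record has no sequence")
                if len(line) == 1:
                    problems.append(f"line {lineno}: empty header")
                n_records += 1
                has_sequence = False
            elif n_records == 0:
                problems.append(f"line {lineno}: sequence data before first header")
            else:
                has_sequence = True
    if n_records and not has_sequence:
        problems.append("last record has no sequence")
    return n_records, problems
```

Calling `summarize_fasta("linear_non_redundant_genes.part-001.fa")` would return the number of sequences in that sub-file together with a list of any problems found.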
Hi!
I am using the new version of eggnog-mapper. With -m mmseqs it raises an error, so I ran mmseqs separately with a command like this:
/home/projects/ku_00041/data/test-pplaa/Plaspline/conda_envs/759e9edb/lib/python3.9/site-packages/eggnogmapper/bin/mmseqs search -a true
/home/projects/ku_00041/archive/gut_sample_result/circular_non_readundant_gene-0.55/emappertmp_mmseqs_wb_b19ph/51e406339f9b4c65a36c2cc8f64af1cd
/home/projects/ku_00041/data/test-pplaa/Plaspline/db/EggNOGV2/mmseqs/mmseqs.db
/home/projects/ku_00041/archive/gut_sample_result/circular_non_readundant_gene-0.55/emappertmp_mmseqs_wb_b19ph/5b11e19723d04baa9a2576192978d7dc
/home/projects/ku_00041/archive/gut_sample_result/circular_non_readundant_gene-0.55/emappertmp_mmseqs_wb_b19ph --start-sens 3 --sens-steps 3 -s 7 --threads 35
And I get this error:
Input /home/projects/ku_00041/archive/gut_sample_result/circular_non_readundant_gene-0.55/emappertmp_mmseqs_wb_b19ph/51e406339f9b4c65a36c2cc8f64af1cd does not exist.
I think the temporary file behind this command caused the problem.
Is that true? And how can I solve it?