Open WenluYIN opened 7 months ago
createdb genomad_output/GCF_000022365.1_ASM2236v1_genomic_annotate/GCF_000022365.1_ASM2236v1_genomic_proteins.faa genomad_output/GCF_000022365.1_ASM2236v1_genomic_annotate/GCF_000022365.1_ASM2236v1_genomic_mmseqs2/query_db/query_db
MMseqs Version: 13-45111+ds-2
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 1
Offset of numeric ids 0
Compressed 0
Verbosity 3
Converting sequences [
Time for merging to query_db_h: 0h 0m 0s 1ms
Time for merging to query_db: 0h 0m 0s 2ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 46ms
prefilter genomad_output/GCF_000022365.1_ASM2236v1_genomic_annotate/GCF_000022365.1_ASM2236v1_genomic_mmseqs2/query_db/query_db genomad_db/genomad_db genomad_output/GCF_000022365.1_ASM2236v1_genomic_annotate/GCF_000022365.1_ASM2236v1_genomic_mmseqs2/search_db/prefilter_db --threads 8 -s 4.2 --split 8 --split-mode 0 --max-seqs 10000000 --min-ungapped-score 25 -k 5
MMseqs Version: 13-45111+ds-2
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 4.2
k-mer length 5
k-score 2147483647
Alphabet size nucl:5,aa:21
Max sequence length 65535
Max results per query 10000000
Split database 8
Split mode 0
Split memory limit 0
Coverage threshold 0
Coverage mode 0
Compositional bias 1
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 25
Include identical seq. id. false
Spaced k-mers 1
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Spaced k-mer pattern
Local temporary path
Threads 8
Compressed 0
Verbosity 3
Query database size: 2183 type: Aminoacid
Target split mode. Searching through 8 splits
Estimated memory consumption: 588M
Target database size: 227897 type: Profile
Process prefiltering step 1 of 8
Index table k-mer threshold: 104 at k-mer size 5
Index table: counting k-mers
[=================================================================] 28.46K 0s 21ms
Index table: Masked residues: 0
No k-mer could be extracted for the database genomad_db/genomad_db. Maybe the sequences length is less than 14 residues.
This is a problem caused by an incompatibility between your version of MMseqs2 and the version required by the database. Can you update your MMseqs2? This should solve the problem.
Hello, I'm running into the same problem as above. I've updated MMseqs2 (version 13.45111) and also tried additional splits, but unfortunately neither solved the issue. Do you have any recommendations for other solutions?
MMseqs2 version 13.45111 is very old and incompatible with the current geNomad database. Can you update to version 15.6f452? How did you install geNomad/MMseqs2?
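To check which MMseqs2 release is actually being picked up from `PATH`, something like the sketch below may help. It is only an illustration: the helper names (`mmseqs_major`, `is_compatible`, `installed_mmseqs_version`) are hypothetical, the threshold of 15 is an assumption taken from the 15.6f452 release suggested above rather than a documented requirement, and it assumes `mmseqs version` prints the version string. Builds that report a bare commit hash instead of a release number will raise `ValueError`.

```python
import re
import shutil
import subprocess

# Assumption: release 15 is used as the minimum because 15.6f452 is the
# version suggested in this thread; it is not a documented cut-off.
MIN_MAJOR = 15


def mmseqs_major(version: str) -> int:
    """Extract the leading release number from strings such as
    '13-45111+ds-2' (Debian/Ubuntu package) or '15.6f452'."""
    match = re.match(r"\s*(\d+)[.\-]", version)
    if match is None:
        raise ValueError(f"unrecognised MMseqs2 version string: {version!r}")
    return int(match.group(1))


def is_compatible(version: str, min_major: int = MIN_MAJOR) -> bool:
    """Return True when the release number is at least min_major."""
    return mmseqs_major(version) >= min_major


def installed_mmseqs_version() -> str:
    """Ask the mmseqs binary found on PATH for its version string."""
    exe = shutil.which("mmseqs")
    if exe is None:
        raise FileNotFoundError("mmseqs was not found on PATH")
    result = subprocess.run(
        [exe, "version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling `is_compatible(installed_mmseqs_version())` in the same shell that runs geNomad shows whether the old system-wide binary is still the one being used, which can happen when a newer copy is installed elsewhere.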
genomad end-to-end --cleanup --splits 8 '/home/wenluyin/GCF_009025895.1/ncbi_dataset/data/GCF_009025895.1/GCF_009025895.1_ASM902589v1_genomic.fna' genomad_output genomad_db
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Executing geNomad annotate (v1.7.1). This will perform gene calling in the input sequences and annotate the predicted proteins with geNomad's markers. │
│ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │
│ Outputs: │
│ genomad_output/GCF_009025895.1_ASM902589v1_genomic_annotate │
│ ├── GCF_009025895.1_ASM902589v1_genomic_annotate.json (execution parameters) │
│ ├── GCF_009025895.1_ASM902589v1_genomic_genes.tsv (gene annotation data) │
│ ├── GCF_009025895.1_ASM902589v1_genomic_taxonomy.tsv (taxonomic assignment) │
│ ├── GCF_009025895.1_ASM902589v1_genomic_mmseqs2.tsv (MMseqs2 output file) │
│ └── GCF_009025895.1_ASM902589v1_genomic_proteins.faa (protein FASTA file) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
[16:38:35] Executing genomad annotate.
[16:38:35] Previous execution detected. Steps will be skipped unless their outputs are not found. Use the --restart option to force the execution of all the steps again.
[16:38:35] GCF_009025895.1_ASM902589v1_genomic_proteins.faa was found. Skipping gene prediction with pyrodigal-gv.
Traceback (most recent call last):
File "/home/wenluyin/.local/lib/python3.10/site-packages/genomad/mmseqs2.py", line 190, in run_mmseqs2
subprocess.run(command, stdout=fout, stderr=fout, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'prefilter', PosixPath('genomad_output/GCF_009025895.1_ASM902589v1_genomic_annotate/GCF_009025895.1_ASM902589v1_genomic_mmseqs2/query_db/query_db'), PosixPath('genomad_db/genomad_db'), PosixPath('genomad_output/GCF_009025895.1_ASM902589v1_genomic_annotate/GCF_009025895.1_ASM902589v1_genomic_mmseqs2/search_db/prefilter_db'), '--threads', '8', '-s', '4.2', '--split', '8', '--split-mode', '0', '--max-seqs', '10000000', '--min-ungapped-score', '25', '-k', '5']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/wenluyin/.local/bin/genomad", line 8, in <module>
sys.exit(cli())
File "/usr/lib/python3/dist-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/wenluyin/.local/lib/python3.10/site-packages/rich_click/rich_command.py", line 126, in main
rv = self.invoke(ctx)
File "/usr/lib/python3/dist-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3/dist-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, ctx.params)
File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/usr/lib/python3/dist-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/wenluyin/.local/lib/python3.10/site-packages/genomad/cli.py", line 1240, in end_to_end
ctx.invoke(
File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/wenluyin/.local/lib/python3.10/site-packages/genomad/cli.py", line 441, in annotate
genomad.annotate.main(
File "/home/wenluyin/.local/lib/python3.10/site-packages/genomad/modules/annotate.py", line 203, in main
mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
File "/home/wenluyin/.local/lib/python3.10/site-packages/genomad/mmseqs2.py", line 193, in run_mmseqs2
raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs prefilter genomad_output/GCF_009025895.1_ASM902589v1_genomic_annotate/GCF_009025895.1_ASM902589v1_genomic_mmseqs2/query_db/query_db genomad_db/genomad_db genomad_output/GCF_009025895.1_ASM902589v1_genomic_annotate/GCF_009025895.1_ASM902589v1_genomic_mmseqs2/search_db/prefilter_db --threads 8 -s 4.2 --split 8 --split-mode 0 --max-seqs 10000000 --min-ungapped-score 25 -k 5' failed.
Hello. I have the same problem. I tried using --splits 12 and higher values, but that did not solve it.
Could you recommend other solutions?
Best, Wenlu