apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Unable to run #51

Open WenluYIN opened 7 months ago

WenluYIN commented 7 months ago

genomad end-to-end --cleanup --splits 8 '/home/wenluyin/GCF_009025895.1/ncbi_dataset/data/GCF_009025895.1/GCF_009025895.1_ASM902589v1_genomic.fna' genomad_output genomad_db

Executing geNomad annotate (v1.7.1). This will perform gene calling in the input sequences and annotate the predicted proteins with geNomad's markers.
Outputs:
  genomad_output/GCF_009025895.1_ASM902589v1_genomic_annotate
  ├── GCF_009025895.1_ASM902589v1_genomic_annotate.json (execution parameters)
  ├── GCF_009025895.1_ASM902589v1_genomic_genes.tsv (gene annotation data)
  ├── GCF_009025895.1_ASM902589v1_genomic_taxonomy.tsv (taxonomic assignment)
  ├── GCF_009025895.1_ASM902589v1_genomic_mmseqs2.tsv (MMseqs2 output file)
  └── GCF_009025895.1_ASM902589v1_genomic_proteins.faa (protein FASTA file)

[16:38:35] Executing genomad annotate.
[16:38:35] Previous execution detected. Steps will be skipped unless their outputs are not found. Use the --restart option to force the execution of all the steps again.
[16:38:35] GCF_009025895.1_ASM902589v1_genomic_proteins.faa was found. Skipping gene prediction with pyrodigal-gv.
Traceback (most recent call last):
  File "/home/wenluyin/.local/lib/python3.10/site-packages/genomad/mmseqs2.py", line 190, in run_mmseqs2
    subprocess.run(command, stdout=fout, stderr=fout, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'prefilter', PosixPath('genomad_output/GCF_009025895.1_ASM902589v1_genomic_annotate/GCF_009025895.1_ASM902589v1_genomic_mmseqs2/query_db/query_db'), PosixPath('genomad_db/genomad_db'), PosixPath('genomad_output/GCF_009025895.1_ASM902589v1_genomic_annotate/GCF_009025895.1_ASM902589v1_genomic_mmseqs2/search_db/prefilter_db'), '--threads', '8', '-s', '4.2', '--split', '8', '--split-mode', '0', '--max-seqs', '10000000', '--min-ungapped-score', '25', '-k', '5']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/wenluyin/.local/bin/genomad", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/wenluyin/.local/lib/python3.10/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/wenluyin/.local/lib/python3.10/site-packages/genomad/cli.py", line 1240, in end_to_end
    ctx.invoke(
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/wenluyin/.local/lib/python3.10/site-packages/genomad/cli.py", line 441, in annotate
    genomad.annotate.main(
  File "/home/wenluyin/.local/lib/python3.10/site-packages/genomad/modules/annotate.py", line 203, in main
    mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
  File "/home/wenluyin/.local/lib/python3.10/site-packages/genomad/mmseqs2.py", line 193, in run_mmseqs2
    raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs prefilter genomad_output/GCF_009025895.1_ASM902589v1_genomic_annotate/GCF_009025895.1_ASM902589v1_genomic_mmseqs2/query_db/query_db genomad_db/genomad_db genomad_output/GCF_009025895.1_ASM902589v1_genomic_annotate/GCF_009025895.1_ASM902589v1_genomic_mmseqs2/search_db/prefilter_db --threads 8 -s 4.2 --split 8 --split-mode 0 --max-seqs 10000000 --min-ungapped-score 25 -k 5' failed.
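The two tracebacks show how the failure surfaces: `run_mmseqs2` runs MMseqs2 through `subprocess.run(command, stdout=fout, stderr=fout, check=True)` and re-raises the resulting `CalledProcessError` as a plain `Exception` that only contains the command string. MMseqs2's own explanation therefore ends up in the log file, not on the terminal. The sketch below mirrors that pattern in isolation; `run_logged` and the log file name are hypothetical and not part of geNomad's API.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence, Union


def run_logged(command: Sequence[Union[str, Path]], log_path: Path) -> None:
    """Run `command` with stdout and stderr written to `log_path` (hypothetical helper).

    Mirrors the wrapping visible in the traceback: on failure the caller only
    sees the command string, while the tool's own messages stay in the log.
    """
    command_str = " ".join(str(part) for part in command)
    with open(log_path, "w") as fout:
        try:
            subprocess.run(command, stdout=fout, stderr=fout, check=True)
        except subprocess.CalledProcessError as e:
            raise Exception(f"'{command_str}' failed.") from e


if __name__ == "__main__":
    # Stand-in for a failing mmseqs call: print a diagnostic to stderr, exit 1.
    failing = [
        sys.executable, "-c",
        "import sys; print('No k-mer could be extracted', file=sys.stderr); sys.exit(1)",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "mmseqs.log"
        try:
            run_logged(failing, log)
        except Exception as exc:
            print(exc)               # only the command string
            print(log.read_text())   # the actual reason is in the log
```

In practice this means that when geNomad reports only `'mmseqs prefilter ...' failed.`, the MMseqs2 log is the place to look; the output posted in the second comment appears to be exactly that log.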

Hello. I have the same problem. I tried using --splits 12 and higher values, but that did not solve it.

Could you recommend other solutions?

Best, Wenlu
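For reference, the failing command in the first traceback shows how the end-to-end options are forwarded: `--splits 8` becomes MMseqs2's `--split 8` (with `--split-mode 0`), alongside `--threads 8` and `-s 4.2`. The sketch below rebuilds that argument list from parameters; `build_prefilter_command` and its defaults are hypothetical and simply copy the values visible in the traceback. Splitting the target database is a memory-saving measure, which is presumably why raising `--splits` does not help here: the log in the next comment estimates only 588M of memory and then fails while extracting k-mers from the database itself.

```python
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_prefilter_command(
    query_db: PathLike,
    target_db: PathLike,
    prefilter_db: PathLike,
    threads: int = 8,
    sensitivity: float = 4.2,
    splits: int = 8,
) -> List[str]:
    """Rebuild the `mmseqs prefilter` call seen in the traceback (hypothetical helper).

    The fixed flags (--split-mode 0, --max-seqs, --min-ungapped-score, -k) are
    copied from the logged command, not taken from geNomad's source.
    """
    return [
        "mmseqs", "prefilter",
        str(query_db), str(target_db), str(prefilter_db),
        "--threads", str(threads),
        "-s", str(sensitivity),
        "--split", str(splits),   # only this value changes with --splits
        "--split-mode", "0",
        "--max-seqs", "10000000",
        "--min-ungapped-score", "25",
        "-k", "5",
    ]


if __name__ == "__main__":
    # The target database (and therefore its format) is the same in both runs.
    for n in (8, 12):
        cmd = build_prefilter_command("query_db", "genomad_db/genomad_db", "prefilter_db", splits=n)
        print(" ".join(cmd))
```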

WenluYIN commented 7 months ago

createdb genomad_output/GCF_000022365.1_ASM2236v1_genomic_annotate/GCF_000022365.1_ASM2236v1_genomic_proteins.faa genomad_output/GCF_000022365.1_ASM2236v1_genomic_annotate/GCF_000022365.1_ASM2236v1_genomic_mmseqs2/query_db/query_db

MMseqs Version:          13-45111+ds-2
Database type            0
Shuffle input database   true
Createdb mode            0
Write lookup file        1
Offset of numeric ids    0
Compressed               0
Verbosity                3

Converting sequences
Time for merging to query_db_h: 0h 0m 0s 1ms
Time for merging to query_db: 0h 0m 0s 2ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 46ms

prefilter genomad_output/GCF_000022365.1_ASM2236v1_genomic_annotate/GCF_000022365.1_ASM2236v1_genomic_mmseqs2/query_db/query_db genomad_db/genomad_db genomad_output/GCF_000022365.1_ASM2236v1_genomic_annotate/GCF_000022365.1_ASM2236v1_genomic_mmseqs2/search_db/prefilter_db --threads 8 -s 4.2 --split 8 --split-mode 0 --max-seqs 10000000 --min-ungapped-score 25 -k 5

MMseqs Version:              13-45111+ds-2
Substitution matrix          nucl:nucleotide.out,aa:blosum62.out
Seed substitution matrix     nucl:nucleotide.out,aa:VTML80.out
Sensitivity                  4.2
k-mer length                 5
k-score                      2147483647
Alphabet size                nucl:5,aa:21
Max sequence length          65535
Max results per query        10000000
Split database               8
Split mode                   0
Split memory limit           0
Coverage threshold           0
Coverage mode                0
Compositional bias           1
Diagonal scoring             true
Exact k-mer matching         0
Mask residues                1
Mask lower case residues     0
Minimum diagonal score       25
Include identical seq. id.   false
Spaced k-mers                1
Preload mode                 0
Pseudo count a               1
Pseudo count b               1.5
Spaced k-mer pattern
Local temporary path
Threads                      8
Compressed                   0
Verbosity                    3

Query database size: 2183 type: Aminoacid
Target split mode. Searching through 8 splits
Estimated memory consumption: 588M
Target database size: 227897 type: Profile
Process prefiltering step 1 of 8

Index table k-mer threshold: 104 at k-mer size 5
Index table: counting k-mers
[=================================================================] 28.46K 0s 21ms
Index table: Masked residues: 0
No k-mer could be extracted for the database genomad_db/genomad_db.
Maybe the sequences length is less than 14 residues.
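The telling lines in this log are the MMseqs2 version (`13-45111+ds-2`), the target being read as a `Profile` database with 227897 entries, and the prefilter stopping because no k-mers could be extracted from that database. That combination is consistent with the version incompatibility diagnosed in the reply below, rather than with a problem in the input genome. The sketch below pulls those three facts out of a saved log; `summarize_prefilter_log` is hypothetical, and its regular expressions only match the message formats shown in this thread.

```python
import re
from typing import Dict, Optional

# These patterns match the message formats in the log above; other MMseqs2
# releases may word their output differently.
VERSION_RE = re.compile(r"MMseqs Version:\s*(\S+)")
TARGET_RE = re.compile(r"Target database size:\s*(\d+)\s+type:\s*(\w+)")
NO_KMER_RE = re.compile(r"No k-mer could be extracted for the database (\S+)")


def summarize_prefilter_log(text: str) -> Dict[str, Optional[str]]:
    """Extract MMseqs2 version, target DB type and k-mer failure from a log (hypothetical helper)."""
    versions = VERSION_RE.findall(text)
    target = TARGET_RE.search(text)
    no_kmer = NO_KMER_RE.search(text)
    return {
        # The last version line belongs to the most recent module call (prefilter here).
        "mmseqs_version": versions[-1] if versions else None,
        "target_type": target.group(2) if target else None,
        # Strip the sentence-ending period that follows the database path.
        "failed_db": no_kmer.group(1).rstrip(".") if no_kmer else None,
    }


if __name__ == "__main__":
    # Lines copied from the log in this thread.
    excerpt = (
        "MMseqs Version: 13-45111+ds-2\n"
        "Target database size: 227897 type: Profile\n"
        "No k-mer could be extracted for the database genomad_db/genomad_db. "
        "Maybe the sequences length is less than 14 residues.\n"
    )
    print(summarize_prefilter_log(excerpt))
    # {'mmseqs_version': '13-45111+ds-2', 'target_type': 'Profile', 'failed_db': 'genomad_db/genomad_db'}
```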

apcamargo commented 7 months ago

This is a problem caused by an incompatibility between your version of MMseqs2 and the version required by the database. Can you update your MMseqs2? This should solve the problem.
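Before re-running, it helps to confirm which MMseqs2 release is actually being invoked. The sketch below parses the version strings that appear in this thread (`13-45111+ds-2` from the log, `13.45111` and `15.6f452` from the comments) and compares the leading release number against a minimum. Treating release 15 as the minimum is an assumption based on the maintainer's later suggestion of 15.6f452, not a documented requirement, and the sketch assumes `mmseqs version` prints the bare version string.

```python
import re
import shutil
import subprocess
from typing import Optional

MIN_RELEASE = 15  # assumption: based on the 15.6f452 release suggested in this thread


def release_number(version: str) -> Optional[int]:
    """Return the leading release number of an MMseqs2 version string.

    Handles the forms seen in this thread: '13-45111+ds-2', '13.45111', '15.6f452'.
    """
    match = re.match(r"\s*(\d+)(?:[.\-]|$)", version)
    return int(match.group(1)) if match else None


def installed_version(executable: str = "mmseqs") -> Optional[str]:
    """Ask the first `executable` on PATH for its version; None if it is not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "version"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or None


def is_recent_enough(version: str, minimum: int = MIN_RELEASE) -> bool:
    """True if the parsed release number is at least `minimum`."""
    release = release_number(version)
    return release is not None and release >= minimum


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("mmseqs not found on PATH")
    else:
        status = "OK" if is_recent_enough(found) else "likely too old for the current database"
        print(f"mmseqs {found}: {status}")
```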

chrisdoering8197 commented 3 months ago

Hello, I'm running into the same problem as above. I've updated MMseqs2 (version 13.45111) and also tried additional splits, but unfortunately neither solved the issue. Do you have any recommendations for other solutions?

apcamargo commented 3 months ago

MMseqs2 version 13.45111 is very old and incompatible with the current geNomad database. Can you update to version 15.6f452? How did you install geNomad/MMseqs2?
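One plausible reason an "update" still reports 13.45111 is that an older `mmseqs` earlier on `PATH` shadows the newly installed one. The `13-45111+ds-2` string in the earlier log looks like a Debian/Ubuntu package version, so a system-wide package and a separately installed MMseqs2 could coexist. The sketch below lists every `mmseqs` executable on `PATH` in lookup order; `executables_on_path` is a hypothetical POSIX-oriented diagnostic, not something geNomad provides.

```python
import os
from pathlib import Path
from typing import List, Optional


def executables_on_path(name: str = "mmseqs", path_env: Optional[str] = None) -> List[Path]:
    """List every executable called `name` on PATH, in lookup order (POSIX).

    The first entry is the one a shell or subprocess would run; later
    entries are shadowed copies.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    found: List[Path] = []
    seen = set()
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue  # skip empty entries for simplicity
        candidate = Path(directory) / name
        # Resolve symlinks so merged directories (e.g. /bin -> /usr/bin) are not listed twice.
        resolved = candidate.resolve()
        if candidate.is_file() and os.access(candidate, os.X_OK) and resolved not in seen:
            seen.add(resolved)
            found.append(candidate)
    return found


if __name__ == "__main__":
    hits = executables_on_path()
    for rank, exe in enumerate(hits):
        print(f"{'used' if rank == 0 else 'shadowed':9} {exe}")
    if not hits:
        print("no mmseqs executable found on PATH")
```

If more than one entry appears, running the first one with `version` (or removing or reordering the older installation) should show whether the update actually took effect.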