apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

mmseqs prefilter error: database has wrong type #34

Closed ShailNair closed 9 months ago

ShailNair commented 9 months ago

Hi, I am trying to annotate virus contigs (5 kb and above) identified with VirSorter2 and DeepVirFinder. However, the mmseqs prefilter step throws the following error:

[14:07:34] Executing genomad annotate.
[14:07:34] Previous execution detected. Steps will be skipped unless their outputs are not found. Use the --restart option to force the execution of all the steps again.
[14:07:34] final.vcontigs.fixed_proteins.faa was found. Skipping gene prediction with prodigal-gv.
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/genomad/mmseqs2.py", line 190, in run_mmseqs2
    subprocess.run(command, stdout=fout, stderr=fout, check=True)
  File "/home/user/miniconda3/envs/genomad/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'prefilter', PosixPath('0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/query_db/query_db'), PosixPath('/home/user/database/genomad-1.5/genomad_db'), PosixPath('0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/search_db/prefilter_db'), '--threads', '30', '-s', '4.2', '--split', '0', '--split-mode', '0', '--max-seqs', '10000000', '--min-ungapped-score', '25', '-k', '5']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/genomad/bin/genomad", line 10, in <module>
    sys.exit(cli())
  File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
  File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/genomad/cli.py", line 441, in annotate
    genomad.annotate.main(
  File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/genomad/modules/annotate.py", line 203, in main
    mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
  File "/home/user/miniconda3/envs/genomad/lib/python3.8/site-packages/genomad/mmseqs2.py", line 193, in run_mmseqs2
    raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs prefilter 0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/query_db/query_db /home/user/database/genomad-1.5/genomad_db 0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/search_db/prefilter_db --threads 30 -s 4.2 --split 0 --split-mode 0 --max-seqs 10000000 --min-ungapped-score 25 -k 5' failed.

I checked mmseqs2.log, and it says the input database has the wrong type (Generic):

Time for merging to query_db: 0h 0m 0s 8ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 124ms
prefilter 0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/query_db/query_db /home/user/database/genomad-1.5/genomad_db 0.6.viral_taxo/0.2.genomad/final.vcontigs.fixed_annotate/final.vcontigs.fixed_mmseqs2/search_db/prefilter_db --threads 30 -s 4.2 --split 0 --split-mode 0 --max-seqs 10000000 --min-ungapped-score 25 -k 5 

MMseqs Version:             14.7e284
Substitution matrix         aa:blosum62.out,nucl:nucleotide.out
Seed substitution matrix    aa:VTML80.out,nucl:nucleotide.out
Sensitivity                 4.2
k-mer length                5
k-score                     seq:2147483647,prof:2147483647
Alphabet size               aa:21,nucl:5
Max sequence length         65535
Max results per query       10000000
Split database              0
Split mode                  0
Split memory limit          0
Coverage threshold          0
Coverage mode               0
Compositional bias          1
Compositional bias          1
Diagonal scoring            true
Exact k-mer matching        0
Mask residues               1
Mask residues probability   0.9
Mask lower case residues    0
Minimum diagonal score      25
Selected taxa               
Include identical seq. id.  false
Spaced k-mers               1
Preload mode                0
Pseudo count a              substitution:1.100,context:1.400
Pseudo count b              substitution:4.100,context:5.800
Spaced k-mer pattern        
Local temporary path        
Threads                     30
Compressed                  0
Verbosity                   3

Input database "/home/user/database/genomad-1.5/genomad_db" has the wrong type (Generic).

Allowed input:
- Index
- Nucleotide
- Profile
- Aminoacid

I tried re-downloading the database and changing the output directory, but got the same error. The database files were manually downloaded and extracted to /home/user/database/genomad-1.5.

Environment info

genomad --version
geNomad, version 1.7.0 (installed through conda)

mmseqs version
14.7e284

database = 1.5

ls /home/user/database/genomad-1.5
genomad_db
genomad_hmm_v1.5  
genomad_metadata_v1.5.tsv  
genomad_msa_v1.5  
mmseqs_vrefseq  
version.txt

ShailNair commented 9 months ago

My bad. The database path should be /home/user/database/genomad-1.5/genomad_db.
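
For anyone who hits the same "wrong type (Generic)" message, here is a minimal sketch of a pre-flight check on the database path. It is not part of geNomad. The helper name `check_database_dir` is hypothetical, and the check rests on two assumptions: that geNomad appends `genomad_db` to the database path (as the failing prefilter command above suggests), and that a companion `genomad_db.dbtype` file sits next to it, following the usual MMseqs2 convention.

```python
from pathlib import Path


def check_database_dir(db_dir):
    """Return a list of problems with a candidate geNomad database path.

    Hypothetical helper. Assumes the MMseqs2 target is `<db_dir>/genomad_db`
    and that MMseqs2 stores its type in `<db_dir>/genomad_db.dbtype`.
    """
    db_dir = Path(db_dir)
    if not db_dir.is_dir():
        return [f"{db_dir} is not a directory"]

    problems = []
    target = db_dir / "genomad_db"         # path passed to `mmseqs prefilter`
    dbtype = db_dir / "genomad_db.dbtype"  # assumed MMseqs2 companion file

    if target.is_dir():
        # The parent of the real database directory was probably given.
        problems.append(
            f"{target} is a directory; try using it as the database path"
        )
    elif not target.is_file():
        problems.append(f"{target} does not exist")

    if not dbtype.is_file():
        problems.append(f"{dbtype} does not exist")

    return problems
```

With the layout in this issue, `check_database_dir("/home/user/database/genomad-1.5")` should flag `genomad_db` as a directory, which points to using `/home/user/database/genomad-1.5/genomad_db` instead.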

apcamargo commented 9 months ago

No worries! Let me know if you have any other questions.