WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

DRAM fails in KEGG step, without KEGG database #1

Status: Closed (erikrikarddaniel closed this issue 4 years ago)

erikrikarddaniel commented 4 years ago

I am using a version of DRAM installed via Conda and GitHub (I am not sure which version). I downloaded the databases with the following command, i.e. skipping KEGG:

DRAM.py prepare_databases --output_dir DRAM_data/ --threads 16

DRAM.py print_config gives the following:

KEGG db location: None
KOfam db location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/kofam_profiles.hmm
KOfam KO list location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/kofam_ko_list.tsv
UniRef db location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/uniref90.20200215.mmsdb
Pfam db location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/pfam.mmspro
dbCAN db location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/dbCAN-HMMdb-V7.txt
RefSeq Viral db location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/refseq_viral.20200215.mmsdb
MEROPS peptidase db location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/peptidases.20200215.mmsdb
VOGDB db location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/vog_latest_hmms.txt
Description db location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/description_db.sqlite
Genome summary form location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/genome_summary_form.20200215.tsv
ETC module database location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/etc_mdoule_database.20200215.tsv
Function heatmap form location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/function_heatmap_form.20200215.tsv
AMG database location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/amg_database.20200215.tsv
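A quick way to spot unset databases is to scan this output for entries whose location is None. The following is a minimal sketch and not part of DRAM: it assumes the print_config output has been captured as text with one "name: location" entry per line, and the function name unset_databases is made up for illustration.

```python
def unset_databases(config_text):
    """Return the names of entries whose location is the literal string 'None'."""
    missing = []
    for line in config_text.splitlines():
        # Split on the first ": " only, so locations containing colons stay intact.
        name, sep, location = line.partition(": ")
        if sep and location.strip() == "None":
            missing.append(name.strip())
    return missing


# Abbreviated sample taken from the output above.
sample = """KEGG db location: None
KOfam db location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/kofam_profiles.hmm
Pfam db location: /crex/proj/sllstore2017037/nobackup/data/DRAM_data/pfam.mmspro"""

print(unset_databases(sample))
```

For the sample above, only the KEGG entry is reported as unset.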

When I run DRAM.py annotate -i 'MAGs/*.fna' -o annotation --threads 16, I get the following error, suggesting that it tries KEGG annotation in the absence of a database:

2020-02-16 09:56:20.361333: Annotation started
0:00:00.016704: 293 fastas found
0:00:00.281481: Retrieved database locations and descriptions
0:00:00.281504: Annotating OX3.63.fa.edit
0:00:00.283347: Filtering fasta
0:00:00.876378: Calling genes with prodigal
0:01:02.438230: Turning genes from prodigal to mmseqs2 db
0:03:29.291201: Getting forward best hits from kegg
Traceback (most recent call last):
  File "/home/daniel/miniconda3/envs/dram/bin/DRAM.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/domus/h1/daniel/dev/DRAM/scripts/DRAM.py", line 159, in <module>
    args.func(**args_dict)
  File "/domus/h1/daniel/dev/DRAM/mag_annotator/annotate_bins.py", line 826, in annotate_bins
    verbose))
  File "/domus/h1/daniel/dev/DRAM/mag_annotator/annotate_bins.py", line 675, in annotate_fasta
    verbose))
  File "/domus/h1/daniel/dev/DRAM/mag_annotator/annotate_bins.py", line 612, in do_blast_style_search
    threads, verbose=verbose)
  File "/domus/h1/daniel/dev/DRAM/mag_annotator/annotate_bins.py", line 68, in get_best_hits
    verbose=verbose)
  File "/domus/h1/daniel/dev/DRAM/mag_annotator/utils.py", line 34, in run_process
    stderr=stderr).stdout.decode(errors='ignore')
  File "/home/daniel/miniconda3/envs/dram/lib/python3.7/subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/home/daniel/miniconda3/envs/dram/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/home/daniel/miniconda3/envs/dram/lib/python3.7/subprocess.py", line 1482, in _execute_child
    restore_signals, start_new_session, preexec_fn)
TypeError: expected str, bytes or os.PathLike object, not NoneType

I have found no documented way of disabling the KEGG annotation.

BTW, I believe the documentation for downloading the databases has an error: the command shown is the same with and without KEGG.
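The TypeError at the end of the traceback can be reproduced outside DRAM: subprocess rejects a command list that contains None, which is what an unset database location turns into. The sketch below shows that failure and one possible guard. The function run_search and its arguments are hypothetical; this is neither DRAM's actual code nor the fix that was later pushed.

```python
import subprocess
import sys


def run_search(command_prefix, db_location):
    """Run a command against a database, or return None if no database is set."""
    if db_location is None:
        # Skip the search instead of handing None to subprocess.
        return None
    result = subprocess.run(command_prefix + [db_location],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.stdout.decode(errors="ignore")


# Stand-in for a real search tool: a Python one-liner that echoes its argument.
echo = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

try:
    subprocess.run(echo + [None])  # unguarded call with a missing location
except TypeError as error:
    print("unguarded call failed:", error)

print(run_search(echo, None))
print(run_search(echo, "some_db.mmsdb").strip())
```

Returning None for a missing database is only one design option; raising a clear error that names the unset database would be another.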

shafferm commented 4 years ago

Thanks for reporting this bug. I have found the source and have pushed a fix to the master branch. You should be able to reinstall DRAM and it should work (hopefully).

To avoid downloading and installing the databases again, you can use the set_database_locations command to point DRAM at the already processed databases. Alternatively, you can copy the CONFIG file from the mag_annotator directory in your installation and save it somewhere else. Then, when you reinstall, you can replace the new empty CONFIG file with your working one. Hopefully that makes sense. I'm planning on adding an import/export CONFIG function in the future.
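The copy-and-restore approach could be scripted roughly as follows. This is a sketch only: it assumes the file is named CONFIG and sits directly in the mag_annotator directory, as described above. The function names and the paths in the usage comment are placeholders.

```python
import shutil
from pathlib import Path


def backup_config(mag_annotator_dir, backup_path):
    """Copy the working CONFIG file out of the installation before reinstalling."""
    source = Path(mag_annotator_dir) / "CONFIG"
    if not source.is_file():
        raise FileNotFoundError(f"no CONFIG file found at {source}")
    return Path(shutil.copy2(source, backup_path))


def restore_config(backup_path, mag_annotator_dir):
    """Overwrite the fresh, empty CONFIG with the saved working copy."""
    target = Path(mag_annotator_dir) / "CONFIG"
    return Path(shutil.copy2(backup_path, target))


# Usage (placeholder paths):
#   backup_config("/path/to/DRAM/mag_annotator", "/path/to/CONFIG.backup")
#   ... pull and reinstall DRAM ...
#   restore_config("/path/to/CONFIG.backup", "/path/to/DRAM/mag_annotator")
```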

Also thanks for the heads up about the documentation. This will be fixed today!

Mike

erikrikarddaniel commented 4 years ago

On Wed, 19 Feb 2020 at 19:24, Michael Shaffer notifications@github.com wrote:

> Thanks for reporting this bug. I have found the source and have pushed a fix to the master branch. You should be able to reinstall DRAM and it should work (hopefully).

Thanks. By "reinstall", do you mean pull and run "pip install -e"?

> In order to not have to download and install databases again you can use the set_database_locations command to set where the processed databases are already. Alternatively you can copy the CONFIG file from the mag_annotator directory in your installation and save it somewhere else. Then when you reinstall you can replace the new empty CONFIG file with your working one. Hopefully that makes sense. I'm planning on adding an import/export CONFIG function in the future.

OK. Completely comprehensible, and hopefully it works too! ;-)

/D

shafferm commented 4 years ago

Yes, pull a new version and then pip install again.