WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

KeyError: 'K16716' when using DRAM-v.py annotate #108

Closed shujieH closed 1 year ago

shujieH commented 3 years ago

Hi Michael, thank you for providing a very powerful tool. I got some errors when running DRAM-v.py annotate.

The command was:

    DRAM-v.py annotate -i BH0-vs2-pass2/for-dramv/final-viral-combined-for-dramv.fa -v BH0-vs2-pass2/for-dramv/viral-affi-contigs-for-dramv.tab -o BH0-dramv-annotate --skip_trnascan --threads 60 --min_contig_size 1000

Only two outputs, images and working_dir, are produced. There is no annotations.tsv file, so the second step, DRAM-v.py distill, cannot be run.

The error output of the DRAM-v.py annotate command was as follows:

    2:19:07.966004: Annotating scaffold_16108__full_1-cat_1
    2:19:08.023055: Turning genes from prodigal to mmseqs2 db
    2:19:11.409979: Getting hits from kofam
    2:20:11.644167: Getting forward best hits from viral
    2:20:12.429295: Getting reverse best hits from viral
    2:20:13.153260: Getting descriptions of hits from viral
    2:20:13.161152: Getting forward best hits from peptidase
    2:20:14.255133: Getting hits from pfam
    2:20:27.091656: Getting hits from dbCAN
    2:20:30.913326: Getting hits from VOGDB
    2:21:07.002698: Merging ORF annotations
    2:21:07.404375: Annotating scaffold_1842__full_1-cat_1
    2:21:07.506689: Turning genes from prodigal to mmseqs2 db
    2:21:09.958448: Getting hits from kofam
    Traceback (most recent call last):
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
        return self._engine.get_loc(casted_key)
      File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'K16716'

The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/bin/DRAM-v.py", line 140, in <module>
        args.func(**args_dict)
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/lib/python3.6/site-packages/mag_annotator/annotate_vgfs.py", line 406, in annotate_vgfs
        verbose)
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/lib/python3.6/site-packages/mag_annotator/annotate_bins.py", line 926, in annotate_fastas
        keep_tmp_dir, verbose))
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/lib/python3.6/site-packages/mag_annotator/annotate_bins.py", line 825, in annotate_fasta
        rbh_bit_score_threshold, threads, verbose)
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/lib/python3.6/site-packages/mag_annotator/annotate_bins.py", line 737, in annotate_orfs
        threads, verbose))
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/lib/python3.6/site-packages/mag_annotator/annotate_bins.py", line 242, in run_hmmscan_kofam
        ko_row = ko_list.loc[ko]
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/lib/python3.6/site-packages/pandas/core/indexing.py", line 879, in __getitem__
        return self._getitem_axis(maybe_callable, axis=axis)
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/lib/python3.6/site-packages/pandas/core/indexing.py", line 1110, in _getitem_axis
        return self._get_label(key, axis=axis)
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/lib/python3.6/site-packages/pandas/core/indexing.py", line 1059, in _get_label
        return self.obj.xs(label, axis=axis)
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/lib/python3.6/site-packages/pandas/core/generic.py", line 3493, in xs
        loc = self.index.get_loc(key)
      File "/public/home/hankang/miniconda3/envs/viral-id-sop/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
        raise KeyError(key) from err
    KeyError: 'K16716'

When I run the command

    grep K16716 /public/home/hankang/HSJ/software/DRAM_db_downloaded/kofam_ko_list.tsv

I can't find K16716 in kofam_ko_list.tsv. In addition, K16716 cannot be found on the official website of the KO database (https://www.kegg.jp/kegg/ko.html).
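The traceback ends at `ko_list.loc[ko]`, a label-based pandas lookup, which raises `KeyError` whenever the label is absent from the index. The following is a minimal sketch of that failure mode, not DRAM's actual code: the toy table, its column names and values, and the helper `lookup_ko` are all illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for kofam_ko_list.tsv, indexed by KO number.
# Column names and values are illustrative assumptions, not real KOfam data.
ko_list = pd.DataFrame(
    {"threshold": [100.0, 50.0], "score_type": ["full", "domain"]},
    index=pd.Index(["K00001", "K00002"], name="knum"),
)


def lookup_ko(table, ko):
    """Return the row for `ko`, or None if the KO is not in the index.

    Hypothetical helper for illustration; it is not part of DRAM.
    """
    if ko not in table.index:
        return None
    return table.loc[ko]


# A direct .loc lookup on a missing label raises KeyError, as in the traceback.
try:
    ko_list.loc["K16716"]
    raised = False
except KeyError:
    raised = True

print("KeyError raised for K16716:", raised)
print("Guarded lookup for K16716:", lookup_ko(ko_list, "K16716"))
```

A hit against a profile that has no row in the KO list is therefore enough to stop the whole run, which is consistent with the error above.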

I manually downloaded the required databases and then set their locations with the following command:

    DRAM-setup.py set_database_locations \
      --kofam_hmm_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/kofam_profiles.hmm \
      --kofam_ko_list_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/ko_list.gz \
      --pfam_db_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/Pfam-A.full.gz \
      --pfam_hmm_dat /public/home/hankang/HSJ/software/DRAM_db_downloaded/Pfam-A.hmm.dat.gz \
      --dbcan_db_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/dbCAN-HMMdb-V9.txt \
      --dbcan_fam_activities /public/home/hankang/HSJ/software/DRAM_db_downloaded/CAZyDB.07302020.fam-activities.txt \
      --vogdb_db_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/vog.hmm.tar.gz \
      --vog_annotations /public/home/hankang/HSJ/software/DRAM_db_downloaded/vog.annotations.tsv.gz \
      --viral_db_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/refseq_viral.20210303.mmsdb \
      --peptidase_db_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/pepunit.lib \
      --genome_summary_form_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/genome_summary_form.tsv \
      --module_step_form_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/module_step_form.tsv \
      --etc_module_database_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/etc_module_database.tsv \
      --function_heatmap_form_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/function_heatmap_form.tsv \
      --amg_database_loc /public/home/hankang/HSJ/software/DRAM_db_downloaded/amg_database.tsv

Then I ran this command:

    DRAM-setup.py update_description_db

The database configuration is:

    DRAM-setup.py print_config

    KEGG db: None
    KOfam db: /public/home/hankang/HSJ/software/DRAM_db_downloaded/kofam_profiles.hmm
    KOfam KO list: /public/home/hankang/HSJ/software/DRAM_db_downloaded/kofam_ko_list.tsv
    UniRef db: None
    Pfam db: /public/home/hankang/HSJ/software/DRAM_db_downloaded/pfam.mmspro
    Pfam hmm dat: /public/home/hankang/HSJ/software/DRAM_db_downloaded/Pfam-A.hmm.dat.gz
    dbCAN db: /public/home/hankang/HSJ/software/DRAM_db_downloaded/dbCAN-HMMdb-V9.txt
    dbCAN family activities: /public/home/hankang/HSJ/software/DRAM_db_downloaded/CAZyDB.07302020.fam-activities.txt
    RefSeq Viral db: /public/home/hankang/HSJ/software/DRAM_db_downloaded/refseq_viral.20210303.mmsdb
    MEROPS peptidase db: /public/home/hankang/HSJ/software/DRAM_db_downloaded/peptidases.20210303.mmsdb
    VOGDB db: /public/home/hankang/HSJ/software/DRAM_db_downloaded/vog_latest_hmms.txt
    VOG annotations: /public/home/hankang/HSJ/software/DRAM_db_downloaded/vog_annotations_latest.tsv.gz
    Description db: /public/home/hankang/HSJ/database/None
    Genome summary form: /public/home/hankang/HSJ/software/DRAM_db_downloaded/genome_summary_form.tsv
    Module step form: /public/home/hankang/HSJ/software/DRAM_db_downloaded/module_step_form.tsv
    ETC module database: /public/home/hankang/HSJ/software/DRAM_db_downloaded/etc_module_database.tsv
    Function heatmap form: /public/home/hankang/HSJ/software/DRAM_db_downloaded/function_heatmap_form.tsv
    AMG database: /public/home/hankang/HSJ/software/DRAM_db_downloaded/amg_database.tsv

Can you give us some suggestions for solving this problem? Looking forward to your reply. Thanks a lot.

rmFlynn commented 3 years ago

This may be a problem best answered by @shafferm, but while he is unavailable, here are some things you could try (in order from most to least likely to help):

  1. Also grep for K16716 in /public/home/hankang/HSJ/software/DRAM_db_downloaded/kofam_profiles.hmm.
  2. Try re-downloading your databases anyway, being very careful to match versions.
  3. Try running with --kofam_use_dbcan2_thresholds; it will almost definitely not help, but it could give a different error.

shujieH commented 3 years ago

@rmFlynn Thanks for your kind reply! As you suggested, I tried the first two methods, but there still seems to be a problem.

  1. grep K16716 in kofam_profiles.hmm

    (DRAM) [hankang@node2 DRAM_db_downloaded]$ grep K16716 kofam_profiles.hmm
    NAME K16716

The following is part of the K16716 entry in kofam_profiles.hmm:

    HMMER3/f [3.2.1 | June 2018]
    NAME K16716
    LENG 1063
    ALPH amino
    RF no
    MM no
    CONS yes
    CS no
    MAP yes
    DATE Fri May 1 16:54:46 2020
    NSEQ 62
    EFFN 3.621460
    CKSUM 3571187461
    STATS LOCAL MSV -13.1545 0.69544
    STATS LOCAL VITERBI -14.4792 0.69544
    STATS LOCAL FORWARD -6.9857 0.69544
    HMM A C D E F G H I K L M N P Q R S T V W Y
      m->m m->i m->d i->m i->i d->m d->d
    COMPO 2.64452 4.47215 2.86548 2.25869 3.78205 3.27476 3.65582 3.08370 2.34929 2.43560 3.68729 2.89302 3.67345 2.71680 2.83940 2.61903 2.87739 2.97264 5.20926 3.70060
      2.68627 4.42241 2.77515 2.73132 3.46370 2.40501 3.72510 3.29330 2.67756 2.69371 4.24491 2.90348 2.73755 3.18142 2.89786 2.37898 2.77521 2.98493 4.58493 3.61519
      0.15029 2.33633 3.14973 1.39776 0.28389 0.00000 *
    1 4.33352 5.74318 5.53564 5.34282 4.33700 4.89216 5.83776 3.67628 5.08069 2.86562 0.22910 5.46113 5.44915 5.41609 5.12761 4.72658 4.71271 3.79344 6.19501 5.05100 14 M - - -
      2.68618 4.42225 2.77519 2.73123 3.46354 2.40513 3.72494 3.29354 2.67741 2.69355 4.24690 2.90347 2.73739 3.18146 2.89801 2.37887 2.77519 2.98518 4.58477 3.61503
      0.01030 4.97673 5.69908 0.61958 0.77255 0.57384 0.82864
    2 2.92389 5.85153 1.78314 1.44815 5.17074 3.73657 4.13677 4.68661 3.02558 4.15077 4.93243 1.56661 4.27158 3.26685 3.58105 2.74805 3.30082 4.23458 6.28673 4.80980 15 e - - -
      2.68600 4.42254 2.77518 2.73116 3.46231 2.40471 3.72425 3.29383 2.67770 2.69384 4.24719 2.90260 2.73768 3.18175 2.89830 2.37916 2.77523 2.98547 4.58506 3.61532
      0.46522 0.99789 5.69908 0.68195 0.70447 0.57384 0.82864
    3 2.67543 5.37679 2.55270 2.68520 4.82475 1.62025 4.02239 4.30016 2.82225 3.81369 4.57971 2.02495 3.55337 3.14328 3.32106 2.96738 2.07115 3.86260 5.97197 4.57130 21 g - - -
      2.68618 4.42225 2.77519 2.73123 3.46354 2.40513 3.72494 3.29354 2.67741 2.69355 4.24690 2.90347 2.73739 3.18146 2.89801 2.37887 2.77519 2.98518 4.58477 3.61503
      0.01030 4.97673 5.69908 0.61958 0.77255 0.57384 0.82864
    ......

It’s weird, why is K16716 in kofam_profiles.hmm, but not in kofam_ko_list.tsv?
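A mismatch like this can be checked for every profile at once rather than one KO at a time. The sketch below collects the `NAME` entries from an HMM file and compares them with the first column of the KO list. It assumes the KO list is tab-separated with a header row and the KO number in the first column; the function names are hypothetical and not part of DRAM.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def hmm_profile_names(hmm_path):
    """Collect the value of every NAME line in a HMMER3 profile file."""
    names = set()
    with open_text(hmm_path) as handle:
        for line in handle:
            if line.startswith("NAME"):
                fields = line.split()
                if len(fields) >= 2:
                    names.add(fields[1])
    return names


def ko_list_ids(ko_list_path):
    """Collect the first column of a tab-separated KO list.

    Assumption: the first row is a header and is skipped.
    """
    ids = set()
    with open_text(ko_list_path) as handle:
        next(handle, None)  # skip the assumed header row
        for line in handle:
            line = line.rstrip("\n")
            if line:
                ids.add(line.split("\t", 1)[0])
    return ids


def profiles_missing_from_ko_list(hmm_path, ko_list_path):
    """Return profile names present in the HMM file but absent from the KO list."""
    return sorted(hmm_profile_names(hmm_path) - ko_list_ids(ko_list_path))
```

Calling `profiles_missing_from_ko_list("kofam_profiles.hmm", "kofam_ko_list.tsv")` in the database directory would list every profile that could trigger the same `KeyError`, so all of them can be dealt with in one pass.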

  2. Re-downloading databases: I re-downloaded the databases from Zenodo (https://zenodo.org/record/4581775#.YRto_4j7SUl), but I still get KeyError: 'K16716'.

  3. Which DRAM command is --kofam_use_dbcan2_thresholds used with? I didn't understand. Could you give me more details?

Thanks again! Looking forward to your reply!

rmFlynn commented 3 years ago

The problem does look to be specific to the Zenodo data. I can confirm that the K16716 profile is missing from kofam_ko_list.tsv and exists only in the kofam_profiles.hmm from Zenodo. If you are able to set up the databases in the traditional way, you will have the best chance of success. Alternatively, the best solution is probably to add an entry for K16716 to kofam_ko_list.tsv, as was done in this similar issue. That user had a different problem later, but it is probably unrelated to this solution.

Ignore the kofam_use_dbcan2_thresholds suggestion; it was a bad idea on my part.
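One way to add such an entry is sketched below. It appends a row whose columns are copied from the file's own header and filled with a placeholder. Which values DRAM actually needs in each column is not stated in this thread, so the `"-"` placeholder, the `values` argument, and the function name `append_ko_row` are all assumptions; back up the file before trying it.

```python
def append_ko_row(ko_list_path, ko_id, values=None, placeholder="-"):
    """Append a row for `ko_id` to a tab-separated KO list if it is absent.

    Column names come from the file's own header. `values` maps column names
    to cell contents; columns not listed there get `placeholder` (an assumed
    filler, not a value confirmed to work with DRAM).
    Returns True if a row was added, False if `ko_id` was already present.
    """
    values = values or {}
    with open(ko_list_path, "r") as handle:
        text = handle.read()
    if not text.strip():
        raise ValueError("KO list is empty; there is no header to copy columns from")

    lines = text.split("\n")
    header = lines[0].split("\t")
    # The first column is assumed to hold the KO number.
    if any(line.split("\t", 1)[0] == ko_id for line in lines[1:] if line):
        return False

    new_row = [ko_id] + [str(values.get(col, placeholder)) for col in header[1:]]
    with open(ko_list_path, "a") as handle:
        # Make sure the new row starts on its own line.
        if not text.endswith("\n"):
            handle.write("\n")
        handle.write("\t".join(new_row) + "\n")
    return True
```

For example, `append_ko_row("kofam_ko_list.tsv", "K16716")` would add a single placeholder row and leave the rest of the file untouched. Calling it a second time returns `False` without writing anything.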

Good luck, let us know if that works.

rmFlynn commented 1 year ago

The Zenodo data has been updated.