WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

Issue with DRAM-v distill - error with "auxiliary score" #91

Closed aprabhu90 closed 2 years ago

aprabhu90 commented 3 years ago

Hi,

I ran the latest DRAM-v on putative vMAGs obtained from both VirSorter2 and VIBRANT, after dereplicating them at 0.99 with CD-HIT. I was hoping to include the vMAGs identified by VIBRANT in this pipeline as well.

The latest DRAM (dramv_1.2.3) annotated the putative vMAGs without a hitch, but I am having trouble running distill. I tried it on a small subset of samples with a single annotation file, DRAM-v.py distill -i annotations.tsv -o Dramv_test, and got the following error message:

0:00:00.031097: Retrieved database locations and descriptions
Traceback (most recent call last):
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'auxiliary_score'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/sw/miniconda3/envs/dram_1.2.3/bin/DRAM-v.py", line 140, in <module>
    args.func(**args_dict)
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/mag_annotator/summarize_vgfs.py", line 235, in summarize_vgfs
    potential_amgs = filter_to_amgs(annotations, max_aux=max_auxiliary_score,
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/mag_annotator/summarize_vgfs.py", line 49, in filter_to_amgs
    vmap_aux_check = ('V' not in amg_flags) and ('M' in amg_flags) and (row['auxiliary_score'] <= max_aux) and \
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/series.py", line 853, in __getitem__
    return self._get_value(key)
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/series.py", line 961, in _get_value
    loc = self.index.get_loc(label)
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'auxiliary_score'

Could this be a database issue? I can't find anything resembling an "auxiliary_score" column in the annotate output. I have pasted the annotation output below in case it helps.

fasta   scaffold    gene_position   start_position  end_position    strandedness    rank    kegg_id kegg_hit    viral_id    viral_hit   viral_RBH   viral_identity  viral_bitScore  viral_eVal  pfam_hits   cazy_hits   vogdb_description   vogdb_categories    heme_regulatory_motif_count vogdb_hits  peptidase_id    peptidase_family    peptidase_hit   peptidase_RBH   peptidase_identity  peptidase_bitScore  peptidase_eVal  is_transposon   amg_flags
NODE_1184_length_7202_cov_5.457825_1    NODE_1184_length_7202_cov_5.457825  NODE_1184_length_7202_cov_5.457825  1   2   343 1   D           YP_008320277.1  YP_008320277.1 endonuclease [Puniceispirillum phage HMO-2011]   False   0.8 188.0   4.164e-55   Phage endonuclease I [PF05367.11]       sp|P00641|ENDO_BPT7 Endonuclease I; Xp  Xp  0                                   False   F
NODE_1184_length_7202_cov_5.457825_2    NODE_1184_length_7202_cov_5.457825  NODE_1184_length_7202_cov_5.457825  2   347 727 1   E           YP_008320278.1  YP_008320278.1 hypothetical protein phage1322_16 [Puniceispirillum phage HMO-2011]  False   0.697   164.0   9.599e-47                   0                                   False   F
NODE_1184_length_7202_cov_5.457825_3    NODE_1184_length_7202_cov_5.457825  NODE_1184_length_7202_cov_5.457825  3   724 1035    1   D           YP_008320279.1  YP_008320279.1 hypothetical protein phage1322_17 [Puniceispirillum phage HMO-2011]  False   0.921   143.0   8.5e-40 Protein of unknwon function (DUF3310) [PF11753.8]       sp|P07719|V17_BPT3 Gene 1.7 protein; Xh Xh  0                                   False   F
NODE_1184_length_7202_cov_5.457825_4    NODE_1184_length_7202_cov_5.457825  NODE_1184_length_7202_cov_5.457825  4   1025    1183    1   E                                                   0                                   False   F

Could you advise on an alternative for vMAGs obtained this way? Thanks, Apoorva

shafferm commented 3 years ago

Hi Apoorva,

To get auxiliary scores you need to include the VirSorter affi_contigs file when annotating. Without it we can't calculate the auxiliary score, which we use to measure how confident we are that a gene on a viral contig is actually viral and not a host gene erroneously included on the contig.

Mike
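
For anyone who hits the same KeyError, a quick pre-check of the annotations header can confirm whether the column is missing before running distill. The sketch below is not part of DRAM; the function name missing_columns and the EXPECTED_COLUMNS tuple are hypothetical, and the two column names are only inferred from the traceback and the header pasted in this thread, so treat the list as incomplete.

```python
import csv
import io

# Column names inferred from the traceback and header in this thread.
# Assumption: this is NOT a complete list of what DRAM-v distill requires.
EXPECTED_COLUMNS = ("auxiliary_score", "amg_flags")


def missing_columns(handle, expected=EXPECTED_COLUMNS):
    """Return the expected column names absent from the header row of a TSV."""
    # Read only the first row; an empty file yields an empty header.
    header = next(csv.reader(handle, delimiter="\t"), [])
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]


# Hypothetical, shortened header resembling the one posted in this issue.
example = io.StringIO("fasta\tscaffold\tgene_position\trank\tamg_flags\n")
print(missing_columns(example))
```

On a real file the same function can be called as `with open("annotations.tsv", newline="") as fh: missing_columns(fh)`. If auxiliary_score is reported as missing, the annotations were most likely produced without the affi_contigs file described above.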