ucassee opened this issue 3 years ago
There is nothing wrong with your result. This is to let you know that some rows in your amg_summary.tsv file will have no metabolic information. This happens because the gene in question has been assigned a database identifier that belongs to a known AMG but is not yet in our distillate. We are working on eliminating these cases by adding all of these genes to the distillate. So it's not a problem, just a gene you won't get quite as much metabolic information about.
Just to add some detail (though I expect you're on top of it): I am seeing this for genes whose hits all come from PFAM. Looking at the DRAM database, I can see that the summary table carries no PFxxxx IDs. I guess this is a to-do, or do you feel that these are not important, i.e. too speculative a function or false-positive AMGs?
Note: below, I am using pdb and promoting warnings to errors so I can inspect the runtime state.
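A minimal sketch of that debugging setup, assuming the messages of interest are emitted through Python's standard warnings module. The helper name run_with_warnings_as_errors is hypothetical and not part of DRAM. From the command line, running a script under python -W error -m pdb has a roughly similar effect, although it also promotes unrelated warnings (e.g. deprecation notices) to errors.

```python
import pdb
import sys
import warnings


def run_with_warnings_as_errors(func, *args, debug=True, **kwargs):
    """Call func with every warning raised as an exception (hypothetical helper).

    If debug is True, open a pdb post-mortem session at the frame that
    emitted the warning, so locals such as the row being processed can be
    inspected before the exception propagates.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # turn every warning into an exception
        try:
            return func(*args, **kwargs)
        except Warning:
            if debug:
                # Inspect the stack where the warning was raised.
                pdb.post_mortem(sys.exc_info()[2])
            raise
```

Because the filter is installed inside warnings.catch_warnings(), the previous warning filters are restored once the call returns or raises.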
For example, take the following row of potential_amgs inside the method make_viral_distillate(potential_amgs, genome_summary_frame):
fasta edge_662__full_1-cat_1
scaffold edge_662__full_1-cat_1
gene_position 22
start_position 12819
end_position 13820
strandedness -1
rank D
kegg_id
kegg_hit
viral_id YP_009124812.1
viral_hit YP_009124812.1 hydrolase [Mycobacterium phage ...
viral_RBH False
viral_identity 0.274
viral_bitScore 94.0
viral_eVal 0.0
pfam_hits alpha/beta hydrolase fold [PF07859.15]; Prolyl...
cazy_hits
vogdb_description sp|I6Y9F7|LIPQ_MYCTU Esterase LipQ; Xu
vogdb_categories Xu
heme_regulatory_motif_count 0
virsorter_category 1.0
auxiliary_score 1
is_transposon False
amg_flags MK
peptidase_id MER0155040
peptidase_family S09X
peptidase_hit MER0155040 - family S9 unassigned peptidases (...
peptidase_RBH True
peptidase_identity 0.97
peptidase_bitScore 478.0
peptidase_eVal 0.0
Name: edge_662__full_1-cat_1_22, dtype: object
get_ids_from_row(row) returns the set
{'', 'PF08840', 'PF02129', 'PF01738', 'PF00326', 'PF05448', 'PF12146', 'PF00135', 'PF10340', 'PF12740', 'S09X', 'PF07859', 'PF02230'}
whose intersection with set(genome_summary_frame.index) is empty, hence the "No distillate" warning.
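To make the mechanism concrete, here is a hypothetical re-creation of the lookup. It is not DRAM's actual get_ids_from_row. It assumes a row is a plain dict, that Pfam hits are written as "[PFxxxxx.version]" as in the pfam_hits field above, and that the distillate index is a set of IDs (the toy index below is made up). In this sketch an empty kegg_id contributes '', which would be one way the empty string ends up in the set above.

```python
import re
import warnings

# Pfam hits look like "[PF07859.15]"; capture the accession without its version.
PFAM_ACCESSION = re.compile(r"\[(PF\d{5})(?:\.\d+)?\]")


def ids_from_row(row):
    """Collect candidate database IDs from an annotation row (hypothetical)."""
    ids = {row.get("kegg_id", ""), row.get("peptidase_family", "")}
    ids.update(PFAM_ACCESSION.findall(row.get("pfam_hits", "")))
    return ids


def distillate_ids(row, distillate_index):
    """Return the row's IDs known to the distillate; warn when there are none."""
    matched = ids_from_row(row) & set(distillate_index)
    if not matched:
        warnings.warn(f"No distillate information for {row['gene']}")
    return matched


row = {
    "gene": "edge_662__full_1-cat_1_22",
    "kegg_id": "",
    "peptidase_family": "S09X",
    "pfam_hits": "alpha/beta hydrolase fold [PF07859.15]",
}
print(sorted(ids_from_row(row)))  # ['', 'PF07859', 'S09X']

toy_index = {"K00001", "K00002"}  # made-up index containing no Pfam accessions
print(distillate_ids(row, toy_index))  # emits a UserWarning, prints set()
```

Once an accession such as PF07859 is present in the index, the same row produces a non-empty match and no warning.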
Looking at the detail in the row, it does seem like an interesting gene; I had half expected it to be a DUF (domain of unknown function).
You are correct that these will all be from PFAM. They are genes which have been previously recognized as AMGs in other studies but either don't fit cleanly into the distillate categories that we have currently defined or we do not feel comfortable calling them metabolic genes based on only one domain. It's definitely something on our to-do list to fix in the future. We want all the functions from these genes to make it into the distillate in some form.
Hi,
I get a large number of warnings when I run
DRAM-v.py distill -i annotation/annotations.tsv -o annotation/distilled
Is there something wrong with my result?
Thanks,