WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

Selection logic for gene_id in amg_summary.tsv #350

Open diego00012138 opened 2 weeks ago

diego00012138 commented 2 weeks ago

Hi, the annotation table reports hits from several databases for each gene, but only some of those hits are retained in amg_summary.tsv. For example, when a gene hits both KEGG and Pfam, only the Pfam IDs may be retained in amg_summary.tsv, and in the case below only two of the three Pfam IDs are kept. What filtering strategy produces this result?

annotation.tsv:

k141_110713__full-cat_1_10 final-viral-combined-for-dramv k141_110713__full-cat_1 10 11206 12321 -1 C K01711 GDPmannose 4,6-dehydratase [EC:4.2.1.47] YP_009323158.1 YP_009323158.1 nucleotide-sugar epimerase [Synechococcus phage S-CAM7] False 0.509 323.0 6.7e-97 GDP-mannose 4,6 dehydratase [PF16363.10]; NAD dependent epimerase/dehydratase family [PF01370.26]; RmlD substrate binding domain [PF04321.22] VOG23305 sp|Q9EQC1|3BHS7_MOUSE 3 beta-hydroxysteroid dehydrogenase type 7; Xh Xh 0 1 2 False MK

amg_summary.tsv:

k141_110713__full-cat_1_10 PF04321 k141_110713__full-cat_1 2 MK RmlD substrate binding domain amg_database Roux et al. 2016 False
k141_110713__full-cat_1_10 PF01370 k141_110713__full-cat_1 2 MK NAD dependent epimerase/dehydratase family amg_database Roux et al. 2016 False
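To make the discrepancy concrete, here is a minimal Python sketch that lists which IDs from the annotation row are missing from the summary rows. It is only a comparison aid and does not reproduce DRAM's own selection code. The regular expression, the variable names, and the `extract_ids` helper are my own assumptions, and the annotation text is abridged to the fields that carry database IDs.

```python
import re

# Database-hit text copied from the annotation row above
# (abridged to the fields that contain database IDs).
annotation_hits = (
    "K01711 GDPmannose 4,6-dehydratase [EC:4.2.1.47] "
    "GDP-mannose 4,6 dehydratase [PF16363.10]; "
    "NAD dependent epimerase/dehydratase family [PF01370.26]; "
    "RmlD substrate binding domain [PF04321.22] VOG23305"
)

# gene_id values reported for the same gene in the summary rows above.
summary_ids = {"PF04321", "PF01370"}

# Assumed ID shapes: KEGG orthologs (K + 5 digits), EC numbers,
# Pfam accessions (version suffix such as ".10" is dropped), and VOG IDs.
ID_PATTERN = re.compile(r"\bK\d{5}\b|\bEC:[\d.\-]+|\bPF\d{5}|\bVOG\d+")


def extract_ids(text):
    """Return the set of database IDs found in a block of annotation text."""
    return set(ID_PATTERN.findall(text))


all_ids = extract_ids(annotation_hits)
dropped = all_ids - summary_ids

print("kept:   ", sorted(all_ids & summary_ids))
print("dropped:", sorted(dropped))
```

For this gene the IDs absent from the summary are K01711, EC:4.2.1.47, PF16363 and VOG23305, so the question is what distinguishes those four from PF01370 and PF04321.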

Sincerely

diego00012138 commented 2 weeks ago

@rmFlynn I would be very grateful if you could answer my questions! QAQ