WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

CAZy hits are really long #76

Closed ellasiera closed 2 years ago

ellasiera commented 3 years ago

Hey! First of all, thank you for writing this awesome tool. I ran it on some genes and noticed that where there is a CAZy hit, there is actually a super long list of them. For example, this is the CAZy hit for one gene, and it's not even the longest:

Glycosyl transferases group 1 [PF00534.20]; Glycosyl transferases group 1 [PF13692.6] GT4 sucrose synthase (EC 2.4.1.13); sucrose-phosphate synthase (EC 2.4.1.14); alpha-glucosyltransferase (EC 2.4.1.52); lipopolysaccharide N-acetylglucosaminyltransferase (EC 2.4.1.56); phosphatidylinositol alpha-mannosyltransferase (EC 2.4.1.57); GDP-Man: Man1GlcNAc2-PP-dolichol alpha-1,3-mannosyltransferase (EC 2.4.1.132); GDP-Man: Man3GlcNAc2-PP-dolichol/Man4GlcNAc2-PP-dolichol alpha-1,2-mannosyltransferase (EC 2.4.1.131); digalactosyldiacylglycerol synthase (EC 2.4.1.141); 1,2-diacylglycerol 3-glucosyltransferase (EC 2.4.1.157); diglucosyl diacylglycerol synthase (EC 2.4.1.208); trehalose phosphorylase (EC 2.4.1.231); NDP-Glc: alpha-glucose alpha-glucosyltransferase / alpha,alpha-trehalose synthase (EC 2.4.1.245); GDP-Man: Man2GlcNAc2-PP-dolichol alpha-1,6-mannosyltransferase (EC 2.4.1.257); UDP-GlcNAc: 2-deoxystreptamine alpha-N-acetylglucosaminyltransferase (EC 2.4.1.283); UDP-GlcNAc: ribostamycin alpha-N-acetylglucosaminyltransferase (EC 2.4.1.285); UDP-Gal alpha-galactosyltransferase (EC 2.4.1.-); UDP-Xyl alpha-xylosyltransferase (EC 2.4.2.-); UDP-GlcA alpha-glucuronyltransferase (EC 2.4.1.-); UDP-Glc alpha-glucosyltransferase (EC 2.4.1.-); UDP-GalNAc: GalNAc-PP-Und alpha-1,3-N-acetylgalactosaminyltransferase (EC 2.4.1.306); UDP-GalNAc: N,N'-diacetylbacillosaminyl-PP-Und alpha-1,3-N-acetylgalactosaminyltransferase (EC 2.4.1.290); ADP-dependent alpha-maltose-1-phosphate synthase (2.4.1.342) [GT4]

What do I do with this? Split on ";" and pick the first one?

shafferm commented 3 years ago

Great question! We give the full list because that is all the information dbCAN2 provides. This is the file we build the output from: http://bcb.unl.edu/dbCAN2/download/Databases/CAZyDB.07312019.fam-activities.txt. If you search for GT4, you can find the corresponding line and see that it lists all of those activities. The way we interpret it is to say only that the gene is part of family GT4 (http://www.cazy.org/GT4.html). If you want to assign a single function, I'd look at the gene's other annotations and see whether they give a better clue.
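
For anyone who wants to reduce a hit like the one above to just its family, here is a minimal Python sketch. It assumes the family ID appears in square brackets in the hit text, as in the example quoted in this thread, and that family IDs start with one of the CAZy class prefixes (GH, GT, PL, CE, AA, CBM). The function name `cazy_families` is made up for illustration and is not part of DRAM.

```python
import re

# Assumption: family IDs look like GT4, GH13_5 or CBM48 and sit inside square brackets.
# Pfam accessions such as [PF00534.20] do not match this pattern and are ignored.
FAMILY_PATTERN = re.compile(r"\[((?:GH|GT|PL|CE|AA|CBM)\d+(?:_\d+)?)\]")


def cazy_families(hit):
    """Return the unique bracketed CAZy family IDs in a hit string, in order of appearance."""
    families = []
    for family in FAMILY_PATTERN.findall(hit):
        if family not in families:
            families.append(family)
    return families


# Shortened version of the hit quoted in this thread.
example = (
    "GT4 sucrose synthase (EC 2.4.1.13); "
    "sucrose-phosphate synthase (EC 2.4.1.14); "
    "ADP-dependent alpha-maltose-1-phosphate synthase (2.4.1.342) [GT4]"
)
print(cazy_families(example))
```

This keeps only the family-level call and deliberately ignores the list of activities, since the list describes the whole family rather than the individual gene.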

shafferm commented 3 years ago

I asked Mikayla Borton (the DRAM metabolism expert) if she had anything to add, and her suggestion was to use the distillate with the table of substrate assignments (we didn't assign this particular family to any substrate). The CAZy families for which we have assigned substrates will give you the more specific information you are looking for. CAZy just doesn't give that kind of succinct summary by default.

ellasiera commented 3 years ago

Got it. Thank you! The reason I'm not using the distillate is that I want to use DRAM to annotate DESeq2 results, and for that I need to keep the FASTA headers. I might just go with the family annotation for now and figure out a single function later if the gene is differentially expressed.
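
A rough sketch of that plan in Python, using only the standard library: look up each gene in the DESeq2 results by its ID and attach the family annotation. The tables here are tiny invented stand-ins, and the column names `gene_id` and `cazy_family` are assumptions for illustration; the real DRAM annotation table and a DESeq2 export will have their own column names that would need to be substituted.

```python
import csv
import io

# Hypothetical stand-ins for a DRAM annotation table and a DESeq2 results table.
annotations_tsv = "gene_id\tcazy_family\ncontig1_gene3\tGT4\ncontig1_gene7\t\n"
deseq_tsv = (
    "gene_id\tlog2FoldChange\tpadj\n"
    "contig1_gene3\t2.1\t0.003\n"
    "contig1_gene7\t-0.4\t0.61\n"
)


def annotate_results(deseq_text, annotation_text, key="gene_id"):
    """Add a cazy_family column to each DESeq2 row, matching rows on the gene ID."""
    reader = csv.DictReader(io.StringIO(annotation_text), delimiter="\t")
    annotations = {row[key]: row for row in reader}
    merged = []
    for row in csv.DictReader(io.StringIO(deseq_text), delimiter="\t"):
        match = annotations.get(row[key], {})
        # Genes without an annotation row get an empty family.
        merged.append({**row, "cazy_family": match.get("cazy_family", "")})
    return merged


merged = annotate_results(deseq_tsv, annotations_tsv)
for row in merged:
    print(row["gene_id"], row["padj"], row["cazy_family"])
```

The join only works if the gene IDs in both tables come from the same FASTA headers, which is the reason for keeping them.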

shafferm commented 3 years ago

If you want to connect the two, there is a flag for distill that puts the gene names in the table instead of counts (--distillate_gene_names). We are working on making it easier to traverse from raw to distillate to product and back.

ellasiera commented 3 years ago

The flag is giving me an error:

DRAM.py: error: unrecognized arguments: --distillate_gene_names

shafferm commented 3 years ago

I forgot to respond to this, I'm so sorry! It sounds like you might have an older version of DRAM, and you would need to upgrade to use this flag. You can find instructions on the easiest way to upgrade DRAM here.