Closed by ellasiera 2 years ago
Great question! We give the full list because that is all that dbCAN2 tells us. We build the output from this file: http://bcb.unl.edu/dbCAN2/download/Databases/CAZyDB.07312019.fam-activities.txt. If you search for GT4 you can find the right line and see that it lists all of those annotations! The way we interpret it is to say only that the gene is part of family GT4 (http://www.cazy.org/GT4.html). If you want to assign a single function, I'd look at the other annotations of the gene and see if they give a better clue.
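To make the "only say it is part of family GT4" reading concrete, here is a minimal sketch that pulls the family ID out of a hit string instead of splitting on `;`. It assumes the family appears in square brackets, as in the example in this thread, and that family IDs use the CAZy class prefixes (GH, GT, PL, CE, AA, CBM). The function name `cazy_families` is made up for illustration and is not part of DRAM.

```python
import re

# Bracketed CAZy family IDs such as [GT4] or [GH13_5].
# Pfam accessions like [PF00534.20] do not match this pattern.
FAMILY = re.compile(r"\[((?:GH|GT|PL|CE|AA|CBM)\d+(?:_\d+)?)\]")


def cazy_families(hit):
    """Return the unique CAZy family IDs found in a hit string, in order."""
    seen = []
    for family in FAMILY.findall(hit):
        if family not in seen:
            seen.append(family)
    return seen


# Shortened version of the hit string from the question.
hit = (
    "Glycosyl transferases group 1 [PF00534.20]; "
    "Glycosyl transferases group 1 [PF13692.6] "
    "GT4 sucrose synthase (EC 2.4.1.13); "
    "ADP-dependent alpha-maltose-1-phosphate synthase (2.4.1.342) [GT4]"
)
print(cazy_families(hit))  # ['GT4']
```

Splitting on `;` and taking the first entry would just pick whichever activity happens to be listed first for the family, which says nothing about the gene itself.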
I asked Mikayla Borton (the DRAM metabolism expert) if she had anything to add, and her suggestion was to use the distillate with the table of substrate assignments (this family in particular we didn't assign to anything). The CAZys where we have assigned substrates will give you the more specific information you are looking for. CAZy just doesn't give that kind of succinct summary by default.
Got it. Thank you! The reason I'm not using the distillate is that I want to use DRAM to annotate DESeq2 results, and for that I need to keep the fasta headers. I might just go with the family annotation for now and figure out a single function later if the gene is differentially expressed.
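One way to carry the family annotation over to DESeq2 results while keeping the fasta headers could look like the sketch below. It is only an illustration under assumptions: the column names `gene_id` and `cazy_hits`, the helper names, and the toy rows are hypothetical, so adjust them to whatever your annotations table and DESeq2 export actually contain.

```python
import csv
import io
import re

# Bracketed CAZy family IDs such as [GT4]; Pfam accessions do not match.
FAMILY = re.compile(r"\[((?:GH|GT|PL|CE|AA|CBM)\d+(?:_\d+)?)\]")


def family_by_gene(annotations_tsv, id_column="gene_id", hit_column="cazy_hits"):
    """Map gene ID (fasta header) to its CAZy families, joined by ';'.

    Column names are assumptions; pass the ones used in your table.
    """
    families = {}
    for row in csv.DictReader(annotations_tsv, delimiter="\t"):
        found = sorted(set(FAMILY.findall(row.get(hit_column) or "")))
        if found:
            families[row[id_column]] = ";".join(found)
    return families


def annotate(deseq_rows, families):
    """Add a 'cazy_family' field to each DESeq2 result row (dicts keyed by gene_id)."""
    return [dict(row, cazy_family=families.get(row["gene_id"], "")) for row in deseq_rows]


# Toy data standing in for an annotations table and DESeq2 results.
annotations = io.StringIO(
    "gene_id\tcazy_hits\n"
    "bin1_scaffold_3_12\tsucrose synthase (EC 2.4.1.13) [GT4]\n"
    "bin1_scaffold_3_13\t\n"
)
results = [
    {"gene_id": "bin1_scaffold_3_12", "log2FoldChange": 2.1},
    {"gene_id": "bin1_scaffold_3_13", "log2FoldChange": -0.3},
]
annotated = annotate(results, family_by_gene(annotations))
```

Genes without a CAZy hit get an empty `cazy_family`, so the DESeq2 table keeps every row and the long activity list can be looked up later only for genes that turn out to be differentially expressed.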
If you want to connect the two, there is a flag when running distill that will put the gene names in the table instead of counts (--distillate_gene_names). We are working on making it easier to traverse from raw to distillate to product and back.
The flag is giving me an error: DRAM.py: error: unrecognized arguments: --distillate_gene_names
Hey! First of all thank you for writing this awesome tool. I ran it on some genes and noticed that where there is a CAZy hit there is actually a super long list of them. For example, this is the CAZy hit for one gene and it's not even the longest: Glycosyl transferases group 1 [PF00534.20]; Glycosyl transferases group 1 [PF13692.6] GT4 sucrose synthase (EC 2.4.1.13); sucrose-phosphate synthase (EC 2.4.1.14); alpha-glucosyltransferase (EC 2.4.1.52); lipopolysaccharide N-acetylglucosaminyltransferase (EC 2.4.1.56); phosphatidylinositol alpha-mannosyltransferase (EC 2.4.1.57); GDP-Man: Man1GlcNAc2-PP-dolichol alpha-1,3-mannosyltransferase (EC 2.4.1.132); GDP-Man: Man3GlcNAc2-PP-dolichol/Man4GlcNAc2-PP-dolichol alpha-1,2-mannosyltransferase (EC 2.4.1.131); digalactosyldiacylglycerol synthase (EC 2.4.1.141); 1,2-diacylglycerol 3-glucosyltransferase (EC 2.4.1.157); diglucosyl diacylglycerol synthase (EC 2.4.1.208); trehalose phosphorylase (EC 2.4.1.231); NDP-Glc: alpha-glucose alpha-glucosyltransferase / alpha,alpha-trehalose synthase (EC 2.4.1.245); GDP-Man: Man2GlcNAc2-PP-dolichol alpha-1,6-mannosyltransferase (EC 2.4.1.257); UDP-GlcNAc: 2-deoxystreptamine alpha-N-acetylglucosaminyltransferase (EC 2.4.1.283); UDP-GlcNAc: ribostamycin alpha-N-acetylglucosaminyltransferase (EC 2.4.1.285); UDP-Gal alpha-galactosyltransferase (EC 2.4.1.-); UDP-Xyl alpha-xylosyltransferase (EC 2.4.2.-); UDP-GlcA alpha-glucuronyltransferase (EC 2.4.1.-); UDP-Glc alpha-glucosyltransferase (EC 2.4.1.-); UDP-GalNAc: GalNAc-PP-Und alpha-1,3-N-acetylgalactosaminyltransferase (EC 2.4.1.306); UDP-GalNAc: N,N'-diacetylbacillosaminyl-PP-Und alpha-1,3-N-acetylgalactosaminyltransferase (EC 2.4.1.290); ADP-dependent alpha-maltose-1-phosphate synthase (2.4.1.342) [GT4] What do I do with this? Break by ; and pick the first one?