ReneRanzinger closed this issue 8 months ago
This is correct in the monocomp.tsv file (Xxx = 1) and in the monocounts.tsv file (Hex 1, HexNAc 1, Sia 1). I think the issue is that the loader ingesting the files isn't handling the Sia to Xxx mapping.
@rykahsay Glycan detail API: "composition" for G07801AF is missing Sialic acid, but it's present in "composition_expanded."
grep "G07801AF" reviewed/glycan_monosaccharide_composition.csv
"glytoucan_ac","Hex","HexNAc","dHex","NeuAc","NeuGc","HexA","HexN","S","P","aldi","Xxx","X","Count"
"G07801AF","1","1","0","0","0","0","0","0","0","0","1","0","3"
grep "G07801AF" reviewed/glycan_monosaccharide_composition_advanced.csv
"glytoucan_ac","Fuc","Fuc+aldi","Gal","Gal+aldi","GalA","GalN","GalNAc","GalNAc+aldi","Glc","Glc+aldi","GlcA","GlcN","GlcNAc","GlcNAc+aldi","Hex","Hex+aldi","HexA","HexN","HexNAc","HexNAc+aldi","IdoA","Kdn","Man","Man+aldi","ManN","ManNAc","Me","NeuAc","NeuGc","P","Pent","S","Sia","X","Xxx","Xyl","aldi","dHex","dHex+aldi","Count"
"G07801AF","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","1","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","0","0","0","3"
https://api.tst.glygen.org/glycan/detail/G07801AF
"composition": [
{
"name": "Hexose",
"residue": "hex",
"count": 1,
"cid": "206",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/206"
},
{
"name": "N-Acetylhexosamine",
"residue": "hexnac",
"count": 1,
"cid": "899",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/899"
}
],
"composition_expanded": [
{
"name": "Alditol",
"residue": "aldi",
"count": 0
},
{
"name": "6-Deoxy-Hexose",
"residue": "dhex",
"count": 0,
"cid": "840",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/840"
},
{
"name": "6-Deoxy-Hexitol",
"residue": "dhex+aldi",
"count": 0
},
{
"name": "L-Fucose",
"residue": "fuc",
"count": 0,
"cid": "17106",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/17106"
},
{
"name": "L-Fucitol",
"residue": "fuc+aldi",
"count": 0,
"cid": "445724",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/445724"
},
{
"name": "D-Galactose",
"residue": "gal",
"count": 0,
"cid": "6036",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/6036"
},
{
"name": "D-Galacitol",
"residue": "gal+aldi",
"count": 0,
"cid": "11850",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/11850"
},
{
"name": "D-Galactosamine",
"residue": "galn",
"count": 0,
"cid": "24154",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/24154"
},
{
"name": "N-Acetyl-D-galactosamine",
"residue": "galnac",
"count": 0,
"cid": "35717",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/35717"
},
{
"name": "N-Acetylgalactosaminitol",
"residue": "galnac+aldi",
"count": 0,
"cid": "165880",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/165880"
},
{
"name": "D-Galacturonic Acid",
"residue": "gala",
"count": 0,
"cid": "439215",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/439215"
},
{
"name": "D-Glucose",
"residue": "glc",
"count": 0,
"cid": "5793",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/5793"
},
{
"name": "D-Glucitol",
"residue": "glc+aldi",
"count": 0,
"cid": "5780",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/5780"
},
{
"name": "D-Glucuronic acid",
"residue": "glca",
"count": 0,
"cid": "94715",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/94715"
},
{
"name": "D-Glucosamine",
"residue": "glcn",
"count": 0,
"cid": "439213",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/439213"
},
{
"name": "N-Acetyl-D-Glucosamine",
"residue": "glcnac",
"count": 0,
"cid": "439174",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/439174"
},
{
"name": "N-Acetyl-D-glucosaminitol",
"residue": "glcnac+aldi",
"count": 0,
"cid": "165206",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/165206"
},
{
"name": "Hexose",
"residue": "hex",
"count": 1,
"cid": "206",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/206"
},
{
"name": "N-Acetylhexosamine",
"residue": "hexnac",
"count": 1,
"cid": "899",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/899"
},
{
"name": "Hexuronic Acid",
"residue": "hexa",
"count": 0,
"cid": "610",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/610"
},
{
"name": "2-Amino-2-Deoxy-Hexose",
"residue": "hexn",
"count": 0,
"cid": "739",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/739"
},
{
"name": "Hexitol",
"residue": "hex+aldi",
"count": 0,
"cid": "453",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/453"
},
{
"name": "N-Acetylhexosaminitol",
"residue": "hexnac+aldi",
"count": 0
},
{
"name": "L-Iduronic acid",
"residue": "idoa",
"count": 0,
"cid": "441039",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/441039"
},
{
"name": "D-Mannose",
"residue": "man",
"count": 0,
"cid": "18950",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/18950"
},
{
"name": "D-Mannitol",
"residue": "man+aldi",
"count": 0,
"cid": "6251",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/6251"
},
{
"name": "D-Mannosamine",
"residue": "mann",
"count": 0,
"cid": "440049",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/440049"
},
{
"name": "N-Acetyl-D-mannosamine",
"residue": "mannac",
"count": 0,
"cid": "439281",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/439281"
},
{
"name": "Methyl",
"residue": "me",
"count": 0
},
{
"name": "N-Acetyl-Neuraminic Acid",
"residue": "neuac",
"count": 0,
"cid": "439197",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/439197"
},
{
"name": "N-Glycolyl-Neuraminic Acid",
"residue": "neugc",
"count": 0,
"cid": "440001",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/440001"
},
{
"name": "Phosphate",
"residue": "p",
"count": 0,
"cid": "1061",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/1061"
},
{
"name": "Pentose",
"residue": "pent",
"count": 0,
"cid": "229",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/229"
},
{
"name": "Sulfate",
"residue": "s",
"count": 0,
"cid": "1117",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/1117"
},
{
"name": "Sialic acid",
"residue": "sia",
"count": 1,
"cid": "906",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/906"
},
{
"name": "Other",
"residue": "other",
"count": 0
},
{
"name": "Floating substituent",
"residue": "x",
"count": 0
},
{
"name": "D-Xylose",
"residue": "xyl",
"count": 0,
"cid": "135191",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/135191"
},
{
"name": "Ketodeoxynononic acid",
"residue": "kdn",
"count": 0,
"cid": "13991616",
"url": "https://pubchem.ncbi.nlm.nih.gov/compound/13991616"
}
],
@kmartinez834 and @rykahsay I thought Xxx would become "Other" in the composition display. Am I wrong?
Yes, but something weird is happening with this one too: https://glygen.org/glycan/G99968JX
Should have 2 monosaccharides but composition is
grep "G99968JX" reviewed/glycan_monosaccharide_composition.csv
"glytoucan_ac","Hex","HexNAc","dHex","NeuAc","NeuGc","HexA","HexN","S","P","aldi","Xxx","X","Count"
"G99968JX","0","1","0","0","0","0","0","1","0","0","1","0","2"
G99968JX is actually OK. If you look at the cartoon:
OK, I was looking at "Count", thinking that was the total number.
This is the mapping I used to collapse the expanded residues and get the object list in the "composition" property. Does this mean Xxx is collapsed to "Sia"? Please edit this to show the right mapping.
{
"Hex":["Man","Gal","Glc","Hex"],
"HexNAc":["GalNAc","GlcNAc","ManNAc","HexNAc"],
"dHex":["Fuc","dHex"],
"Pent":["Xyl","Pent"],
"HexA":["GlcA","GalA","IdoA","ManA","HexA"],
"HexN":["GlcN","GalN","ManN","HexN"],
"NeuAc":["NeuAc"],
"NeuGc":["NeuGc"],
"S":["S"],
"P":["P"],
"Xxx":["Xxx"]
}
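For anyone checking the mapping above, here is a rough sketch (not the actual loader code) that inverts it into a residue-to-class lookup and lists residues with a nonzero count that no class covers. The names `COLLAPSE_MAP`, `REVERSE` and `unmapped` are made up for illustration, and the G07801AF counts are the nonzero columns from the advanced CSV row quoted earlier.

```python
# Mapping copied from the comment above (expanded residues grouped by collapsed class).
COLLAPSE_MAP = {
    "Hex": ["Man", "Gal", "Glc", "Hex"],
    "HexNAc": ["GalNAc", "GlcNAc", "ManNAc", "HexNAc"],
    "dHex": ["Fuc", "dHex"],
    "Pent": ["Xyl", "Pent"],
    "HexA": ["GlcA", "GalA", "IdoA", "ManA", "HexA"],
    "HexN": ["GlcN", "GalN", "ManN", "HexN"],
    "NeuAc": ["NeuAc"],
    "NeuGc": ["NeuGc"],
    "S": ["S"],
    "P": ["P"],
    "Xxx": ["Xxx"],
}

# Invert the mapping: expanded residue -> collapsed class.
REVERSE = {
    member: group
    for group, members in COLLAPSE_MAP.items()
    for member in members
}


def unmapped(expanded_counts):
    """Return residues with a nonzero count that no collapsed class covers."""
    return sorted(
        residue
        for residue, count in expanded_counts.items()
        if count > 0 and residue not in REVERSE
    )


# Nonzero columns for G07801AF in the advanced CSV (Count column left out).
g07801af = {"Hex": 1, "HexNAc": 1, "Sia": 1}
print(unmapped(g07801af))  # ['Sia']
```

With this mapping, "Sia" has no collapsed class at all, which would be consistent with it dropping out of "composition" for G07801AF.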
@rykahsay "Sia" collapses to "Xxx" only when it isn't mapped to "NeuAc" or "NeuGc"...
So if (# Sia) > (# NeuAc + # NeuGc), the difference is mapped to Xxx
Ex. G76100HQ has 2 Sia, 1 NeuGc and 1 NeuAc, so nothing is mapped to "Other". G07801AF has 1 Sia, 0 NeuGc and 0 NeuAc, so it should have 1 "Other".
grep "glytoucan\|G76100HQ\|G07801AF" reviewed/glycan_monosaccharide_composition_advanced.csv
"glytoucan_ac","Fuc","Fuc+aldi","Gal","Gal+aldi","GalA","GalN","GalNAc","GalNAc+aldi","Glc","Glc+aldi","GlcA","GlcN","GlcNAc","GlcNAc+aldi","Hex","Hex+aldi","HexA","HexN","HexNAc","HexNAc+aldi","IdoA","Kdn","Man","Man+aldi","ManN","ManNAc","Me","NeuAc","NeuGc","P","Pent","S","Sia","X","Xxx","Xyl","aldi","dHex","dHex+aldi","Count"
"G76100HQ","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","1","0","0","0","0","0","0","0","0","1","1","0","0","0","2","0","0","0","0","0","0","4"
"G07801AF","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","1","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","0","0","0","3"
Since this can't be represented in the mapping scheme above, how do you want to handle it?
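One possible way to handle it, sketched under assumptions rather than taken from the real loader: compute the "Other" count as a separate step, as the Sia count minus the NeuAc and NeuGc counts. The rows below are trimmed to the three relevant columns of the grep output above. The helper name `other_from_sia` is hypothetical, and clamping at zero is my assumption, since the rule only covers the case where Sia is at least NeuAc + NeuGc.

```python
import csv
import io

# Trimmed copy of the grep output above: only the columns the rule needs.
ROWS = '''"glytoucan_ac","NeuAc","NeuGc","Sia"
"G76100HQ","1","1","2"
"G07801AF","0","0","1"
'''


def other_from_sia(row):
    """Sia residues not accounted for by NeuAc or NeuGc (hypothetical helper)."""
    typed = int(row["NeuAc"]) + int(row["NeuGc"])
    # Clamping at zero is an assumption; the thread does not discuss Sia < NeuAc + NeuGc.
    return max(int(row["Sia"]) - typed, 0)


other = {
    row["glytoucan_ac"]: other_from_sia(row)
    for row in csv.DictReader(io.StringIO(ROWS))
}
print(other)  # {'G76100HQ': 0, 'G07801AF': 1}
```

Keeping this as a separate step leaves the static mapping unchanged and only adds the residual to the Xxx/"Other" entry afterwards.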
I have tried to implement this; please check and make sure all cases are working.
These all look good
This was used as an example in our PubChem meeting today:
https://pubchem.ncbi.nlm.nih.gov/compound/210#section=Biologic-Description
https://www.glygen.org/glycan/G07801AF
The composition is wrong on both pages since it's missing the sialic acid (no stereochemistry). Does this go back to @edwardsnj, or is that a problem on the GlyGen side?