glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Composition error in G07801AF #710

Closed: ReneRanzinger closed this issue 8 months ago

ReneRanzinger commented 8 months ago

This was used as an example in our PubChem meeting today:

https://pubchem.ncbi.nlm.nih.gov/compound/210#section=Biologic-Description

https://www.glygen.org/glycan/G07801AF

The composition is wrong on both pages since it's missing the sialic acid (no stereochemistry). Does this go back to @edwardsnj, or is it a problem on the GlyGen side?

edwardsnj commented 8 months ago

This is correct in the monocomp.tsv file (Xxx = 1) and in the monocounts.tsv file (Hex 1, HexNAc 1, Sia 1). I think the issue is that the loader ingesting the files isn't handling the Sia-to-Xxx mapping.

kmartinez834 commented 8 months ago

@rykahsay Glycan detail API: "composition" for G07801AF is missing sialic acid, but it's present in "composition_expanded".

grep "G07801AF" reviewed/glycan_monosaccharide_composition.csv
"glytoucan_ac","Hex","HexNAc","dHex","NeuAc","NeuGc","HexA","HexN","S","P","aldi","Xxx","X","Count"
"G07801AF","1","1","0","0","0","0","0","0","0","0","1","0","3"
grep "G07801AF" reviewed/glycan_monosaccharide_composition_advanced.csv 
"glytoucan_ac","Fuc","Fuc+aldi","Gal","Gal+aldi","GalA","GalN","GalNAc","GalNAc+aldi","Glc","Glc+aldi","GlcA","GlcN","GlcNAc","GlcNAc+aldi","Hex","Hex+aldi","HexA","HexN","HexNAc","HexNAc+aldi","IdoA","Kdn","Man","Man+aldi","ManN","ManNAc","Me","NeuAc","NeuGc","P","Pent","S","Sia","X","Xxx","Xyl","aldi","dHex","dHex+aldi","Count"
"G07801AF","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","1","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","0","0","0","3"
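Reading these wide rows by counting columns is error-prone. A minimal Python sketch that pairs each header name with its value using the standard `csv` module; the header and row strings are copied verbatim from the two grep outputs above, and `parse_row` and `nonzero` are hypothetical helpers, not part of the GlyGen pipeline.

```python
import csv

# Header and G07801AF rows copied verbatim from the two grep outputs above.
BASIC_HEADER = '"glytoucan_ac","Hex","HexNAc","dHex","NeuAc","NeuGc","HexA","HexN","S","P","aldi","Xxx","X","Count"'
BASIC_ROW = '"G07801AF","1","1","0","0","0","0","0","0","0","0","1","0","3"'
ADVANCED_HEADER = '"glytoucan_ac","Fuc","Fuc+aldi","Gal","Gal+aldi","GalA","GalN","GalNAc","GalNAc+aldi","Glc","Glc+aldi","GlcA","GlcN","GlcNAc","GlcNAc+aldi","Hex","Hex+aldi","HexA","HexN","HexNAc","HexNAc+aldi","IdoA","Kdn","Man","Man+aldi","ManN","ManNAc","Me","NeuAc","NeuGc","P","Pent","S","Sia","X","Xxx","Xyl","aldi","dHex","dHex+aldi","Count"'
ADVANCED_ROW = '"G07801AF","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","1","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","0","0","0","3"'


def parse_row(header_line, row_line):
    """Pair each column name with its value and convert the counts to int."""
    names = next(csv.reader([header_line]))
    values = next(csv.reader([row_line]))
    record = dict(zip(names, values))
    accession = record.pop("glytoucan_ac")
    return accession, {name: int(value) for name, value in record.items()}


def nonzero(counts):
    """Keep only the columns whose count is above zero."""
    return {name: value for name, value in counts.items() if value > 0}


for header, row in [(BASIC_HEADER, BASIC_ROW), (ADVANCED_HEADER, ADVANCED_ROW)]:
    accession, counts = parse_row(header, row)
    print(accession, nonzero(counts))
```

This prints `{'Hex': 1, 'HexNAc': 1, 'Xxx': 1, 'Count': 3}` for the basic file and `{'Hex': 1, 'HexNAc': 1, 'Sia': 1, 'Count': 3}` for the advanced one, so the unresolved sialic acid is recorded as Xxx in one file and as Sia in the other.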

https://api.tst.glygen.org/glycan/detail/G07801AF

"composition": [
    {
      "name": "Hexose",
      "residue": "hex",
      "count": 1,
      "cid": "206",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/206"
    },
    {
      "name": "N-Acetylhexosamine",
      "residue": "hexnac",
      "count": 1,
      "cid": "899",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/899"
    }
  ],
  "composition_expanded": [
    {
      "name": "Alditol",
      "residue": "aldi",
      "count": 0
    },
    {
      "name": "6-Deoxy-Hexose",
      "residue": "dhex",
      "count": 0,
      "cid": "840",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/840"
    },
    {
      "name": "6-Deoxy-Hexitol",
      "residue": "dhex+aldi",
      "count": 0
    },
    {
      "name": "L-Fucose",
      "residue": "fuc",
      "count": 0,
      "cid": "17106",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/17106"
    },
    {
      "name": "L-Fucitol",
      "residue": "fuc+aldi",
      "count": 0,
      "cid": "445724",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/445724"
    },
    {
      "name": "D-Galactose",
      "residue": "gal",
      "count": 0,
      "cid": "6036",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/6036"
    },
    {
      "name": "D-Galacitol",
      "residue": "gal+aldi",
      "count": 0,
      "cid": "11850",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/11850"
    },
    {
      "name": "D-Galactosamine",
      "residue": "galn",
      "count": 0,
      "cid": "24154",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/24154"
    },
    {
      "name": "N-Acetyl-D-galactosamine",
      "residue": "galnac",
      "count": 0,
      "cid": "35717",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/35717"
    },
    {
      "name": "N-Acetylgalactosaminitol",
      "residue": "galnac+aldi",
      "count": 0,
      "cid": "165880",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/165880"
    },
    {
      "name": "D-Galacturonic Acid",
      "residue": "gala",
      "count": 0,
      "cid": "439215",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/439215"
    },
    {
      "name": "D-Glucose",
      "residue": "glc",
      "count": 0,
      "cid": "5793",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/5793"
    },
    {
      "name": "D-Glucitol",
      "residue": "glc+aldi",
      "count": 0,
      "cid": "5780",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/5780"
    },
    {
      "name": "D-Glucuronic acid",
      "residue": "glca",
      "count": 0,
      "cid": "94715",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/94715"
    },
    {
      "name": "D-Glucosamine",
      "residue": "glcn",
      "count": 0,
      "cid": "439213",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/439213"
    },
    {
      "name": "N-Acetyl-D-Glucosamine",
      "residue": "glcnac",
      "count": 0,
      "cid": "439174",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/439174"
    },
    {
      "name": "N-Acetyl-D-glucosaminitol",
      "residue": "glcnac+aldi",
      "count": 0,
      "cid": "165206",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/165206"
    },
    {
      "name": "Hexose",
      "residue": "hex",
      "count": 1,
      "cid": "206",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/206"
    },
    {
      "name": "N-Acetylhexosamine",
      "residue": "hexnac",
      "count": 1,
      "cid": "899",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/899"
    },
    {
      "name": "Hexuronic Acid",
      "residue": "hexa",
      "count": 0,
      "cid": "610",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/610"
    },
    {
      "name": "2-Amino-2-Deoxy-Hexose",
      "residue": "hexn",
      "count": 0,
      "cid": "739",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/739"
    },
    {
      "name": "Hexitol",
      "residue": "hex+aldi",
      "count": 0,
      "cid": "453",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/453"
    },
    {
      "name": "N-Acetylhexosaminitol",
      "residue": "hexnac+aldi",
      "count": 0
    },
    {
      "name": "L-Iduronic acid",
      "residue": "idoa",
      "count": 0,
      "cid": "441039",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/441039"
    },
    {
      "name": "D-Mannose",
      "residue": "man",
      "count": 0,
      "cid": "18950",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/18950"
    },
    {
      "name": "D-Mannitol",
      "residue": "man+aldi",
      "count": 0,
      "cid": "6251",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/6251"
    },
    {
      "name": "D-Mannosamine",
      "residue": "mann",
      "count": 0,
      "cid": "440049",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/440049"
    },
    {
      "name": "N-Acetyl-D-mannosamine",
      "residue": "mannac",
      "count": 0,
      "cid": "439281",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/439281"
    },
    {
      "name": "Methyl",
      "residue": "me",
      "count": 0
    },
    {
      "name": "N-Acetyl-Neuraminic Acid",
      "residue": "neuac",
      "count": 0,
      "cid": "439197",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/439197"
    },
    {
      "name": "N-Glycolyl-Neuraminic Acid",
      "residue": "neugc",
      "count": 0,
      "cid": "440001",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/440001"
    },
    {
      "name": "Phosphate",
      "residue": "p",
      "count": 0,
      "cid": "1061",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/1061"
    },
    {
      "name": "Pentose",
      "residue": "pent",
      "count": 0,
      "cid": "229",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/229"
    },
    {
      "name": "Sulfate",
      "residue": "s",
      "count": 0,
      "cid": "1117",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/1117"
    },
    {
      "name": "Sialic acid",
      "residue": "sia",
      "count": 1,
      "cid": "906",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/906"
    },
    {
      "name": "Other",
      "residue": "other",
      "count": 0
    },
    {
      "name": "Floating substituent",
      "residue": "x",
      "count": 0
    },
    {
      "name": "D-Xylose",
      "residue": "xyl",
      "count": 0,
      "cid": "135191",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/135191"
    },
    {
      "name": "Ketodeoxynononic acid",
      "residue": "kdn",
      "count": 0,
      "cid": "13991616",
      "url": "https://pubchem.ncbi.nlm.nih.gov/compound/13991616"
    }
  ],
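The mismatch between the two lists can be checked mechanically. A minimal Python sketch, assuming the detail response has already been parsed into a dict (for example with `json.load`); the copy below is trimmed to `residue` and `count` and to the nonzero entries shown above, and `nonzero` is a hypothetical helper. Comparing residue names directly only works here because every nonzero expanded residue for G07801AF is already at the generic level; a glycan with, say, `gal` collapsed into `hex` would need the collapse mapping applied first.

```python
# Trimmed copy of the G07801AF detail response above (nonzero entries only).
detail = {
    "composition": [
        {"residue": "hex", "count": 1},
        {"residue": "hexnac", "count": 1},
    ],
    "composition_expanded": [
        {"residue": "hex", "count": 1},
        {"residue": "hexnac", "count": 1},
        {"residue": "sia", "count": 1},
    ],
}


def nonzero(entries):
    """Map residue -> count for entries with a positive count."""
    return {e["residue"]: e["count"] for e in entries if e["count"] > 0}


shown = nonzero(detail["composition"])
expanded = nonzero(detail["composition_expanded"])

# Residues counted in the expanded list but absent from the collapsed one.
missing = {r: c for r, c in expanded.items() if r not in shown}
print(missing)
print(sum(shown.values()), sum(expanded.values()))
```

For this response it reports `{'sia': 1}` as missing, and the totals differ by one (2 versus 3).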
ReneRanzinger commented 8 months ago

@kmartinez834 and @rykahsay I thought Xxx would become "Other" in the composition display. Am I wrong?

kmartinez834 commented 8 months ago

Yes, but something weird is happening with this one too: https://glygen.org/glycan/G99968JX

It should have 2 monosaccharides, but the composition shows: [screenshot]

grep "G99968JX" reviewed/glycan_monosaccharide_composition.csv 
"glytoucan_ac","Hex","HexNAc","dHex","NeuAc","NeuGc","HexA","HexN","S","P","aldi","Xxx","X","Count"
"G99968JX","0","1","0","0","0","0","0","1","0","0","1","0","2"
ReneRanzinger commented 8 months ago

G99968JX is actually OK. If you look at the cartoon:

kmartinez834 commented 8 months ago

OK, I was looking at "Count" and thought it was the total number.
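Reading the rows quoted in this thread, "Count" appears to be the number of monosaccharide residues, with substituent columns left out: G99968JX has HexNAc 1, S 1 and Xxx 1 but a Count of 2. A minimal Python sketch that checks this reading; treating S, P, aldi and X as non-monosaccharide columns is an assumption inferred from these rows, not a documented definition, and `monosaccharide_total` is a hypothetical helper.

```python
# Basic-composition columns, in the order of the grep header above.
COLUMNS = ["Hex", "HexNAc", "dHex", "NeuAc", "NeuGc", "HexA", "HexN",
           "S", "P", "aldi", "Xxx", "X"]

# Assumption: substituent/modifier columns that "Count" does not include.
NON_MONOSACCHARIDE = {"S", "P", "aldi", "X"}

# accession -> (column values in COLUMNS order, reported Count)
ROWS = {
    "G07801AF": ([1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], 3),
    "G99968JX": ([0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0], 2),
}


def monosaccharide_total(values):
    """Sum every column except the assumed substituent columns."""
    return sum(v for name, v in zip(COLUMNS, values)
               if name not in NON_MONOSACCHARIDE)


for accession, (values, reported) in ROWS.items():
    total = monosaccharide_total(values)
    print(accession, total, reported, total == reported)
```

Both rows agree with the reported Count under this assumption; it has only been checked against these two rows.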

rykahsay commented 8 months ago

This is the mapping I used to collapse the residues and build the object list in the "composition" property. Does this mean Xxx is collapsed to "Sia"? Please edit this to show the right mapping.

{
    "Hex":["Man","Gal","Glc","Hex"],
    "HexNAc":["GalNAc","GlcNAc","ManNAc","HexNAc"],
    "dHex":["Fuc","dHex"],
    "Pent":["Xyl","Pent"],
    "HexA":["GlcA","GalA","IdoA","ManA","HexA"],
    "HexN":["GlcN","GalN","ManN","HexN"],
    "NeuAc":["NeuAc"],
    "NeuGc":["NeuGc"],
    "S":["S"],
    "P":["P"],
    "Xxx":["Xxx"]
}
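Applied literally, this mapping explains the missing residue: no group lists "Sia", so its count is never added to any collapsed entry. A minimal Python sketch, assuming the collapse is a plain sum of each group's member columns; the helpers `collapse` and `unmapped` are hypothetical, the mapping is copied from above, and the counts are the nonzero columns of the G07801AF advanced-composition row quoted earlier in the thread.

```python
# Collapse mapping copied from the comment above.
MAPPING = {
    "Hex": ["Man", "Gal", "Glc", "Hex"],
    "HexNAc": ["GalNAc", "GlcNAc", "ManNAc", "HexNAc"],
    "dHex": ["Fuc", "dHex"],
    "Pent": ["Xyl", "Pent"],
    "HexA": ["GlcA", "GalA", "IdoA", "ManA", "HexA"],
    "HexN": ["GlcN", "GalN", "ManN", "HexN"],
    "NeuAc": ["NeuAc"],
    "NeuGc": ["NeuGc"],
    "S": ["S"],
    "P": ["P"],
    "Xxx": ["Xxx"],
}

# Nonzero columns of the G07801AF row in glycan_monosaccharide_composition_advanced.csv.
ADVANCED = {"Hex": 1, "HexNAc": 1, "Sia": 1}


def collapse(advanced, mapping):
    """Sum each group's member columns; columns in no group are silently dropped."""
    collapsed = {group: sum(advanced.get(m, 0) for m in members)
                 for group, members in mapping.items()}
    return {g: c for g, c in collapsed.items() if c > 0}


def unmapped(advanced, mapping):
    """Columns with a positive count that no group lists as a member."""
    covered = {m for members in mapping.values() for m in members}
    return {k: v for k, v in advanced.items() if v > 0 and k not in covered}


print(collapse(ADVANCED, MAPPING))
print(unmapped(ADVANCED, MAPPING))
```

This yields `{'Hex': 1, 'HexNAc': 1}` for the collapsed composition and `{'Sia': 1}` for the unmapped columns, which matches the API output above.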
kmartinez834 commented 8 months ago

@rykahsay "Sia" collapses to "Xxx" only when it isn't mapped to "NeuAc" or "NeuGc"...

So if (# Sia) > (# NeuAc + # NeuGc), the difference is mapped to Xxx.

For example, G76100HQ has 2 Sia, 1 NeuGc and 1 NeuAc, so nothing is mapped to "Other". G07801AF has 1 Sia, 0 NeuGc and 0 NeuAc, so it should have 1 "Other".

grep "glytoucan\|G76100HQ\|G07801AF" reviewed/glycan_monosaccharide_composition_advanced.csv
"glytoucan_ac","Fuc","Fuc+aldi","Gal","Gal+aldi","GalA","GalN","GalNAc","GalNAc+aldi","Glc","Glc+aldi","GlcA","GlcN","GlcNAc","GlcNAc+aldi","Hex","Hex+aldi","HexA","HexN","HexNAc","HexNAc+aldi","IdoA","Kdn","Man","Man+aldi","ManN","ManNAc","Me","NeuAc","NeuGc","P","Pent","S","Sia","X","Xxx","Xyl","aldi","dHex","dHex+aldi","Count"
"G76100HQ","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","1","0","0","0","0","0","0","0","0","1","1","0","0","0","2","0","0","0","0","0","0","4"
"G07801AF","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","1","0","0","0","0","0","0","0","0","0","0","0","0","0","1","0","0","0","0","0","0","3"

Since this can't be represented in the mapping scheme above, how do you want to handle it?
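One possible way to handle this is to keep the mapping as it is and apply the Sia rule as a post-step: after collapsing, add max(0, Sia - NeuAc - NeuGc) to Xxx, which the display shows as "Other". A minimal Python sketch under that assumption; `collapse_with_sia` is a hypothetical helper, the mapping is trimmed to the groups these two rows need, and the counts are the nonzero columns of the G76100HQ and G07801AF rows above. It is not the implementation that was deployed.

```python
# Collapse mapping from earlier in the thread, trimmed to the groups these rows use.
MAPPING = {
    "Hex": ["Man", "Gal", "Glc", "Hex"],
    "HexNAc": ["GalNAc", "GlcNAc", "ManNAc", "HexNAc"],
    "NeuAc": ["NeuAc"],
    "NeuGc": ["NeuGc"],
    "Xxx": ["Xxx"],
}

# Nonzero columns of the two advanced-composition rows quoted above.
ROWS = {
    "G76100HQ": {"Hex": 1, "HexNAc": 1, "NeuAc": 1, "NeuGc": 1, "Sia": 2},
    "G07801AF": {"Hex": 1, "HexNAc": 1, "Sia": 1},
}


def collapse_with_sia(advanced, mapping):
    """Collapse by summing group members, then move unresolved Sia into Xxx."""
    collapsed = {group: sum(advanced.get(m, 0) for m in members)
                 for group, members in mapping.items()}
    # Sia not already accounted for as NeuAc or NeuGc; never negative.
    leftover = (advanced.get("Sia", 0)
                - advanced.get("NeuAc", 0) - advanced.get("NeuGc", 0))
    collapsed["Xxx"] = collapsed.get("Xxx", 0) + max(0, leftover)
    return {g: c for g, c in collapsed.items() if c > 0}


for accession, row in ROWS.items():
    print(accession, collapse_with_sia(row, MAPPING))
```

With these rows it yields Hex, HexNAc, NeuAc and NeuGc (no Xxx) for G76100HQ, and Hex, HexNAc and one Xxx for G07801AF, matching the expectations stated above.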

rykahsay commented 8 months ago

I have tried to implement this; please check and make sure all cases are working.

[screenshot]
kmartinez834 commented 8 months ago

These all look good