RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

Glucose returned as a gene. #223

Open cbizon opened 2 years ago

cbizon commented 2 years ago

Running this query against KG2 (what genes are downregulated by Gefitinib?):

```json
{
  "edges": {
    "e00": {
      "predicates": ["biolink:entity_negatively_regulates_entity"],
      "object": "n1",
      "subject": "n0"
    }
  },
  "nodes": {
    "n0": {
      "categories": [
        "biolink:SmallMolecule"
      ],
      "constraints": [],
      "ids": [
        "PUBCHEM.COMPOUND:123631"
      ],
      "name": "Gefitinib"
    },
    "n1": {
      "categories": [
        "biolink:Gene"
      ]
    }
  }
}
```

Returns a few things like:

- FMA:264827 Interleukin-8
- FMA:264829 Interleukin-6
- FMA:63891 Collagen
- FMA:67330 Vimentin
- FMA:82743 Glucose



I don't think that FMA is a valid biolink prefix for gene (or any other class).

But more to the point, Glucose is returned as a gene, when it is not a gene.
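A quick way to catch results like these is to check each returned node's CURIE prefix against the prefixes expected for `biolink:Gene`. The sketch below is illustrative only: `GENE_PREFIXES` is an assumed subset (the Biolink Model's `id_prefixes` for Gene is the authoritative list) and `flag_unexpected_gene_ids` is a hypothetical helper, not part of RTX-KG2 or ARAX.

```python
# Illustrative subset of CURIE prefixes allowed for biolink:Gene. This is an
# assumption for the sketch; the Biolink Model's id_prefixes list is authoritative.
GENE_PREFIXES = {"NCBIGene", "ENSEMBL", "HGNC", "MGI", "ZFIN", "RGD", "SGD", "FB", "WB"}


def flag_unexpected_gene_ids(curies):
    """Return the CURIEs whose prefix is not an accepted gene prefix."""
    flagged = []
    for curie in curies:
        prefix = curie.split(":", 1)[0]  # text before the first colon
        if prefix not in GENE_PREFIXES:
            flagged.append(curie)
    return flagged


# Node IDs returned for the Gefitinib query in this issue.
returned = ["FMA:264827", "FMA:264829", "FMA:63891", "FMA:67330", "FMA:82743"]
print(flag_unexpected_gene_ids(returned))  # all five FMA CURIEs are flagged
```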
acevedol commented 1 year ago

@saramsey Do you mind looking over this with me when you are available? I'm not sure how to address the problem.

saramsey commented 1 year ago

I don't see glucose showing up in the current test version of ARAX (arax.ncats.io/test), when I run this query graph:

```json
{
  "edges": {
    "e00": {
      "predicates": ["biolink:affects"],
      "object": "n1",
      "subject": "n0"
    }
  },
  "nodes": {
    "n0": {
      "categories": [
        "biolink:SmallMolecule"
      ],
      "constraints": [],
      "ids": [
        "PUBCHEM.COMPOUND:123631"
      ],
      "name": "Gefitinib"
    },
    "n1": {
      "categories": [
        "biolink:Gene"
      ]
    }
  }
}
```
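The query graphs in this thread are the `query_graph` part of a TRAPI request. As a minimal sketch, the graph can be wrapped in the TRAPI request envelope (`{"message": {"query_graph": ...}}`) before being POSTed to a TRAPI `/query` endpoint. The helper name `make_trapi_request` is hypothetical, and no specific ARAX endpoint URL is assumed here.

```python
import json


def make_trapi_request(query_graph):
    """Wrap a query graph in the TRAPI request envelope."""
    return {"message": {"query_graph": query_graph}}


# Same query graph as above, with the optional "constraints" and "name" fields dropped.
query_graph = {
    "edges": {
        "e00": {"subject": "n0", "object": "n1", "predicates": ["biolink:affects"]}
    },
    "nodes": {
        "n0": {"ids": ["PUBCHEM.COMPOUND:123631"], "categories": ["biolink:SmallMolecule"]},
        "n1": {"categories": ["biolink:Gene"]},
    },
}

body = json.dumps(make_trapi_request(query_graph), indent=2)
print(body)
```

The resulting `body` string is what would be sent as the JSON payload of the POST request.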

I also checked in KG2.8.0c using this Cypher command:

```cypher
match (n) where 'FMA:82743' in n.equivalent_curies return n;
```

which returned a single KG2c node; the resulting JSON did not have any indication that it is mapped to biolink:Gene anymore. Here is the JSON for glucose, in KG2.8.0c:

```json
{
  "iri": "http://purl.obolibrary.org/obo/FMA_82743",
  "name": "Glucose",
  "description": "Glucose, also known as D-glucose or dextrose, is a member of the class of compounds known as hexoses. Hexoses are monosaccharides in which the sugar unit is a is a six-carbon containing moiety. Glucose contains an aldehyde group and is therefore referred to as an aldohexose. The glucose molecule can exist in an open-chain (acyclic) and ring (cyclic) form, the latter being the result of an intramolecular reaction between the aldehyde C atom and the C-5 hydroxyl group to form an intramolecular hemiacetal. In aqueous solution, both forms are in equilibrium and at pH 7 the cyclic one is predominant. Glucose is a neutral, hydrophilic molecule that readily dissolves in water. It exists as a white crystalline powder. Glucose is the primary source of energy for almost all living organisms. As such, it is the most abundant monosaccharide and the most widely used aldohexose in living organisms. When not circulating freely in blood (in animals) or resin (in plants), glucose is stored as a polymer. In plants it is mainly stored as starch and amylopectin and in animals as glycogen. Glucose is produced by plants through the photosynthesis using sunlight, water and carbon dioxide where it is used as an energy and a carbon source Glucose is particularly abundant in fruits and other parts of plants in its free state. Foods that are particularly rich in glucose are honey, agave, molasses, apples (2g/100g), grapes (8g/100g), oranges (8.5g/100g), jackfruit, dried apricots, dates (32 g/100g), bananas (5.8 g/100g), grape juice, sweet corn, Glucose is about 75% as sweet as sucrose and about 50% as sweet as fructose. Sweetness is detected through the binding of sugars to the T1R3 and T1R2 proteins, to form a G-protein coupled receptor that is the sweetness receptor in mammals. Glucose was first isolated from raisins in 1747 by the German chemist Andreas Marggraf. It was discovered in grapes by Johann Tobias Lowitz in 1792 and recognized as different from cane sugar (sucrose). Industrially, glucose is mainly used for the production of fructose and in the production of glucose-containing foods. In foods, it is used as a sweetener, humectant, to increase the volume and to create a softer mouthfeel. Various sources of glucose, such as grape juice (for wine) or malt (for beer), are used for fermentation to ethanol during the production of alcoholic beverages. Glucose is found in many plants as glucosides. A glucoside is a glycoside that is derived from glucose. Glucosides are common in plants, but rare in animals. Glucose is produced when a glucoside is hydrolyzed by purely chemical means or decomposed by fermentation or enzymes. Glucose can be obtained by the hydrolysis of carbohydrates such as milk sugar (lactose), cane sugar (sucrose), maltose, cellulose, and glycogen. Glucose is a building block of the disaccharides lactose and sucrose (cane or beet sugar), of oligosaccharides such as raffinose and of polysaccharides such as starch and amylopectin, glycogen or cellulose. For most animals, while glucose is normally obtained from the diet, it can also be generated via gluconeogenesis. Gluconeogenesis is a metabolic pathway that results in the generation of glucose from certain non-carbohydrate carbon substrates. Gluconeogenesis is a ubiquitous process, present in plants, animals, fungi, bacteria, and other microorganisms. In vertebrates, gluconeogenesis takes place mainly in the liver and, to a lesser extent, in the cortex of the kidneys. In humans the main gluconeogenic precursors are lactate, glycerol (which is a part of the triacylglycerol molecule), alanine and glutamine.",
  "equivalent_curies": [
    "HMDB:HMDB0304632",
    "ATC:V04CA02",
    "NDDF:002597",
    "ATC:B05CX01",
    "PSY:21160",
    "LOINC:MTHU001675",
    "LOINC:LP14635-4",
    "VANDF:4019541",
    "ATC:V06DC01",
    "FMA:82743",
    "LOINC:LP32534-7",
    "RXNORM:4850",
    "NCIT:C2831"
  ],
  "id": "FMA:82743",
  "category": "biolink:BiologicalEntity",
  "all_names": [
    "Glucose",
    "glucose"
  ],
  "all_categories": [
    "biolink:ChemicalEntity",
    "biolink:BiologicalEntity",
    "biolink:Drug",
    "biolink:SmallMolecule"
  ]
}
```
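The check described here (no indication that the node is mapped to `biolink:Gene`) can be automated over node JSON like the glucose record. This is a minimal sketch that assumes node records shaped like the KG2c JSON in this comment (fields `category` and `all_categories`); the function name `is_mapped_to_gene` is hypothetical.

```python
def is_mapped_to_gene(node):
    """True if biolink:Gene is the node's category or appears in all_categories."""
    categories = {node.get("category")} | set(node.get("all_categories", []))
    return "biolink:Gene" in categories


# Abbreviated copy of the KG2.8.0c glucose node (description and CURIEs omitted).
glucose = {
    "id": "FMA:82743",
    "name": "Glucose",
    "category": "biolink:BiologicalEntity",
    "all_categories": [
        "biolink:ChemicalEntity",
        "biolink:BiologicalEntity",
        "biolink:Drug",
        "biolink:SmallMolecule",
    ],
}
print(is_mapped_to_gene(glucose))  # False
```

Checking both fields matters because a node's primary `category` can differ from the full set of categories merged into it.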

I think we're good at this point? Please LMK if you are still seeing Glucose showing up as a "Gene" in results from RTX-KG2. Thanks.

saramsey commented 1 year ago

The picture from the Synonyms page on arax.ncats.io/test is also consistent:

[Screenshot: Synonyms page on arax.ncats.io/test, 2023-02-07]
cbizon commented 1 year ago

The class looks good, thanks. I'm still unsure about the FMA prefix: I don't see it as a valid prefix for any classes in Biolink. Maybe it needs to be added there, but if so, then we also need to include it in NodeNorm. I'm not actually sure what it is, though.

edeutsch commented 1 year ago

FMA (Foundational Model of Anatomy) is a pretty important ontology, I think, but it is mostly anatomy, so it is a bit strange to have glucose in it (https://bioportal.bioontology.org/ontologies/FMA). I'm thinking it should be supported by Biolink.
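Supporting a prefix in Biolink (and then in NodeNorm) involves, among other things, a mapping from the CURIE prefix to an IRI base. As a sketch, assuming the OBO PURL pattern seen in the glucose node's `iri` field (`http://purl.obolibrary.org/obo/FMA_82743`), the expansion for FMA would look like this. `PREFIX_MAP` and `expand_curie` are illustrative names, not Biolink or NodeNorm APIs.

```python
# Hypothetical prefix map; the FMA entry follows the iri field of the KG2c glucose node.
PREFIX_MAP = {"FMA": "http://purl.obolibrary.org/obo/FMA_"}


def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Expand a CURIE such as 'FMA:82743' to a full IRI using prefix_map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise KeyError(f"no IRI base registered for prefix {prefix!r}")
    return prefix_map[prefix] + local_id


print(expand_curie("FMA:82743"))  # http://purl.obolibrary.org/obo/FMA_82743
```

Raising on unknown prefixes, rather than guessing, mirrors the concern in this thread: a CURIE whose prefix is not registered should be surfaced rather than silently passed along.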