RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

Conflation of "retina" with "rhodopsin", "retinaldehyde", and more #1940

Status: Closed (amykglen closed this issue 1 year ago)

amykglen commented 1 year ago

https://arax.rtx.ai/?term=UBERON:0000966

"UBERON:0000966": {
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:category",
              "attributes": null,
              "description": "Categories of all nodes in this synonym set in RTX-KG2.",
              "original_attribute_name": null,
              "value": [
                "biolink:NamedThing",
                "biolink:BiologicalEntity",
                "biolink:GrossAnatomicalStructure",
                "biolink:AnatomicalEntity",
                "biolink:InformationContentEntity",
                "biolink:Gene",
                "biolink:Protein",
                "biolink:Polypeptide",
                "biolink:ChemicalEntity",
                "biolink:SmallMolecule",
                "biolink:MolecularEntity",
                "biolink:Drug"
              ],
              "value_type_id": "metatype:Uriorcurie",
              "value_url": null
            },
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:synonym",
              "attributes": null,
              "description": "Names of all nodes in this synonym set in RTX-KG2.",
              "original_attribute_name": null,
              "value": [
                "retinal",
                "Rho (rat)",
                "retina",
                "11-cis-Retinaldehyde",
                "11cRAL [cytosol]",
                "atRAL [plasma membrane]",
                "ISO RHODOPSIN",
                "rhodopsin (chicken)",
                "RHO (cow)",
                "RHO gene",
                "Retinal",
                "13-cis-Retinal",
                "11-cis-retinal",
                "rhodopsin (human)",
                "RHO [Golgi membrane]",
                "RHO",
                "9-cis-Retinal",
                "RETINAL",
                "Rhodopsin",
                "p-S334,338,343-RHO [photoreceptor disc membrane]",
                "Retinaldehyde",
                "rhodopsin (mouse)",
                "RHODOPSIN",
                "retinaldehyde",
                "rhodopsin",
                "all-trans-retinal",
                "Rho (mouse)",
                "9-cis-retinal",
                "Structure of ora serrata of retina",
                "atRAL [photoreceptor disc membrane]",
                "RHO [Golgi-associated vesicle membrane]",
                "Rhodopsin, Human",
                "Rho",
                "Rho (Bacillus subtilis subsp. subtilis str. 168)",
                "transcription termination factor Rho (Bacillus subtilis subsp. subtilis str. 168)",
                "RHO (chicken)",
                "RHO [ciliary membrane]",
                "RHO Gene",
                "atRAL [mitochondrial intermembrane space]",
                "9cRAL [cytosol]",
                "rhodopsin (cow)",
                "rhodopsin (rat)",
                "rhodopsin (zebrafish)",
                "13-cis-retinal",
                "protein rhomboid (fruit fly)",
                "SID17389450",
                "11-cis-Retinal",
                "RHO [photoreceptor disc membrane]",
                "p-S334,338,343-at-retinyl-RHO [photoreceptor disc membrane]",
                "atRAL [cytosol]",
                "11c-retinyl-RHO [photoreceptor disc membrane]",
                "Retina",
                "RHO (human)",
                "at-retinyl-RHO [photoreceptor disc membrane]",
                "Entire retina"
              ],
              "value_type_id": "metatype:String",
              "value_url": null
            },
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:xref",
              "attributes": null,
              "description": "Identifiers of all nodes in this synonym set in RTX-KG2.",
              "original_attribute_name": null,
              "value": [
                "UMLS:C0035331",
                "REACT:R-HSA-5623404",
                "PR:P02699",
                "REACT:R-ALL-2466098",
                "PathWhiz.Compound:1048",
                "PR:P51489",
                "PR:000001245",
                "UMLS:C0085717",
                "KEGG.GLYCAN:G00412",
                "CHEMBL.COMPOUND:CHEMBL1579130",
                "HGNC:10012",
                "KEGG.COMPOUND:C16681",
                "REACT:R-ALL-975622",
                "CHEMBL.COMPOUND:CHEMBL1255087",
                "NCBIGene:509933",
                "UMLS:C0035499",
                "LOINC:LP15851-6",
                "UBERON:0000966",
                "MESH:C031390",
                "FMA:58301",
                "CHEBI:17898",
                "NCIT:C12343",
                "PathWhiz.ElementCollection:357",
                "UMLS:C1419385",
                "CHEBI:16066",
                "PR:P20350",
                "CHEBI:78273",
                "CHEMBL.COMPOUND:CHEMBL257381",
                "PathWhiz.Compound:1448",
                "NCIT:C87176",
                "KEGG.COMPOUND:C00778",
                "PSY:44430",
                "KEGG.COMPOUND:C02110",
                "MESH:D012160",
                "RGD:3573",
                "HMDB:HMDB0006218",
                "UniProtKB:P08100",
                "KEGG.COMPOUND:C00376",
                "REACT:R-ALL-5623648",
                "CHEMBL.TARGET:CHEMBL5739",
                "LOINC:LP30548-9",
                "HMDB:HMDB0001358",
                "UMLS:C0035298",
                "MESH:D012243",
                "REACT:R-ALL-32737",
                "NCBIGene:751791",
                "ENSEMBL:ENSG00000163914",
                "NCBIGene:937042",
                "PathWhiz.ElementCollection:59",
                "LOINC:MTHU013892",
                "REACT:R-HSA-5623406",
                "NCIT:C129078",
                "CHEMBL.COMPOUND:CHEMBL81379",
                "CHEMBL.TARGET:CHEMBL4296308",
                "OMIM:180380",
                "MGI:97914",
                "REACT:R-ALL-8960974",
                "REACT:R-HSA-5205901",
                "EHDAA2:0001627",
                "NDDF:016731",
                "HMDB:HMDB0006220",
                "PR:P15409",
                "FMA:67790",
                "REACT:R-HSA-5205903",
                "ORPHANET:118315",
                "NCIT:C129080",
                "UMLS:C0050210",
                "REACT:R-ALL-5362565",
                "HMDB:HMDB0002152",
                "NCIT:C68300",
                "REACT:R-HSA-5205900",
                "REACT:R-HSA-2581504",
                "PR:P35359",
                "CHEBI:45487",
                "UMLS:C0229196",
                "RXNORM:2268087",
                "REACT:R-HSA-5623400",
                "PR:P08100",
                "REACT:R-HSA-419802",
                "PSY:44575",
                "MESH:D012172",
                "PR:P22328",
                "PathWhiz.Compound:2667",
                "UMLS:C4283897",
                "NCBIGene:6010",
                "PR:Q03222",
                "REACT:R-ALL-30048",
                "LOINC:MTHU015065",
                "UMLS:C1278894"
              ],
              "value_type_id": "metatype:Nodeidentifier",
              "value_url": null
            },
        },
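
One way to spot this kind of conflation automatically is to check whether a synonym set's categories span unrelated kinds of entity. The following is only a minimal sketch: the `COARSE_GROUP` mapping and both function names are hypothetical, chosen for illustration, and are not part of RTX-KG2 or the ARAX synonymizer. Generic categories such as `biolink:NamedThing` are deliberately left out of the mapping so that they never trigger a flag on their own.

```python
# Hypothetical coarse grouping of Biolink categories (an assumption for this
# sketch; not taken from RTX-KG2 or ARAX).
COARSE_GROUP = {
    "biolink:GrossAnatomicalStructure": "anatomy",
    "biolink:AnatomicalEntity": "anatomy",
    "biolink:Gene": "gene_or_protein",
    "biolink:Protein": "gene_or_protein",
    "biolink:Polypeptide": "gene_or_protein",
    "biolink:ChemicalEntity": "chemical",
    "biolink:SmallMolecule": "chemical",
    "biolink:MolecularEntity": "chemical",
    "biolink:Drug": "chemical",
}


def coarse_groups(categories):
    """Return the set of coarse groups that a synonym set's categories fall into."""
    return {COARSE_GROUP[c] for c in categories if c in COARSE_GROUP}


def looks_conflated(categories):
    """Flag a synonym set whose categories span more than one coarse group."""
    return len(coarse_groups(categories)) > 1


# Categories reported for UBERON:0000966 in the response above.
retina_categories = [
    "biolink:NamedThing",
    "biolink:BiologicalEntity",
    "biolink:GrossAnatomicalStructure",
    "biolink:AnatomicalEntity",
    "biolink:InformationContentEntity",
    "biolink:Gene",
    "biolink:Protein",
    "biolink:Polypeptide",
    "biolink:ChemicalEntity",
    "biolink:SmallMolecule",
    "biolink:MolecularEntity",
    "biolink:Drug",
]

print(sorted(coarse_groups(retina_categories)))  # ['anatomy', 'chemical', 'gene_or_protein']
print(looks_conflated(retina_categories))        # True
```

A set containing only anatomical categories (plus generic ones) would not be flagged, which is the behaviour one would hope for from a correctly separated "retina" cluster.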

amykglen commented 1 year ago

this super-cluster appears to have been successfully broken up in the latest version of the new synonymizer (#2003):

Cluster for UMLS:C0035298 (UBERON:0000966) has 12 nodes:

id category name in_SRI in_KG2pre is_cluster_rep
CHEMBL.TARGET:CHEMBL4483122 AnatomicalEntity Retina X
EHDAA2:0001627 AnatomicalEntity retina X
FMA:58301 AnatomicalEntity Retina X
LOINC:LP30548-9 GrossAnatomicalStructure Retina X
LOINC:MTHU013892 GrossAnatomicalStructure Retinal X
LOINC:MTHU015065 GrossAnatomicalStructure Retina X
MESH:D012160 GrossAnatomicalStructure Retina X X
NCIT:C12343 GrossAnatomicalStructure Retina X X
NCIT:C87176 GrossAnatomicalStructure Retinal X X
PSY:44430 GrossAnatomicalStructure Retina X
UBERON:0000966 GrossAnatomicalStructure retina X X X
UMLS:C0035298 GrossAnatomicalStructure Retina X X

Cluster for HGNC:10012 (NCBIGene:6010) has 19 nodes:

id category name in_SRI in_KG2pre is_cluster_rep
ENSEMBL:ENSG00000163914 Gene RHO X X
ENSEMBL:ENSP00000296271 Protein X
ENSEMBL:ENSP00000296271.3 Protein X
HGNC:10012 Gene RHO X X
NCBIGene:6010 Gene RHO X X X
NCIT:C129078 Gene RHO Gene X
OMIM:180380 Gene RHO X X
PR:P08100 Protein rhodopsin (human) X X
REACT:R-HSA-2581504 Protein p-S334,338,343-RHO [photoreceptor disc membrane] X
REACT:R-HSA-419802 Protein RHO [photoreceptor disc membrane] X
REACT:R-HSA-5205900 Protein 11c-retinyl-RHO [photoreceptor disc membrane] X
REACT:R-HSA-5205901 Protein at-retinyl-RHO [photoreceptor disc membrane] X
REACT:R-HSA-5205903 Protein p-S334,338,343-at-retinyl-RHO [photoreceptor disc membrane] X
REACT:R-HSA-5623400 Protein RHO [Golgi membrane] X
REACT:R-HSA-5623404 Protein RHO [ciliary membrane] X
REACT:R-HSA-5623406 Protein RHO [Golgi-associated vesicle membrane] X
UMLS:C1419385 Gene RHO gene X X
UMLS:C4283897 Protein Rhodopsin, Human X X
UniProtKB:P08100 Protein OPSD_HUMAN Rhodopsin (sprot) X X

Cluster for UMLS:C0050210 (PUBCHEM.COMPOUND:6436082) has 9 nodes:

id category name in_SRI in_KG2pre is_cluster_rep
CAS:514-85-2 SmallMolecule X
CHEBI:78273 SmallMolecule 9-cis-retinal X X
CHEMBL.COMPOUND:CHEMBL257381 SmallMolecule ISO RHODOPSIN X X
GTOPDB:6673 SmallMolecule 9-cis-retinal X
INCHIKEY:NCYCYZXNIZJOKI-MKOSUFFBSA-N SmallMolecule X
MESH:C031390 SmallMolecule 9-cis-retinal X X
PUBCHEM.COMPOUND:6436082 SmallMolecule 9-cis-Retinal X X
PathWhiz.Compound:2667 SmallMolecule 9-cis-Retinal X
UMLS:C0050210 SmallMolecule 9-cis-retinal X X
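
A quick sanity check on a split like this is that no identifier ends up in more than one cluster. The sketch below is illustrative only: `shared_identifiers` is a hypothetical helper, the dictionary keys are the identifiers shown in parentheses in each cluster heading above, and each set holds just a few identifiers copied from the listings rather than the full clusters.

```python
from itertools import combinations

# A few identifiers copied from each cluster listing above (not the full clusters).
clusters = {
    "UBERON:0000966": {"UBERON:0000966", "MESH:D012160", "NCIT:C12343", "UMLS:C0035298"},
    "NCBIGene:6010": {"NCBIGene:6010", "HGNC:10012", "PR:P08100", "UniProtKB:P08100"},
    "PUBCHEM.COMPOUND:6436082": {
        "PUBCHEM.COMPOUND:6436082",
        "CHEBI:78273",
        "MESH:C031390",
        "UMLS:C0050210",
    },
}


def shared_identifiers(clusters):
    """Map each pair of cluster keys to the identifiers the two clusters share."""
    overlaps = {}
    for (key_a, ids_a), (key_b, ids_b) in combinations(clusters.items(), 2):
        common = ids_a & ids_b
        if common:
            overlaps[(key_a, key_b)] = common
    return overlaps


print(shared_identifiers(clusters))  # {}
```

An empty result means the sampled identifiers are disjoint across the three clusters; any non-empty entry would name the pair of clusters that still overlap.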

amykglen commented 1 year ago

confirmed fixed on /test:

https://arax.ncats.io/test/?term=UBERON:0000966
https://arax.ncats.io/test/?term=NCBIGene:6010
https://arax.ncats.io/test/?term=PUBCHEM.COMPOUND:6436082