ExposuresProvider / cam-pipeline

Data loading pipeline for CAM database
https://exposuresprovider.github.io/cam-pipeline/
MIT License

`biolink:same_as` connects Automat CAM-KP nodes with themselves #92

Open gaurav opened 1 year ago

gaurav commented 1 year ago

For example, in the query:

{"message":{"query_graph":{"nodes":{"n0":{"categories":["biolink:BiologicalProcessOrActivity"]},"n1":{"ids":["GO:0005737"],"categories":["biolink:AnatomicalEntity"]}},"edges":{"e0":{"predicates":["biolink:occurs_in"],"subject":"n0","object":"n1"}}}}}

We get back the result:

        "GO:0008494": {
          "categories": [
            "biolink:MolecularActivity",
            "biolink:OntologyClass",
            "biolink:BiologicalEntity",
            "biolink:BiologicalProcessOrActivity",
            "biolink:Entity",
            "biolink:ThingWithTaxon",
            "biolink:NamedThing",
            "biolink:PhysicalEssenceOrOccurrent",
            "biolink:Occurrent"
          ],
          "name": "translation activator activity",
          "attributes": [
            {
              "attribute_type_id": "biolink:same_as",
              "value": [
                "GO:0008494"
              ],
              "value_type_id": "metatype:uriorcurie",
              "original_attribute_name": "equivalent_identifiers",
              "value_url": null,
              "attribute_source": null,
              "description": null,
              "attributes": null
            }
          ]
        }

This is not wrong, just unnecessary: the only `biolink:same_as` value is the node's own identifier, which is already the key. It might be deliberate, so that a client can figure out the identifier without looking at the key. I'm not sure whether the problem is in the cam-pipeline output or in ORION. Just documenting this here so we can think about it later.
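To get a sense of how widespread this is, a client could flag nodes whose `biolink:same_as` attribute lists nothing but the node's own identifier. The following is a minimal sketch, assuming the nodes are available as a dict keyed by identifier in the shape shown above; the function name `self_only_same_as` is hypothetical and not part of cam-pipeline or ORION.

```python
def self_only_same_as(nodes):
    """Return the IDs of nodes whose biolink:same_as values contain only the node's own ID.

    `nodes` is assumed to be a dict of node ID -> node object, as in the result above.
    """
    flagged = []
    for node_id, node in nodes.items():
        # "attributes" may be missing or null, so fall back to an empty list.
        for attr in node.get("attributes") or []:
            if attr.get("attribute_type_id") != "biolink:same_as":
                continue
            values = attr.get("value") or []
            if isinstance(values, str):
                values = [values]
            # Flag only when every listed identifier is the node's own key.
            if values and all(v == node_id for v in values):
                flagged.append(node_id)
                break
    return flagged


nodes = {
    "GO:0008494": {
        "name": "translation activator activity",
        "attributes": [
            {
                "attribute_type_id": "biolink:same_as",
                "value": ["GO:0008494"],
                "original_attribute_name": "equivalent_identifiers",
            }
        ],
    }
}
print(self_only_same_as(nodes))
```

Nodes with at least one other equivalent identifier are not flagged, so this only counts the purely self-referential case.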

gaurav commented 1 year ago

I think this might be from NodeNorm output; for example, we can get a result like:

        "UniProtKB:Q80VY9-1": {
          "categories": [
            "biolink:GeneProductMixin",
            "biolink:GeneOrGeneProduct",
            "biolink:BiologicalEntity",
            "biolink:Entity",
            "biolink:ThingWithTaxon",
            "biolink:NamedThing",
            "biolink:ChemicalEntityOrGeneOrGeneProduct",
            "biolink:Protein",
            "biolink:ChemicalEntityOrProteinOrPolypeptide",
            "biolink:MacromolecularMachineMixin",
            "biolink:Polypeptide"
          ],
          "name": "ATP-dependent RNA helicase DHX33 isoform m1 (mouse)",
          "attributes": [
            {
              "attribute_type_id": "biolink:same_as",
              "value": [
                "UniProtKB:Q80VY9-1",
                "PR:Q80VY9-1"
              ],
              "value_type_id": "metatype:uriorcurie",
              "original_attribute_name": "equivalent_identifiers",
              "value_url": null,
              "attribute_source": null,
              "description": null,
              "attributes": null
            }
          ]
        }
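
If the self-reference turns out to be unwanted, one option is to drop the node's own identifier from the `biolink:same_as` values on the client side or in a post-processing step. This is only a sketch under that assumption; `strip_self_same_as` is a hypothetical helper, not an existing function in cam-pipeline, ORION, or NodeNorm.

```python
import copy


def strip_self_same_as(node_id, node):
    """Return a copy of `node` with `node_id` removed from biolink:same_as values.

    A same_as attribute left with no values is dropped entirely.
    """
    cleaned = copy.deepcopy(node)
    kept = []
    for attr in cleaned.get("attributes") or []:
        if attr.get("attribute_type_id") == "biolink:same_as":
            values = attr.get("value") or []
            if isinstance(values, str):
                values = [values]
            attr["value"] = [v for v in values if v != node_id]
            if not attr["value"]:
                continue  # nothing left besides the node itself
        kept.append(attr)
    cleaned["attributes"] = kept
    return cleaned


protein = {
    "name": "ATP-dependent RNA helicase DHX33 isoform m1 (mouse)",
    "attributes": [
        {
            "attribute_type_id": "biolink:same_as",
            "value": ["UniProtKB:Q80VY9-1", "PR:Q80VY9-1"],
            "original_attribute_name": "equivalent_identifiers",
        }
    ],
}
cleaned = strip_self_same_as("UniProtKB:Q80VY9-1", protein)
print(cleaned["attributes"][0]["value"])
```

The input node is left unmodified, so the original and cleaned versions can be compared side by side.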