biothings / biothings_explorer_archived

BioThings Explorer: a schema-based client for API interoperability
Apache License 2.0

"name:something" returned as identifier for pathway #181

Closed: cbizon closed this issue 3 years ago

cbizon commented 3 years ago

For this query (AS edited 2021-06-20 -- updating to TRAPI v1.1):

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:1017"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories" : ["biolink:Pathway"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

We get back a bunch of results, most of which are KEGG, REACT, or WIKIPATHWAY IDs, but we also get back the following, which look like errors, or at least are not Biolink-compliant IDs:

name:cyclins and cell cycle regulation
name:estrogen responsive protein efp controls cell cycle and breast tumors growth
name:cyclin e destruction pathway
name:cell cycle: g1/s check point
name:cdk regulation of dna replication
name:regulation of p27 phosphorylation during cell cycle progression
name:p53 signaling pathway
name:influence of ras and rho proteins on g1 to s transition
name:rb tumor suppressor/checkpoint signaling in response to dna damage
name:e2f1 destruction pathway
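To make "not Biolink-compliant" concrete: each returned ID should be a CURIE whose prefix (the part before the first colon) names a recognized namespace, and `name` is not one. The sketch below splits on the first colon only, because the free-text names can contain colons themselves (e.g. `cell cycle: g1/s check point`). The accepted-prefix set is a hypothetical allow-list built from the prefixes mentioned in this comment, not the Biolink Model's official list.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical allow-list built from the prefixes mentioned in this issue;
# it is NOT the Biolink Model's official Pathway id_prefixes list.
ACCEPTED_PATHWAY_PREFIXES: Set[str] = {"KEGG", "REACT", "WIKIPATHWAY"}


def split_curie(curie: str) -> Tuple[str, str]:
    """Split a CURIE at its first colon into (prefix, local_id)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


def flag_noncompliant(ids: Iterable[str], accepted: Set[str]) -> List[str]:
    """Return the IDs whose prefix is not in the accepted set."""
    return [i for i in ids if split_curie(i)[0] not in accepted]


returned = [
    "name:cell cycle: g1/s check point",
    "name:p53 signaling pathway",
]
print(flag_noncompliant(returned, ACCEPTED_PATHWAY_PREFIXES))
```

Both IDs are flagged because their prefix is `name`; the colon inside `cell cycle: g1/s check point` stays part of the local ID.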
andrewsu commented 3 years ago

It looks like these are coming from BioCarta, Reactome, etc., but proxied through ConsensusPathDB and MyGene.info. @colleenXu, it looks like you added the relevant lines to https://github.com/NCATS-Tangerine/translator-api-registry/blame/master/mygene.info/openapi_full.yml. Do you have any insights to share?

Example edge:

{
  "predicate": "biolink:participates_in",
  "subject": "NCBIGene:1017",
  "object": "name:e2f1 destruction pathway",
  "attributes": [
    {
      "name": "provided_by",
      "value": [
        "ConsensusPathDB"
      ],
      "type": "biolink:provided_by"
    },
    {
      "name": "api",
      "value": [
        "MyGene.info API"
      ],
      "type": "bts:api"
    },
    {
      "name": "publications",
      "value": [],
      "type": "biolink:publication"
    }
  ]
}
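The attributes on this edge record its provenance: `provided_by` is ConsensusPathDB and `api` is MyGene.info API, which is what points at the MyGene.info registry entry. A minimal sketch for pulling those values out of an edge, assuming attributes are keyed by the `name` field as in this example (other TRAPI versions may label attributes differently); `attribute_values` is a hypothetical helper, not part of BioThings Explorer.

```python
from typing import Any, Dict, List


def attribute_values(edge: Dict[str, Any], name: str) -> List[Any]:
    """Return the values of the first attribute with this name, or [] if absent."""
    for attr in edge.get("attributes", []):
        if attr.get("name") == name:
            value = attr.get("value", [])
            # The example edge stores lists; wrap a scalar so callers always get a list.
            return value if isinstance(value, list) else [value]
    return []


# Trimmed copy of the example edge above.
edge = {
    "predicate": "biolink:participates_in",
    "subject": "NCBIGene:1017",
    "object": "name:e2f1 destruction pathway",
    "attributes": [
        {"name": "provided_by", "value": ["ConsensusPathDB"], "type": "biolink:provided_by"},
        {"name": "api", "value": ["MyGene.info API"], "type": "bts:api"},
    ],
}
print(attribute_values(edge, "provided_by"), attribute_values(edge, "api"))
```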
colleenXu commented 3 years ago

@andrewsu I believe these are BIOCARTA pathway names. This is likely because the BIOCARTA ID is not included in the id ranking for Pathway entities here.

Including it in the id ranking before "name" will likely fix this issue.
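The fix relies on the resolver choosing a node's primary ID by walking an ordered list of prefixes and taking the first one the node has, with `name` as a last resort. A minimal sketch of that selection rule follows. The two ranking lists are hypothetical and only illustrate the effect of inserting `BIOCARTA` before `name`; the real Pathway ranking in biomedical_id_resolver.js may differ. The BioCarta ID value is a placeholder, not a real identifier.

```python
from typing import Dict, List, Optional

# Hypothetical rankings for illustration; the real Pathway ranking lives in
# biomedical_id_resolver.js and may differ in contents and order.
RANKING_BEFORE = ["KEGG", "REACT", "WIKIPATHWAY", "name"]
RANKING_AFTER = ["KEGG", "REACT", "WIKIPATHWAY", "BIOCARTA", "name"]


def pick_primary_id(ids_by_prefix: Dict[str, str], ranking: List[str]) -> Optional[str]:
    """Return the CURIE for the highest-ranked prefix the node actually has."""
    for prefix in ranking:
        if prefix in ids_by_prefix:
            return f"{prefix}:{ids_by_prefix[prefix]}"
    return None  # no ranked prefix matched


# A pathway known only by a BioCarta ID (placeholder value) and a name.
node = {"BIOCARTA": "hypothetical_id", "name": "e2f1 destruction pathway"}
print(pick_primary_id(node, RANKING_BEFORE))
print(pick_primary_id(node, RANKING_AFTER))
```

With the first ranking the BioCarta ID is never considered, so the node falls through to `name:e2f1 destruction pathway`; with `BIOCARTA` ranked ahead of `name`, the BioCarta CURIE wins.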

andrewsu commented 3 years ago

@marcodarko, do you have an easy way in your dev environment to test whether @colleenXu's pull request fixes the issue? Specifically, in response to the TRAPI query in the first comment, we should see no nodes with name: as the prefix.
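The acceptance check described here can be scripted against the response to the query above. A minimal sketch, assuming the response follows the TRAPI layout where `message.knowledge_graph.nodes` is an object keyed by node ID; `name_prefixed_nodes` is a hypothetical helper, and the sample response is a trimmed, hand-written stand-in rather than real server output.

```python
from typing import Any, Dict, List


def name_prefixed_nodes(response: Dict[str, Any]) -> List[str]:
    """List knowledge-graph node IDs that use the non-CURIE 'name:' prefix."""
    nodes = response.get("message", {}).get("knowledge_graph", {}).get("nodes", {})
    return sorted(node_id for node_id in nodes if node_id.startswith("name:"))


# Hand-written stand-in for a response from before the fix.
sample_response = {
    "message": {
        "knowledge_graph": {
            "nodes": {
                "NCBIGene:1017": {"categories": ["biolink:Gene"]},
                "name:e2f1 destruction pathway": {"categories": ["biolink:Pathway"]},
            },
            "edges": {},
        }
    }
}
print(name_prefixed_nodes(sample_response))
```

An empty list means the check passes; a missing `knowledge_graph` is treated as having no nodes rather than raising.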

marcodarko commented 3 years ago

@andrewsu yes, that seems to solve this issue. I'll make a PR to our code; it looks like this one was made against Colleen's fork.

marcodarko commented 3 years ago

Issue resolved in this PR: https://github.com/biothings/biomedical_id_resolver.js/pull/55/commits/78f985e026d43eab856e93ecbe7e95b1a47aa3c7