TranslatorSRI / NameResolution

A service for finding CURIEs from lexical strings.
MIT License

ENSEMBL:ENSDARG00000111928 does not have a label (even though it should) #101

Open gaurav opened 1 year ago

gaurav commented 1 year ago

Reported by @Woozl

e.g.

curl -X 'POST' \
  'https://name-resolution-sri.renci.org/reverse_lookup' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"curies":["ENSEMBL:ENSDARG00000111928"]}'
gaurav commented 1 year ago

This is caused by our not having a preferred label for ENSEMBL:ENSDARG00000111928. I'm not sure which behavior is right: should NameRes exclude identifiers that don't have a preferred label (i.e. this request should return something like "no such identifier"), or should NameRes include identifiers without preferred labels (i.e. this request should return something like "names": [])? @cbizon Thoughts?

Here's the error:

  File "/home/nru/.local/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/nru/.local/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/nru/.local/lib/python3.9/site-packages/fastapi/routing.py", line 273, in app
    raw_response = await run_endpoint_function(
  File "/home/nru/.local/lib/python3.9/site-packages/fastapi/routing.py", line 190, in run_endpoint_function
    return await dependant.call(**values)
  File "/repo/NameResolution/api/server.py", line 82, in lookup_names_post
    return await reverse_lookup(request.curies)
  File "/repo/NameResolution/api/server.py", line 105, in reverse_lookup
    output[doc["curie"]].extend(doc["names"])
cbizon commented 1 year ago

I prefer either returning an empty names array, or assigning a goofy preferred label (like just repeating the ID).

In this particular instance, it kind of feels like we should have a name? https://useast.ensembl.org/Danio_rerio/Gene/Summary?g=ENSDARG00000111928;r=CHR_ALT_CTG7_1_14:19886929-19896579 So it might be worth reviewing the Ensembl loader.
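
The two options mentioned above (an empty names array, or repeating the ID as a stand-in label) could be sketched roughly as follows. This is not the actual code in api/server.py: `collect_names` and `fallback_to_curie` are hypothetical names, and it assumes the failing documents simply have no "names" field.

```python
from collections import defaultdict


def collect_names(docs, fallback_to_curie=False):
    """Group names by CURIE, tolerating documents without a "names" field.

    Hypothetical helper: `docs` is assumed to be a list of dicts that each
    have a "curie" key and may or may not have a "names" key.
    """
    output = defaultdict(list)
    for doc in docs:
        curie = doc["curie"]
        # Treat a missing or empty "names" field as an empty list.
        names = doc.get("names") or []
        if not names and fallback_to_curie:
            # Option 2: use the identifier itself as a placeholder label.
            names = [curie]
        # Accessing output[curie] creates the entry even when names is empty,
        # so option 1 yields "names": [] instead of an error.
        output[curie].extend(names)
    return dict(output)
```

With `fallback_to_curie=False` a CURIE without names maps to an empty list; with `fallback_to_curie=True` it maps to a one-element list containing the CURIE itself.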

gaurav commented 10 months ago

> I prefer either returning an empty names array, or assigning a goofy preferred label (like just repeating the ID).

This is no longer relevant, since we now return an object rather than a list of entries. For example, looking up https://name-resolution-sri.renci.org/reverse_lookup?curies=ENSEMBL%3AENSDARG00000111928&curies=UBERON%3A8420000 will return:

{
  "ENSEMBL:ENSDARG00000111928": {},
  "UBERON:8420000": {
    "curie": "UBERON:8420000",
    "names": [
      "hair",
      "hair of scalp"
    ],
    "types": [
      "GrossAnatomicalStructure",
      "AnatomicalEntity",
      "PhysicalEssence",
      "OrganismalEntity",
      "SubjectOfInvestigation",
      "BiologicalEntity",
      "ThingWithTaxon",
      "NamedThing",
      "Entity",
      "PhysicalEssenceOrOccurrent"
    ],
    "preferred_name": "hair of scalp",
    "shortest_name_length": 4,
    "curie_suffix": 8420000,
    "id": "6516107d-e743-4e47-9207-065acfa0bb83",
    "_version_": 1781948985311756300
  }
}

So it is now pretty clear that we don't know about ENSEMBL:ENSDARG00000111928.
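
For a client, this response shape means an empty object marks a CURIE that NameRes has no entry for. A minimal sketch of separating the two cases, assuming the JSON response has already been parsed into a dict (`split_known_unknown` is a hypothetical name, not part of NameRes):

```python
def split_known_unknown(response):
    """Split a parsed reverse_lookup response into known and unknown CURIEs.

    Hypothetical helper: `response` is assumed to map each requested CURIE
    to either an entry dict or an empty dict ({}) when nothing was found.
    """
    known = {curie: entry for curie, entry in response.items() if entry}
    unknown = [curie for curie, entry in response.items() if not entry]
    return known, unknown
```

Applied to the response above, ENSEMBL:ENSDARG00000111928 would end up in the unknown list and UBERON:8420000 in the known dict.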

I'll leave this issue open until we figure out why ENSEMBL:ENSDARG00000111928 is missing a label in Babel, where we currently record it as:

{"type": "biolink:Gene", "identifiers": [{"i": "ENSEMBL:ENSDARG00000111928", "d": []}]}
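
To find other identifiers in the same situation, something like the following could scan records of this shape. It is only a sketch: it assumes that a label, when present, is stored under an "l" key next to "i" (the record above has no such key), and `identifiers_without_labels` is a hypothetical name rather than anything in Babel.

```python
import json


def identifiers_without_labels(record_line):
    """Return the identifiers in one JSON record that have no label.

    Assumption: each entry in "identifiers" stores its identifier under "i"
    and, when it has one, its label under "l".
    """
    record = json.loads(record_line)
    return [
        ident["i"]
        for ident in record.get("identifiers", [])
        if not ident.get("l")  # missing or empty label
    ]
```

Run over the record above, this would return a list containing only ENSEMBL:ENSDARG00000111928.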