biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

Remove Gene SYMBOL-based operations, or add support for SYMBOL #308

Closed andrewsu closed 2 years ago

andrewsu commented 2 years ago

Results from running query D.1 on a local instance can be found here: https://www.dropbox.com/s/x78xatt48inggrz/response_d1.json?dl=0

Those results include these two objects under message.knowledge_graph.nodes:

                "NCBIGene:120892": {
                    "categories": [
                        "biolink:Gene"
                    ],
                    "name": "LRRK2",
                    "attributes": [
                       ...
                    ]
                },
                "SYMBOL:LRRK2": {
                    "categories": [
                        "biolink:Gene"
                    ],
                    "name": "SYMBOL:LRRK2",
                    "attributes": [
                       ...
                    ]
                },

These two nodes should be combined as far as I can tell...
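One way to spot this kind of duplication in a response is to pair every `SYMBOL:`-prefixed node with any other node whose `name` matches the symbol. The following is only a rough Python sketch under that assumption; `find_symbol_duplicates` is a hypothetical helper, not part of BTE, and the sample nodes omit the `attributes` that were elided above.

```python
def find_symbol_duplicates(nodes):
    """Map each SYMBOL:<x> node key to the other node keys whose name is <x>.

    `nodes` is assumed to have the shape of message.knowledge_graph.nodes:
    a dict keyed by CURIE, with each value carrying a "name" field.
    """
    # Index all non-SYMBOL nodes by their display name.
    by_name = {}
    for curie, node in nodes.items():
        if not curie.startswith("SYMBOL:"):
            by_name.setdefault(node.get("name"), []).append(curie)

    # Look up each SYMBOL node's symbol in that index.
    duplicates = {}
    for curie in nodes:
        if curie.startswith("SYMBOL:"):
            symbol = curie.split(":", 1)[1]
            matches = by_name.get(symbol, [])
            if matches:
                duplicates[curie] = matches
    return duplicates


# The two nodes from the response above, with attributes left out.
nodes = {
    "NCBIGene:120892": {"categories": ["biolink:Gene"], "name": "LRRK2"},
    "SYMBOL:LRRK2": {"categories": ["biolink:Gene"], "name": "SYMBOL:LRRK2"},
}

print(find_symbol_duplicates(nodes))
```

Matching on name alone is a heuristic; a stricter check could also require overlapping `categories` before treating two nodes as the same entity.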

colleenXu commented 2 years ago

@andrewsu this is happening because some operations still use SYMBOL, but the SRI-based ID resolver doesn't recognize SYMBOL or perform ID resolution on it.

We could remove those operations...

andrewsu commented 2 years ago

Right, thanks. Have we asked the SRI folks about whether adding symbol as another "identifier" for genes and proteins is possible?

colleenXu commented 2 years ago

@andrewsu we haven't... It looks like the biolink model adds symbol as a node property: https://github.com/biolink/biolink-model/blob/d77172050122bf4d5b48cd1d487fb58a8b163620/biolink-model.yaml#L8119

andrewsu commented 2 years ago

Some Slack discussion here: https://ncatstranslator.slack.com/archives/C0125JTMDHA/p1632779182001700

I think within https://github.com/biothings/biomedical_id_resolver.js, we could hack in a bit of logic to take id.label from the Node Normalizer output and add an entry under equivalent_identifiers. Prioritization depends a bit on which API operations use gene symbols as their primary key. @colleenXu can you easily retrieve those?
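To make the idea concrete, here is a minimal Python sketch of that logic (the resolver itself is a JavaScript/TypeScript package, so this is illustration only, not its code). It assumes each Node Normalizer entry looks like `{"id": {"identifier", "label"}, "equivalent_identifiers": [...], "type": [...]}`; the function name `add_symbol_identifier`, the `SYMBOL:` prefix convention, and the gene/protein filter are assumptions made for this example.

```python
import copy

# Assumed: only genes and proteins should get a SYMBOL entry.
SYMBOL_TYPES = {"biolink:Gene", "biolink:Protein"}


def add_symbol_identifier(normalized, allowed_types=SYMBOL_TYPES):
    """Return a copy of the normalizer output with SYMBOL:<label> added
    to equivalent_identifiers for gene/protein entries that have a label."""
    result = copy.deepcopy(normalized)
    for entry in result.values():
        if not entry:
            # Unresolved input IDs are assumed to map to None; skip them.
            continue
        if not allowed_types.intersection(entry.get("type", [])):
            continue
        label = entry.get("id", {}).get("label")
        if not label:
            continue
        symbol_curie = f"SYMBOL:{label}"
        equivalents = entry.setdefault("equivalent_identifiers", [])
        # Avoid adding the same SYMBOL entry twice.
        if all(e.get("identifier") != symbol_curie for e in equivalents):
            equivalents.append({"identifier": symbol_curie, "label": label})
    return result


# Illustrative input only; a real response would list more equivalent IDs.
sample = {
    "NCBIGene:120892": {
        "id": {"identifier": "NCBIGene:120892", "label": "LRRK2"},
        "equivalent_identifiers": [
            {"identifier": "NCBIGene:120892", "label": "LRRK2"}
        ],
        "type": ["biolink:Gene"],
    }
}

patched = add_symbol_identifier(sample)
print([e["identifier"] for e in patched["NCBIGene:120892"]["equivalent_identifiers"]])
```

With an entry like this in place, a `SYMBOL:LRRK2` ID returned by an API could in principle be matched to the same entity as `NCBIGene:120892`, although going from a bare symbol to a canonical ID would still need a separate lookup.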

colleenXu commented 2 years ago

@andrewsu

Only 1 API uses SYMBOL IDs: multiomics TCGA mutation freq API. This can really only be fixed on the data / parser side for that API.

I've updated all other APIs that did use SYMBOL to use another valid ID (NCBIGene, UniProtKB, ENSEMBL). Theoretically, after refreshing the registry, only the API above will be giving SYMBOLs...

colleenXu commented 2 years ago

Note that BTE currently doesn't ingest the TCGA mutation freq API, and the multiomics team has been informed of the issue and said they would remove SYMBOL / use a different ID namespace in a future release.

Therefore I'm closing this issue