biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

TRAPI 1.5: support source_record_urls #803

Closed colleenXu closed 2 months ago

colleenXu commented 6 months ago

TRAPI 1.5 has updated documentation for "references", which says that "urls to an external page for the association" should be put into the KG Edge's sources (in the source_record_urls field of one of the source objects), rather than into the biolink:publications edge-attribute.
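To make the target shape concrete, here is a rough Python sketch of assembling an edge so that record URLs land on the primary source object while literature references stay in the biolink:publications attribute. It is only an illustration: the function name `build_edge` and the flat record layout are assumptions, and the `source_url` key simply borrows the keyword name used later in this thread. BTE's actual response-mapping code is not shown here.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Normalise a missing, scalar, or list value into a list."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def build_edge(record: dict, primary_source: str) -> dict:
    """Build a TRAPI-style edge from a mapped record (hypothetical record layout)."""
    publications = as_list(record.get("publications"))
    record_urls = as_list(record.get("source_url"))

    # Literature references stay in the biolink:publications edge attribute.
    attributes = []
    if publications:
        attributes.append({
            "attribute_type_id": "biolink:publications",
            "value": publications,
            "value_type_id": "linkml:Uriorcurie",
        })

    # URLs for the external record page go on the primary source object.
    source = {
        "resource_id": primary_source,
        "resource_role": "primary_knowledge_source",
    }
    if record_urls:
        source["source_record_urls"] = record_urls

    return {
        "predicate": record["predicate"],
        "subject": record["subject"],
        "object": record["object"],
        "attributes": attributes,
        "sources": [source],
    }
```

A record with only a `source_url` would then produce an edge with an empty `attributes` list and a single-element `source_record_urls` array on its primary source.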

We've gotten confirmation from the UI team that they're aware of this and plan to add support for it (Translator Slack link).

My implementation idea: use a different keyword in the response-mapping

We'd do this for:

colleenXu commented 6 months ago

For development, use these yamls:

We'll add overrides to these yamls when we deploy this feature.


And WE ARE WAITING AND WON'T MERGE the SmartAPI yaml PRs until AFTER this feature is deployed to Prod:

After these SmartAPI yaml PRs are merged, overrides to this branch's yamls can be removed from BTE.

tokebe commented 6 months ago

@colleenXu Do you have any example queries that would hit BindingDB and likely return source_url with the above override?

colleenXu commented 6 months ago

@tokebe Sorry for seeing this late >.<

Query for "protein-ligand" operation

Start with CD47 (aka UniProtKB:Q08722 / NCBIGene:961)

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": ["biolink:Gene"],
          "ids": ["UniProtKB:Q08722"]
        },
        "n1": {
          "categories": ["biolink:SmallMolecule"]
        }
      },
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:physically_interacts_with"]
        }
      }
    }
  }
}
```

Currently this is one of the edges:

```
"627d6da60b47a3585f493321cb491e82": {
  "predicate": "biolink:physically_interacts_with",
  "subject": "NCBIGene:961",
  "object": "PUBCHEM.COMPOUND:155537282",
  "attributes": [
    {
      "attribute_type_id": "biolink:publications",
      "value": [
        "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=50530406&enzyme=Leukocyte+surface+antigen+CD47&column=ki&startPg=0&Increment=50&submit=Search",
        "PMID:31403795",
        "doi:10.1021/acs.jmedchem.9b00024"
      ],
      "value_type_id": "linkml:Uriorcurie"
    }
  ],
  "sources": [
    {
      "resource_id": "infores:bindingdb",
      "resource_role": "primary_knowledge_source"
    },
    {
      "resource_id": "infores:biothings-bindingdb",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": ["infores:bindingdb"]
    },
    {
      "resource_id": "infores:service-provider-trapi",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": ["infores:biothings-bindingdb"]
    }
  ]
},
```

With the different handling for `source_url`, the long url should be in the primary-source-object instead:

```
"627d6da60b47a3585f493321cb491e82": {
  "predicate": "biolink:physically_interacts_with",
  "subject": "NCBIGene:961",
  "object": "PUBCHEM.COMPOUND:155537282",
  "attributes": [
    {
      "attribute_type_id": "biolink:publications",
      "value": [
        "PMID:31403795",
        "doi:10.1021/acs.jmedchem.9b00024"
      ],
      "value_type_id": "linkml:Uriorcurie"
    }
  ],
  "sources": [
    {
      "resource_id": "infores:bindingdb",
      "resource_role": "primary_knowledge_source",
      "source_record_urls": [
        "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=50530406&enzyme=Leukocyte+surface+antigen+CD47&column=ki&startPg=0&Increment=50&submit=Search"
      ]
    },
    {
      "resource_id": "infores:biothings-bindingdb",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": ["infores:bindingdb"]
    },
    {
      "resource_id": "infores:service-provider-trapi",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": ["infores:biothings-bindingdb"]
    }
  ]
},
```

Query for "ligand-protein" operation

Start with chemical INCHIKEY:NZLXOECMDRNZDN-GUYCJALGSA-N

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": ["biolink:SmallMolecule"],
          "ids": ["INCHIKEY:NZLXOECMDRNZDN-GUYCJALGSA-N"]
        },
        "n1": {
          "categories": ["biolink:Gene"]
        }
      },
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:physically_interacts_with"]
        }
      }
    }
  }
}
```

Currently there's only 1 edge:

```
  "edges": {
    "00c844be0dd7da974fcb364b5bc9c1e0": {
      "predicate": "biolink:physically_interacts_with",
      "subject": "PUBCHEM.COMPOUND:134553288",
      "object": "NCBIGene:187",
      "attributes": [
        {
          "attribute_type_id": "biolink:publications",
          "value": [
            "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=456871&enzyme=Apelin+receptor&column=ki&startPg=0&Increment=50&submit=Search"
          ],
          "value_type_id": "linkml:Uriorcurie"
        }
      ],
      "sources": [
        {
          "resource_id": "infores:bindingdb",
          "resource_role": "primary_knowledge_source"
        },
        {
          "resource_id": "infores:biothings-bindingdb",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": ["infores:bindingdb"]
        },
        {
          "resource_id": "infores:service-provider-trapi",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": ["infores:biothings-bindingdb"]
        }
      ]
    }
  }
},
```

With the different handling for `source_url`, the long url should be in the primary-source-object instead, so there'll be no edge attributes:

```
  "edges": {
    "00c844be0dd7da974fcb364b5bc9c1e0": {
      "predicate": "biolink:physically_interacts_with",
      "subject": "PUBCHEM.COMPOUND:134553288",
      "object": "NCBIGene:187",
      "attributes": [],
      "sources": [
        {
          "resource_id": "infores:bindingdb",
          "resource_role": "primary_knowledge_source",
          "source_record_urls": [
            "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=456871&enzyme=Apelin+receptor&column=ki&startPg=0&Increment=50&submit=Search"
          ]
        },
        {
          "resource_id": "infores:biothings-bindingdb",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": ["infores:bindingdb"]
        },
        {
          "resource_id": "infores:service-provider-trapi",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": ["infores:biothings-bindingdb"]
        }
      ]
    }
  }
},
```

colleenXu commented 6 months ago

I've updated my comment above with all of the adjusted SmartAPI yamls for this feature. We'll add overrides to these yamls for this feature later.

I didn't adjust the MyGene reverse operation (geneToDisease) to retrieve the url, because I encountered an issue with jmespath (comment, issue). I didn't encounter this issue when adjusting the BioThings rare source operation - probably because the value of raresource.disease field is always an array (even if there's only 1 element).

I don't think this is a blocking issue for this feature though.

colleenXu commented 5 months ago

@tokebe

I've added a PR in bte-server to add the overrides needed https://github.com/biothings/bte-server/pull/25

Also: could we remove the source_record_urls field when it has no value? Right now it's in every primary knowledge source object, when it'll only be filled in a few cases.

(based on a quick look)
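Roughly what I mean, as a Python sketch rather than actual BTE code (the function name `drop_empty_record_urls` is made up for illustration, and it assumes each edge carries a `sources` list of plain dicts):

```python
def drop_empty_record_urls(edge: dict) -> dict:
    """Return a copy of the edge whose sources omit empty source_record_urls."""
    cleaned_sources = []
    for source in edge.get("sources", []):
        source = dict(source)  # shallow copy so the input edge is left untouched
        # Treat a missing key, None, and an empty list the same way.
        if not source.get("source_record_urls"):
            source.pop("source_record_urls", None)
        cleaned_sources.append(source)
    return {**edge, "sources": cleaned_sources}
```

Sources that do have URLs keep the field unchanged; only the empty or null cases are dropped.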

tokebe commented 5 months ago

Latest commit should fix this.

colleenXu commented 5 months ago

Note: we can keep the override to BioThings rare-source, but I reverted the branch's yaml so it's using ref_url (commit), i.e. it's not using this feature anymore. These links provide related literature info, but don't explain where the original association came from (GARD), so they should be edge-attributes.

colleenXu commented 4 months ago

We'd like to use source_url for BioThings PFOCR's pfocrUrl field (Ref Slack discussion with @AlexanderPico and https://github.com/NCATS-Tangerine/translator-api-registry/issues/132#issuecomment-2146215578).

However, we needed to adjust our "unique edge hashing" so a TRAPI edge could have multiple values in the source_record_urls array. I think this is needed for our edges from BioThings PFOCR - since I don't think we want a separate edge for every subject/object/figure combo. Currently, we merge records so an edge can contain info from multiple figures with the same triple (subject/object entities).
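As a rough illustration of the idea (a Python sketch, not the BTE implementation): if the record URL is left out of the hash input, records that share a triple collapse into one edge and their URLs accumulate in a single array. Which fields BTE actually hashes is not stated here, so the choice of subject/predicate/object/primary source below is an assumption, as are the names `edge_hash` and `merge_records` and the flat record layout.

```python
import hashlib
import json


def edge_hash(record: dict) -> str:
    """Hash only the fields that identify an edge; URLs are deliberately left out."""
    key = [
        record["subject"],
        record["predicate"],
        record["object"],
        record["primary_source"],
    ]
    return hashlib.md5(json.dumps(key).encode("utf-8")).hexdigest()


def merge_records(records: list) -> dict:
    """Merge records with the same hash, collecting their distinct record URLs."""
    merged = {}
    for record in records:
        edge = merged.setdefault(edge_hash(record), {
            "subject": record["subject"],
            "predicate": record["predicate"],
            "object": record["object"],
            "primary_source": record["primary_source"],
            "source_record_urls": [],
        })
        for url in record.get("source_record_urls", []):
            # Keep first-seen order and skip duplicates.
            if url not in edge["source_record_urls"]:
                edge["source_record_urls"].append(url)
    return merged
```

With this approach, two figures supporting the same subject/object pair yield one edge with two entries in `source_record_urls`, instead of two separate edges.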


We have two code changes that have been deployed to dev/CI, and we want to patch them to Test, where the rest of the code for this feature is:

colleenXu commented 3 months ago

The BTE code + old overrides (having ncats_rare_source rather than pfocr) were deployed to Prod as part of the Octopus release. I tested and it's live.

However, the latest PFOCR stuff is only on dev/CI right now. I think we want to get this in as a patch to Test/Prod.


So I haven't merged the SmartAPI yaml PR yet, or added the override removal to the chore.

colleenXu commented 3 months ago

See https://github.com/biothings/biothings_explorer/issues/811#issuecomment-2167365852 for details on how we're going to remove the overrides/deploy the PFOCR stuff.

tokebe commented 2 months ago

Related PRs deployed to Prod.

colleenXu commented 2 months ago

I've updated BioThings PFOCR x-bte annotation to use this feature.