Closed colleenXu closed 2 months ago
For development, use these yamls:
We'll add overrides to these yamls when we deploy this feature.
And WE ARE WAITING AND WON'T MERGE the SmartAPI yaml PRs until AFTER this feature is deployed to Prod:
After these SmartAPI yaml PRs are merged, overrides to this branch's yamls can be removed from BTE.
@colleenXu Do you have any example queries that would hit BindingDB and likely return `source_url` with the above override?
@tokebe Sorry for seeing this late >.<
Start with CD47 (aka UniProtKB:Q08722 / NCBIGene:961)

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": ["biolink:Gene"],
          "ids": ["UniProtKB:Q08722"]
        },
        "n1": {
          "categories": ["biolink:SmallMolecule"]
        }
      },
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:physically_interacts_with"]
        }
      }
    }
  }
}
```

Currently this is one of the edges:

```
"627d6da60b47a3585f493321cb491e82": {
  "predicate": "biolink:physically_interacts_with",
  "subject": "NCBIGene:961",
  "object": "PUBCHEM.COMPOUND:155537282",
  "attributes": [
    {
      "attribute_type_id": "biolink:publications",
      "value": [
        "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=50530406&enzyme=Leukocyte+surface+antigen+CD47&column=ki&startPg=0&Increment=50&submit=Search",
        "PMID:31403795",
        "doi:10.1021/acs.jmedchem.9b00024"
      ],
      "value_type_id": "linkml:Uriorcurie"
    }
  ],
  "sources": [
    {
      "resource_id": "infores:bindingdb",
      "resource_role": "primary_knowledge_source"
    },
    {
      "resource_id": "infores:biothings-bindingdb",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": ["infores:bindingdb"]
    },
    {
      "resource_id": "infores:service-provider-trapi",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": ["infores:biothings-bindingdb"]
    }
  ]
},
```

With the different handling for `source_url`, the long url should be in the primary-source-object instead:

```
"627d6da60b47a3585f493321cb491e82": {
  "predicate": "biolink:physically_interacts_with",
  "subject": "NCBIGene:961",
  "object": "PUBCHEM.COMPOUND:155537282",
  "attributes": [
    {
      "attribute_type_id": "biolink:publications",
      "value": [
        "PMID:31403795",
        "doi:10.1021/acs.jmedchem.9b00024"
      ],
      "value_type_id": "linkml:Uriorcurie"
    }
  ],
  "sources": [
    {
      "resource_id": "infores:bindingdb",
      "resource_role": "primary_knowledge_source",
      "source_record_urls": [
        "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=50530406&enzyme=Leukocyte+surface+antigen+CD47&column=ki&startPg=0&Increment=50&submit=Search"
      ]
    },
    {
      "resource_id": "infores:biothings-bindingdb",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": ["infores:bindingdb"]
    },
    {
      "resource_id": "infores:service-provider-trapi",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": ["infores:biothings-bindingdb"]
    }
  ]
},
```
Start with chemical INCHIKEY:NZLXOECMDRNZDN-GUYCJALGSA-N

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": ["biolink:SmallMolecule"],
          "ids": ["INCHIKEY:NZLXOECMDRNZDN-GUYCJALGSA-N"]
        },
        "n1": {
          "categories": ["biolink:Gene"]
        }
      },
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:physically_interacts_with"]
        }
      }
    }
  }
}
```

Currently there's only 1 edge:

```
"edges": {
  "00c844be0dd7da974fcb364b5bc9c1e0": {
    "predicate": "biolink:physically_interacts_with",
    "subject": "PUBCHEM.COMPOUND:134553288",
    "object": "NCBIGene:187",
    "attributes": [
      {
        "attribute_type_id": "biolink:publications",
        "value": [
          "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=456871&enzyme=Apelin+receptor&column=ki&startPg=0&Increment=50&submit=Search"
        ],
        "value_type_id": "linkml:Uriorcurie"
      }
    ],
    "sources": [
      {
        "resource_id": "infores:bindingdb",
        "resource_role": "primary_knowledge_source"
      },
      {
        "resource_id": "infores:biothings-bindingdb",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:bindingdb"]
      },
      {
        "resource_id": "infores:service-provider-trapi",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:biothings-bindingdb"]
      }
    ]
  }
}
},
```

With the different handling for `source_url`, the long url should be in the primary-source-object instead, so there'll be no edge attributes:

```
"edges": {
  "00c844be0dd7da974fcb364b5bc9c1e0": {
    "predicate": "biolink:physically_interacts_with",
    "subject": "PUBCHEM.COMPOUND:134553288",
    "object": "NCBIGene:187",
    "attributes": [],
    "sources": [
      {
        "resource_id": "infores:bindingdb",
        "resource_role": "primary_knowledge_source",
        "source_record_urls": [
          "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=456871&enzyme=Apelin+receptor&column=ki&startPg=0&Increment=50&submit=Search"
        ]
      },
      {
        "resource_id": "infores:biothings-bindingdb",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:bindingdb"]
      },
      {
        "resource_id": "infores:service-provider-trapi",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:biothings-bindingdb"]
      }
    ]
  }
}
},
```
I've updated my comment above with all of the adjusted SmartAPI yamls for this feature. We'll add overrides to these yamls for this feature later.
I didn't adjust the MyGene reverse operation (`geneToDisease`) to retrieve the url, because I encountered an issue with jmespath (comment, issue). I didn't encounter this issue when adjusting the BioThings rare source operation, probably because the value of the `raresource.disease` field is always an array (even if there's only 1 element).
I don't think this is a blocking issue for this feature though.
@tokebe
I've added a PR in bte-server to add the overrides needed https://github.com/biothings/bte-server/pull/25
Also: could we remove the source_record_urls field when it has no value? Right now it's in every primary knowledge source object, when it'll only be filled in a few cases.
(based on a quick look)
Latest commit should fix this.
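For illustration only, here is a rough Python sketch of what "remove the field when it has no value" could mean for a list of source objects. It is not the code from the commit mentioned above (and BTE itself is not written in Python); the function name `drop_empty_source_record_urls` is made up for this example.

```python
def drop_empty_source_record_urls(sources: list) -> list:
    """Return copies of the source objects without empty `source_record_urls`.

    Hypothetical helper: a field that is missing, None, or an empty list
    is treated as "no value" and left out of the output.
    """
    cleaned = []
    for source in sources:
        source = dict(source)  # shallow copy so the caller's object is not mutated
        if not source.get("source_record_urls"):
            source.pop("source_record_urls", None)
        cleaned.append(source)
    return cleaned


sources = [
    {
        "resource_id": "infores:bindingdb",
        "resource_role": "primary_knowledge_source",
        "source_record_urls": [],  # nothing mapped for this edge
    },
]
print(drop_empty_source_record_urls(sources))
```

Source objects that do carry URLs pass through unchanged, so only the "filled in a few cases" objects keep the field.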
Note: we can keep the override to BioThings rare-source, but I reverted the branch's yaml so it's using `ref_url` (commit). AKA it's not using this feature anymore. These links provide related literature info, but don't explain where the original association came from (GARD). So they should be edge-attributes.
We'd like to use `source_url` for BioThings PFOCR's `pfocrUrl` field (Ref Slack discussion with @AlexanderPico and https://github.com/NCATS-Tangerine/translator-api-registry/issues/132#issuecomment-2146215578).
However, we needed to adjust our "unique edge hashing" so a TRAPI edge could have multiple values in the `source_record_urls` array. I think this is needed for our edges from BioThings PFOCR, since I don't think we want a separate edge for every subject/object/figure combo. Currently, we merge records so an edge can contain info from multiple figures with the same triple (subject/object entities).
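A minimal Python sketch of the hashing idea, under stated assumptions: this is not BTE's actual hashing code, and the names `IDENTITY_FIELDS`, `edge_hash`, `merge_records`, the record shape, and the example IDs and URLs are all hypothetical. The point it illustrates is that if the hash ignores the record URL, records that differ only by figure collapse into one edge whose `source_record_urls` array collects every URL.

```python
import hashlib
import json

# Assumed set of fields that identify an edge; the record URL is deliberately excluded.
IDENTITY_FIELDS = ("subject", "predicate", "object", "primary_source")


def edge_hash(record: dict) -> str:
    """Hash only the identifying fields, so URLs do not split edges."""
    identity = [record[field] for field in IDENTITY_FIELDS]
    return hashlib.sha256(json.dumps(identity).encode("utf-8")).hexdigest()


def merge_records(records: list) -> dict:
    """Group records by edge hash and collect their URLs without duplicates."""
    edges = {}
    for record in records:
        key = edge_hash(record)
        if key not in edges:
            edges[key] = {field: record[field] for field in IDENTITY_FIELDS}
            edges[key]["source_record_urls"] = []
        urls = edges[key]["source_record_urls"]
        for url in record.get("source_record_urls", []):
            if url not in urls:
                urls.append(url)
    return edges


# Two made-up records for the same triple, each from a different figure.
base = {
    "subject": "NCBIGene:1",
    "predicate": "biolink:physically_interacts_with",
    "object": "NCBIGene:2",
    "primary_source": "infores:example",
}
records = [
    {**base, "source_record_urls": ["https://example.org/figure-a"]},
    {**base, "source_record_urls": ["https://example.org/figure-b"]},
]
merged = merge_records(records)
print(len(merged))  # one edge carrying both figure URLs
```

Had the URL been part of the hashed fields, the two records above would have produced two separate edges, which is the subject/object/figure explosion the comment wants to avoid.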
We have two code changes that have been deployed to dev/CI -> and we want to patch to Test where the rest of the code for this feature is:
The BTE code + old overrides (having ncats_rare_source rather than pfocr) were deployed to Prod as part of the Octopus release. I tested and it's live.
However, the latest PFOCR stuff is only on dev/CI right now. I think we want to get this in as a patch to Test/Prod.
So I haven't merged the SmartAPI yaml PR yet, or added the override removal to the chore.
See https://github.com/biothings/biothings_explorer/issues/811#issuecomment-2167365852 for details on how we're going to remove the overrides/deploy the PFOCR stuff.
Related PRs deployed to Prod.
I've updated BioThings PFOCR x-bte annotation to use this feature.
TRAPI 1.5 has updated documentation for "references" that says "urls to an external page for the association" should be put into the KG Edge's sources (in the `source_record_urls` field for one of the source objects), rather than the `biolink:publications` edge-attribute.

We've gotten confirmation from the UI team that they're aware of this and plan to add support for it (Translator Slack link).
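To make the placement concrete, here is a small Python sketch (an illustration only, not BTE code and not taken from the TRAPI documentation). The helper names `as_list` and `place_references`, their parameters, and the example URL are assumptions; the field names `biolink:publications`, `primary_knowledge_source`, and `source_record_urls` are the ones used in this thread.

```python
from typing import Union


def as_list(value: Union[str, list, None]) -> list:
    """Wrap a single value in a list, pass lists through, and map None to []."""
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def place_references(publications, record_urls, primary_source: str) -> dict:
    """Hypothetical helper: build the attributes and sources parts of an edge.

    Publications stay in the `biolink:publications` edge-attribute, while
    URLs to the association's external page go into the primary source object.
    """
    attributes = []
    pubs = as_list(publications)
    if pubs:
        attributes.append({
            "attribute_type_id": "biolink:publications",
            "value": pubs,
            "value_type_id": "linkml:Uriorcurie",
        })

    primary = {
        "resource_id": primary_source,
        "resource_role": "primary_knowledge_source",
    }
    urls = as_list(record_urls)
    if urls:
        primary["source_record_urls"] = urls

    return {"attributes": attributes, "sources": [primary]}


parts = place_references(
    publications=["PMID:31403795", "doi:10.1021/acs.jmedchem.9b00024"],
    record_urls="http://example.org/record",  # made-up URL; a single string becomes a 1-element array
    primary_source="infores:bindingdb",
)
print(parts["sources"][0]["source_record_urls"])
```

Aggregator source objects are left out of the sketch for brevity; in a real edge they would follow the primary source in the `sources` array.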
My implementation idea: use a different keyword in the response-mapping

- `ref_url` to tell BTE to add the field's values to the `biolink:publications` edge-attribute (previous issue)
- `source_url` to tell BTE to use the field's values in the sources, in the primary knowledge source object (the one with `"resource_role": "primary_knowledge_source"`). The key is `source_record_urls` and the value is an array of strings (make 1-element arrays for single urls) from the mapped response field.

We'd do this for: