biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

annotate Complex Portal API #631

Closed andrewsu closed 6 months ago

andrewsu commented 1 year ago

Website: https://www.ebi.ac.uk/complexportal/home

Publication: https://academic.oup.com/nar/article/50/D1/D578/6414048

Description: The Complex Portal is a manually curated, encyclopaedic resource of macromolecular complexes from a number of key model organisms. The majority of complexes are made up of proteins but may also include nucleic acids or small molecules.

The API is described at https://www.ebi.ac.uk/intact/complex-ws/search/. Example API calls:

Note also that there are Complex - Disease annotations:

Note that the website also shows links to related pathways. For example, https://www.ebi.ac.uk/complexportal/complex/CPX-2158 shows the content below.

[Screenshot: related-pathway links on the CPX-2158 page]

However, these mappings are not currently available through the API. I've emailed the Complex Portal folks to ask whether they would be willing and able to extend the API to cover diseases and pathways, which would make these annotations more easily accessible to BTE.

rjawesome commented 1 year ago

I'm working on a YAML. It seems like it may need some jq post-processing to distinguish proteins from chemicals.
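To make the protein/chemical split concrete, here is a minimal JavaScript sketch of the kind of post-processing involved. It assumes (not confirmed in this thread) that a complex's `participants` array carries `identifier` and `interactorType` fields with values like `"protein"` and `"small molecule"`; the function name `partitionParticipants` is hypothetical, and the field names should be checked against a real Complex Portal response.

```javascript
// Hypothetical sketch: split a complex's participants into proteins and chemicals.
// ASSUMPTION: participants look like { identifier, interactorType }; verify the
// field names and type strings against a real Complex Portal response first.
// Roughly what a jq filter such as
//   .participants[] | select(.interactorType == "protein")
// would do, applied once per participant type.
function partitionParticipants(complex) {
  const proteins = [];
  const chemicals = [];
  for (const p of complex.participants ?? []) {
    const type = (p.interactorType ?? "").toLowerCase();
    if (type === "protein") {
      proteins.push(p.identifier);
    } else if (type === "small molecule") {
      chemicals.push(p.identifier);
    }
    // Other participant types (e.g. nucleic acids) are ignored here.
  }
  return { proteins, chemicals };
}

// Illustrative input only; "CHEBI:EXAMPLE" is a placeholder, not a real ID.
const example = {
  participants: [
    { identifier: "P69905", interactorType: "protein" },
    { identifier: "CHEBI:EXAMPLE", interactorType: "small molecule" },
  ],
};
console.log(partitionParticipants(example));
```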

rjawesome commented 1 year ago

ComplexPortal SmartAPI YAML (uses jq processing): https://gist.github.com/rjawesome/020f3013a648f42e8326ba8df5a4f637 It supports Complex -> Chemical/Disease/Protein and Chemical/Disease/Protein -> Complex. When I was testing, I needed to specify the category "biolink:MacromolecularComplex" on the complex node.

colleenXu commented 9 months ago

Related infores stuff is ready but not deployed yet:

colleenXu commented 8 months ago

Current status

Biolink-model v3.5.3 mapping notes

- The complex ID namespace is in biolink-model as `ComplexPortal` (in [yaml](https://github.com/biolink/biolink-model/blob/0c79ddca656280235667820ffb68d81abf649dc3/biolink-model.yaml#L41) and [prefix-map](https://github.com/biolink/biolink-model/blob/0c79ddca656280235667820ffb68d81abf649dc3/prefix-map/biolink-model-prefix-map.json#L36))
- The complex's biolink category is [MacromolecularComplex](https://github.com/biolink/biolink-model/blob/0c79ddca656280235667820ffb68d81abf649dc3/biolink-model.yaml#L7962C4-L7962C4), as Rohan pointed out earlier
- Using `part_of`/`has_part` predicates for Complex <-> Protein/Chemical relationships
- Using the `related_to` predicate for Complex <-> Disease relationships, since the nature of the relationship between complexes and diseases isn't clear ("complex is linked to a specific disease condition" according to the [data-documentation](https://www.ebi.ac.uk/complexportal/documentation))
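The mapping choices above can be summarized as a small lookup table. This is only a sketch of the notes, not BTE's or the SmartAPI YAML's actual structure; the constant and function names are hypothetical. The direction follows the example edge later in this thread, where the protein is the subject of `biolink:part_of` and the complex is the object.

```javascript
// Hypothetical sketch of the Biolink v3.5.3 mapping notes as a lookup.
const COMPLEX_PREFIX = "ComplexPortal";
const COMPLEX_CATEGORY = "biolink:MacromolecularComplex";

// Predicate when the other node is the subject and the complex is the object.
const TO_COMPLEX = {
  Protein: "biolink:part_of",
  Chemical: "biolink:part_of",
  Disease: "biolink:related_to",
};

// Predicate when the complex is the subject.
const FROM_COMPLEX = {
  Protein: "biolink:has_part",
  Chemical: "biolink:has_part",
  Disease: "biolink:related_to",
};

// Turn a Complex Portal accession (e.g. "CPX-2158") into a CURIE.
function complexCurie(accession) {
  return `${COMPLEX_PREFIX}:${accession}`;
}

// Look up the predicate for a Complex <-> otherType relationship.
function predicateFor(otherType, complexIsSubject) {
  const table = complexIsSubject ? FROM_COMPLEX : TO_COMPLEX;
  const predicate = table[otherType];
  if (predicate === undefined) {
    throw new Error(`No mapping for Complex <-> ${otherType}`);
  }
  return predicate;
}

console.log(complexCurie("CPX-2158"), COMPLEX_CATEGORY, predicateFor("Protein", false));
```

Keeping both directions explicit makes the `related_to` case visibly symmetric, while `part_of`/`has_part` flip depending on which side the complex is on.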

To test this API locally, add it to your local config file

It's best to test this way, since then we can include the `primarySource: true` info and the TRAPI edge-sources info will display as intended.

In your local copy of https://github.com/biothings/bte-server/blob/main/src/config/apis.js, add the following item to the `include` list (I add it after the `CTD API` entry):

```
{
  id: "326eb1e437303bee27d3cef29227125d",
  name: "Complex Portal Web Service",
  primarySource: true
},
```

Then update your local copy of BTE's SmartAPI specs (`pnpm build`, then `pnpm run smartapi_sync`).

Then you can send a POST request to the API-specific endpoint, which queries Complex Portal only: http://localhost:3000/v1/smartapi/326eb1e437303bee27d3cef29227125d/query

Put this in the request body. It queries with the protein `hemoglobin subunit alpha (human)`:

```
{
  "message": {
    "query_graph": {
      "edges": {
        "e01": { "subject": "n0", "object": "n1" }
      },
      "nodes": {
        "n0": { "ids": ["UniProtKB:P69905"], "categories": ["biolink:Protein"] },
        "n1": { "categories": ["biolink:MacromolecularComplex"] }
      }
    }
  }
}
```

You'll get a response with this node and edge:

```
"ComplexPortal:CPX-2158": {
  "categories": [
    "biolink:MacromolecularComplex"
  ],
  "name": "Hemoglobin HbA complex",
  "attributes": [
    {
      "attribute_type_id": "biolink:xref",
      "value": [
        "ComplexPortal:CPX-2158"
      ]
    },
    {
      "attribute_type_id": "biolink:synonym",
      "value": [
        "ComplexPortal:CPX-2158"
      ]
    }
  ]
}
```

```
"6e42ea498eace1f667853945cee0b3ef": {
  "predicate": "biolink:part_of",
  "subject": "UniProtKB:P69905",
  "object": "ComplexPortal:CPX-2158",
  "attributes": [],
  "sources": [
    {
      "resource_id": "infores:complex-portal",
      "resource_role": "primary_knowledge_source"
    },
    {
      "resource_id": "infores:service-provider-trapi",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": [
        "infores:complex-portal"
      ]
    }
  ]
}
```
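For anyone who prefers scripting this over a REST client, here is a minimal sketch that builds the same one-hop query, POSTs it to the local endpoint, and prints each edge's primary knowledge source. It assumes Node 18+ (for the global `fetch`) and a local BTE on port 3000 as described above; the helper names and the `--run` flag are hypothetical. The response is read from `message.knowledge_graph.edges`, which is where TRAPI places edges.

```javascript
// Hypothetical sketch: query Complex Portal through a local BTE.
// ASSUMPTIONS: Node 18+ (global fetch), BTE listening on localhost:3000.
const SMARTAPI_ID = "326eb1e437303bee27d3cef29227125d";
const ENDPOINT = `http://localhost:3000/v1/smartapi/${SMARTAPI_ID}/query`;

// Build a one-hop TRAPI query graph: n0 (given IDs) -> n1 (category only).
function buildOneHopQuery(ids, subjectCategory, objectCategory) {
  return {
    message: {
      query_graph: {
        edges: { e01: { subject: "n0", object: "n1" } },
        nodes: {
          n0: { ids, categories: [subjectCategory] },
          n1: { categories: [objectCategory] },
        },
      },
    },
  };
}

// Return the resource_id whose role is primary_knowledge_source, if any.
function primarySource(edge) {
  const src = (edge.sources ?? []).find(
    (s) => s.resource_role === "primary_knowledge_source",
  );
  return src?.resource_id;
}

async function main() {
  const body = buildOneHopQuery(
    ["UniProtKB:P69905"],
    "biolink:Protein",
    "biolink:MacromolecularComplex",
  );
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  const data = await res.json();
  const edges = data.message?.knowledge_graph?.edges ?? {};
  for (const [id, edge] of Object.entries(edges)) {
    console.log(id, edge.subject, edge.predicate, edge.object, primarySource(edge));
  }
}

// Only hit the network when explicitly asked: `node query.js --run`
if (process.argv[2] === "--run") {
  main().catch(console.error);
}
```

With the setup above, the edge from the example should print `infores:complex-portal` as its primary source.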

colleenXu commented 8 months ago

Discussed

I think this resource/yaml is ready to incorporate into BTE. However, I'm waiting for the decision on infores catalog changes (whether changes made now can be used in this release cycle).

If we do want to incorporate this data-resource during this release cycle:


Added

I added operations for GO biological process -> Complex and GO molecular function -> Complex. But I didn't add the opposite operations (Complex -> GO terms) because it would require custom JQ-processing.

Here are my notes on the data, with example API queries:

This info may be specific to the complex and not its parts: according to the [data-documentation](https://www.ebi.ac.uk/complexportal/documentation), "Annotation to [Gene Ontology](http://geneontology.org/) terms indicates the function, process, location and component of the complex as a whole"

Complex -> GO terms: using the [same example](https://www.ebi.ac.uk/intact/complex-ws/complex/CPX-2158) as above, in the `crossReferences` field:
* the GO terms are the objects where `database` = `gene ontology`
* GO biological process terms are those where `qualifier` = `biological process`
* GO molecular function terms are those where `qualifier` = `molecular function`
* I'm not interested in the `cellular component` entries because they seem to describe the same thing as the Complex entity itself

GO terms -> Complex: [example](https://www.ebi.ac.uk/intact/complex-ws/search/GO:0016491) from the [API documentation for /search/ endpoint](https://www.ebi.ac.uk/intact/complex-ws/search/)
* the structure of the response looks the same as the other /search/ queries, so it wouldn't need JQ post-processing
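The Complex -> GO filtering described above could look roughly like this. The `crossReferences`, `database`, and `qualifier` fields and their values come from the notes; the assumption that the GO ID itself is in an `identifier` field is mine and should be checked against the CPX-2158 response. Function names are hypothetical, and this is plain JavaScript standing in for the JQ post-processing BTE would actually need.

```javascript
// Hypothetical sketch: pull GO biological-process and molecular-function terms
// out of a complex's crossReferences, following the notes above.
// ASSUMPTION: the GO ID lives in an `identifier` field on each cross-reference.
function goTermsByAspect(complex) {
  const out = { biologicalProcess: [], molecularFunction: [] };
  for (const xref of complex.crossReferences ?? []) {
    if (xref.database !== "gene ontology") continue;
    if (xref.qualifier === "biological process") {
      out.biologicalProcess.push(xref.identifier);
    } else if (xref.qualifier === "molecular function") {
      out.molecularFunction.push(xref.identifier);
    }
    // "cellular component" entries are skipped on purpose.
  }
  return out;
}

// GO term -> Complex goes through /search/ directly, no post-processing needed.
// GO IDs like "GO:0016491" are used as-is, matching the documented example URL.
function searchUrl(term) {
  return `https://www.ebi.ac.uk/intact/complex-ws/search/${term}`;
}

console.log(searchUrl("GO:0016491"));
```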


Not done yet (for another issue?)

All operations starting from Complex ID require JQ

All operations starting from the Complex ID depend on custom JQ post-processing, which we need to add to BTE. Jackson @tokebe and I agreed to leave this for later.
* Rohan @rjawesome wrote operations with JQ strings for Complex -> Chemical/Disease/Protein in the yaml, which I commented out and haven't tested
* The [JQ-in-smartapi PR](https://github.com/biothings/smartapi-kg.js/pull/61) is likely out of sync with the current code, after we put that [feature](https://github.com/biothings/biothings_explorer/issues/521) on hold
* Nothing is written for Complex -> GO biological-process or GO molecular-function yet

colleenXu commented 8 months ago

The infores stuff is being deployed for this release cycle!

So we are incorporating this resource into BTE/Service Provider during this release cycle. https://github.com/biothings/bte-server/pull/9

We are using an override as well, because this resource uses ORPHANET IDs and we're in the ORPHANET -> orphanet transition.


I think we can close this issue once:

We'll then have a separate process to remove the overrides (not needed once the yaml PRs are all merged / registrations refreshed).

colleenXu commented 8 months ago

@tokebe

I double-checked and it's not working on CI, probably because of the larger cache-update issues (recent lab Slack convo)

My test

POST to `https://bte.ci.transltr.io/v1/smartapi/326eb1e437303bee27d3cef29227125d/query` (from this [testExample](https://github.com/NCATS-Tangerine/translator-api-registry/blob/77ae9e9dbab7411c4044459d026e8f84cdbbcd3b/complexportal/smartapi.yaml#L91)):

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": { "categories": ["biolink:Protein"], "ids": ["UniProtKB:P69905"] },
        "n1": { "categories": ["biolink:MacromolecularComplex"] }
      },
      "edges": {
        "e01": { "subject": "n0", "object": "n1", "predicates": ["biolink:part_of"] }
      }
    }
  }
}
```

Right now, BTE CI doesn't recognize this SmartAPI registration ID, which suggests that the smartapi-spec cron job didn't run successfully.

Note: I should also be able to get a response through `https://bte.ci.transltr.io/v1/team/Service Provider/query`. But right now, no matching MetaEdges are found.
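When scripting the two CI checks above, note that the team name `Service Provider` contains a space. A minimal sketch of building both URLs follows; percent-encoding the team name explicitly keeps the URL valid regardless of which HTTP client sends it. The helper names are hypothetical, and `edgeCount` simply counts edges under TRAPI's `message.knowledge_graph.edges`, so a count of zero just means nothing came back, not why.

```javascript
// Hypothetical sketch: build the API-specific and team endpoints on BTE CI.
const BTE_CI = "https://bte.ci.transltr.io/v1";

function smartapiEndpoint(base, smartapiId) {
  return `${base}/smartapi/${smartapiId}/query`;
}

// "Service Provider" becomes "Service%20Provider".
function teamEndpoint(base, team) {
  return `${base}/team/${encodeURIComponent(team)}/query`;
}

// Number of edges in a TRAPI response's knowledge graph (0 if missing).
function edgeCount(response) {
  return Object.keys(response.message?.knowledge_graph?.edges ?? {}).length;
}

console.log(smartapiEndpoint(BTE_CI, "326eb1e437303bee27d3cef29227125d"));
console.log(teamEndpoint(BTE_CI, "Service Provider"));
```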

tokebe commented 8 months ago

Issue should now be addressed by https://github.com/biothings/biothings_explorer/commit/3019cecf670e5b0fc04877c31956b2bbbc3d7e4e, please test again

colleenXu commented 8 months ago

Now it's working on BTE CI! Yay!

colleenXu commented 6 months ago

Closing this issue since the changes have been deployed to Prod with the Feb 2024 release.

I've confirmed that I can query ComplexPortal through BTE prod https://bte.transltr.io/v1/team/Service Provider/query with the example in https://github.com/biothings/biothings_explorer/issues/631#issuecomment-1853408456 and get the expected response.