biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

x-bte annotation refactoring discussion #656

Open colleenXu opened 1 year ago

colleenXu commented 1 year ago

Opening this issue as a placeholder for now. See the comments for discussion of what is involved here.

colleenXu commented 1 year ago

Idea from @tokebe:

Ideas from me:

colleenXu commented 1 year ago

Another idea, although it may be more about "modifying / adding operations": using list_filter more for BioThings APIs, although that doesn't fully fix reverse-operation issues #316

colleenXu commented 1 year ago

From a conversation Jackson and I had today:

Some main things to address?

Note:

colleenXu commented 1 year ago

@newgene This is the issue on x-bte refactoring. I heard during today's group meeting that there are some issues with MetaKG stuff and id-prefixes / operation-level source.

However, @tokebe and I have discussed that x-bte annotations could be written one way, to be more writer-friendly, and then parsed into a different representation for BTE's MetaKG, because the two have different requirements or design ideas.

I wonder if this is also the case for your Translator/SmartAPI-Registry MetaKG work. I suspect that work has a different set of requirements from both BTE's internal representation AND x-bte annotation writing.

colleenXu commented 1 year ago

One case of "multiple prefixes in output" is https://github.com/biothings/biothings_explorer/issues/585

colleenXu commented 11 months ago

Jackson @tokebe and I think this is the overarching topic for x-bte refactoring (after reviewing the notes above):

The issues

There seem to be 3 different requirement sets at play that we want to tell apart and be aware of:

Which leads to specific questions for group discussion, like:


And some ideas on how to "expand" an x-bte operation / unit of annotation

Currently, 1 x-bte operation represents...

* 1 API endpoint being used
* 1 unique combo of:
  * input semantic-type
  * input ID namespace
  * sub-query information
  * predicate
  * qualifier-set
  * source field value
  * output semantic-type
  * output ID namespace
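To make the "1 unique combo" point concrete, here is a minimal Python sketch that treats the fields listed above as a single hashable record. The class name, field names, and all example values are hypothetical and are not BTE's actual internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XBteOperation:
    """Hypothetical model of one x-bte operation (field names are illustrative)."""
    endpoint: str          # the 1 API endpoint being used
    input_type: str        # input semantic-type
    input_prefix: str      # input ID namespace
    sub_query: str         # sub-query information (placeholder string here)
    predicate: str
    qualifiers: frozenset  # qualifier-set as (qualifier, value) pairs
    source: str            # source field value
    output_type: str       # output semantic-type
    output_prefix: str     # output ID namespace


# Made-up values shared by both operations below.
base = dict(
    endpoint="/query",
    input_type="Gene",
    sub_query="example-mapping",
    predicate="related_to",
    qualifiers=frozenset(),
    source="infores:example",
    output_type="Disease",
    output_prefix="MONDO",
)

# Changing only the input ID namespace already yields a second operation.
op_a = XBteOperation(input_prefix="NCBIGene", **base)
op_b = XBteOperation(input_prefix="ENSEMBL", **base)

distinct_operations = {op_a, op_b}
print(len(distinct_operations))  # 2
```

Under this framing, every field is part of the operation's identity, which is why a change in any single field currently means writing another operation.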

Jackson @tokebe and I have discussed how to make x-bte annotations easier to write. One of our ideas is to have 1 x-bte operation (one unit of annotation?) expand to include more info:

My qualifier-set thinking

There are theoretically many operations that would mainly differ by qualifier-set (and how that affects sub-query info like post_filter/filter, jmespath, JQ). The guidance for [anatomical](https://github.com/biolink/biolink-model/blob/db44be0c49939229c28cbb71a715127941e0ce0b/biolink-model.yaml#L1515) / [species](https://github.com/biolink/biolink-model/blob/db44be0c49939229c28cbb71a715127941e0ce0b/biolink-model.yaml#L1532) / and [population](https://github.com/biolink/biolink-model/blob/db44be0c49939229c28cbb71a715127941e0ce0b/biolink-model.yaml#L1158) context qualifiers is currently unclear to me (are they edge-attributes or part of the qualifier-set?). If they turn out to be part of the qualifier-set and we want to support them, this leads to a combinatorial explosion, because the context qualifiers in our KPs have a lot of possible values.

* anatomical context:
  * multiomics apis (drug response): Guangrong has previously told me that some operations are affected, and include 10-20s of possible tissue/anatomical-context values
  * also in pending apis: ebi gene2pheno
* species context: affects lots of apis
  * core biothings: MyChem (chembl.drug_mechanism and drugcentral.bioactivity info), MyGene panther, a little MyDisease disgenet
  * pending biothings: bindingdb, mgi gene2pheno
  * external: ctd, biolink/monarch
* population context:
  * multiomics apis based on clinical data: ehr risk, wellness (clinical trials too?)
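A rough illustration of the combinatorial concern, as a Python sketch. The value lists and their sizes are placeholders, not counts from any real API, and it assumes the worst case where one operation would be needed for every combination of context-qualifier values.

```python
from itertools import product

# Placeholder value lists: counts are illustrative, not taken from any real API.
context_values = {
    "anatomical_context_qualifier": [f"tissue_{i}" for i in range(15)],
    "species_context_qualifier": ["human", "mouse", "rat"],
    "population_context_qualifier": ["cohort_a", "cohort_b"],
}


def expand_qualifier_sets(values_by_qualifier):
    """Return every combination that picks one value per qualifier."""
    names = list(values_by_qualifier)
    return [
        dict(zip(names, combo))
        for combo in product(*(values_by_qualifier[n] for n in names))
    ]


qualifier_sets = expand_qualifier_sets(context_values)
print(len(qualifier_sets))  # 15 * 3 * 2 = 90
```

Even with these small made-up lists, a single base operation would fan out into 90 hand-written variants, which is the motivation for letting one unit of annotation carry a set of allowed values instead.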

My source field thinking

There are theoretically some operations that would mainly differ by source (and how that affects sub-query info like post_filter/filter, jmespath, JQ...). It would be nice if we could set the source info to field values that are post-processed by BTE... I'm not sure of the scope of this issue though:

* core biothings apis: mygene, mydisease disgenet
* external apis: biolink/monarch

Also maybe complicated because some api hits will have multiple source values / fields?
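One way to picture "source info set from field values" is the Python sketch below. It is only an illustration of the idea, not how BTE currently works: the helper names, the dotted-path convention, and the example hits are all assumptions. It reads the source from a field in each API hit and normalizes single and multiple values into a list, so one operation could cover several sources.

```python
def get_path(record, dotted_path):
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def sources_from_hit(hit, source_field):
    """Return the source values found in one API hit as a list (possibly empty)."""
    value = get_path(hit, source_field)
    if value is None:
        return []
    if isinstance(value, list):
        return list(value)
    return [value]


# Made-up hits: one with a single source value, one with several.
hit_single = {"assoc": {"source": "source_a"}}
hit_multi = {"assoc": {"source": ["source_a", "source_b"]}}

print(sources_from_hit(hit_single, "assoc.source"))  # ['source_a']
print(sources_from_hit(hit_multi, "assoc.source"))   # ['source_a', 'source_b']
```

The multi-value case is where the open question above shows up: whether a hit with several sources should yield one edge per source or one edge carrying all of them is left undecided here.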