x-bte refactoring: what is 1 x-bte operation (unit of annotation)?

The issues

There seem to be 3 different requirement sets at play, which we want to tell apart and keep in mind:

"writer-friendly x-bte annotation":
- easy to write/teach/maintain, can write manually (without using code or UI)
- shouldn't be completely like code
- has clear expectations for format / allowed values / what everything is used for
- flexible, expressive
- not dependent on specific TRAPI / biolink-model stuff that's still in-flux
"internal BTE use": what BTE needs to keep track of all the info, construct sub-queries, manage edges, etc. (vocab: BTE MetaEdge, MetaXEdge, bteEdge...)
- x-bte annotation may be too "collapsed" from this POV, and BTE will need to expand 1 operation -> multiple internal representations
"SmartAPI Registry MetaKG use": https://smart-api.info/portal/translator/metakg. What's needed for this tool / UI
- x-bte annotation may be too verbose/specific from this POV, and this tool will need to collapse multiple operations -> 1 MetaEdge for its purposes
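
To make the "collapse" direction concrete, here is a minimal sketch. It assumes, purely for illustration, that a MetaEdge is identified by (input semantic-type, predicate, output semantic-type) and that ID namespaces are the detail being collapsed away. The `Operation` and `MetaEdge` classes, the `collapse` function, and all values are hypothetical and are not taken from the BTE or SmartAPI Registry code.

```python
from collections import defaultdict
from typing import NamedTuple


class Operation(NamedTuple):
    """Hypothetical, simplified stand-in for one x-bte operation."""
    api: str
    input_type: str
    input_namespace: str
    predicate: str
    output_type: str
    output_namespace: str


class MetaEdge(NamedTuple):
    """Hypothetical collapsed view: one subject-predicate-object triple."""
    subject: str
    predicate: str
    object: str


def collapse(operations):
    """Group operations that share input type, predicate and output type."""
    grouped = defaultdict(list)
    for op in operations:
        key = MetaEdge(op.input_type, op.predicate, op.output_type)
        grouped[key].append(op)
    return dict(grouped)


# Made-up operations: the first two differ only by input ID namespace.
operations = [
    Operation("api-a", "Gene", "NCBIGene", "related_to", "Disease", "MONDO"),
    Operation("api-a", "Gene", "ENSEMBL", "related_to", "Disease", "MONDO"),
    Operation("api-a", "Gene", "NCBIGene", "related_to", "ChemicalEntity", "CHEBI"),
]

meta_edges = collapse(operations)
for edge, ops in meta_edges.items():
    print(edge, "<-", len(ops), "operation(s)")
```

Whether the API itself, the qualifier-set, or the source should also be part of the grouping key is exactly the kind of question raised below; the sketch only shows the mechanics.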

Which leads to specific questions for group discussion, like:

How does "1 x-bte operation / unit of annotation" relate to similar concepts (MetaEdges?) in BTE and SmartAPI Registry MetaKG?
- and how does x-bte refactoring relate to and potentially change this?
Are BTE and SmartAPI Registry MetaKG using the same code? Does that make sense, or should they use different code to process x-bte annotations?

And some ideas on how to "expand" an x-bte operation/ unit of annotation

Currently, 1 x-bte operation represents...

* 1 API endpoint being used
* 1 unique combo of:
  * input semantic-type
  * input ID namespace
  * sub-query information
  * predicate
  * qualifier-set
  * source field value
  * output semantic-type
  * output ID namespace
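
One way to read the list above is as a composite key: two annotations are the same unit only if every component matches. The sketch below models that with a frozen dataclass. The class name `OperationKey`, its field names, and all example values are hypothetical stand-ins, not the actual x-bte field names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OperationKey:
    """Hypothetical composite key for one unit of x-bte annotation."""
    endpoint: str
    input_type: str
    input_namespace: str
    sub_query: str            # stand-in for the sub-query information
    predicate: str
    qualifier_set: frozenset  # of (qualifier type, value) pairs
    source_value: str
    output_type: str
    output_namespace: str


# Made-up example values, for illustration only.
base = OperationKey(
    endpoint="/query",
    input_type="Gene",
    input_namespace="NCBIGene",
    sub_query="gene:{inputs}",
    predicate="affects",
    qualifier_set=frozenset({("object_aspect_qualifier", "activity")}),
    source_value="source-a",
    output_type="ChemicalEntity",
    output_namespace="CHEBI",
)

# Changing any single component yields a different unit...
other_namespace = replace(base, input_namespace="ENSEMBL")
# ...while an identical copy is the same unit.
same_again = replace(base)

units = {base, other_namespace, same_again}
print(len(units))
```

Under this reading, every extra value in any one component means another full operation to write by hand.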

Jackson @tokebe and I have discussed how to make it easier to write x-bte annotation - and one of our ideas is to have 1 x-bte operation (one unit of annotation?) expand to include more info:

first-step proposal is #748
- since there can be "combinatorial explosions" of current operations where the main difference comes from the input/output ID namespaces
Other sources of "combinatorial explosions" are:
- unique qualifier-sets
- unique source field values
Note that none of these is as simple as "list out the possible values": the sub-query info, response-mapping info, and post-processing info can differ for each unique value/set...
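
As a generic illustration of the "expand" direction (this is not the format proposed in #748), the sketch below assumes a hypothetical compact operation that lists several input/output ID namespaces plus a per-namespace sub-query override, and expands it into fully enumerated operations. All keys, namespaces, and query strings are invented for the example.

```python
from itertools import product

# Hypothetical compact operation: one unit of annotation that lists
# several ID namespaces instead of repeating the whole operation.
compact_operation = {
    "input_type": "Gene",
    "input_namespaces": ["NCBIGene", "ENSEMBL", "HGNC"],
    "predicate": "related_to",
    "output_type": "Disease",
    "output_namespaces": ["MONDO", "DOID"],
    # The sub-query can differ per value, so a plain list is not enough.
    "sub_query_by_input_namespace": {
        "NCBIGene": "entrezgene:{inputs}",
        "ENSEMBL": "ensembl.gene:{inputs}",
        "HGNC": "HGNC:{inputs}",
    },
}


def expand(compact):
    """Return one fully enumerated operation per namespace combination."""
    expanded = []
    for in_ns, out_ns in product(
        compact["input_namespaces"], compact["output_namespaces"]
    ):
        expanded.append(
            {
                "input_type": compact["input_type"],
                "input_namespace": in_ns,
                "predicate": compact["predicate"],
                "output_type": compact["output_type"],
                "output_namespace": out_ns,
                "sub_query": compact["sub_query_by_input_namespace"][in_ns],
            }
        )
    return expanded


expanded_operations = expand(compact_operation)
print(len(expanded_operations))  # 3 input namespaces x 2 output namespaces = 6
```

The per-namespace mapping is the part that reflects the caveat above: each value may need its own sub-query, response-mapping, or post-processing info.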

My qualifier-set thinking

There are theoretically many operations that would mainly differ by qualifier-set (and how that affects sub-query info like post_filter/filter, jmespath, JQ). The guidance for [anatomical](https://github.com/biolink/biolink-model/blob/db44be0c49939229c28cbb71a715127941e0ce0b/biolink-model.yaml#L1515) / [species](https://github.com/biolink/biolink-model/blob/db44be0c49939229c28cbb71a715127941e0ce0b/biolink-model.yaml#L1532) / and [population](https://github.com/biolink/biolink-model/blob/db44be0c49939229c28cbb71a715127941e0ce0b/biolink-model.yaml#L1158) context qualifiers is currently unclear to me (are they edge-attributes or part of the qualifier-set?). If they turn out to be part of the qualifier-set and we want to support them, this has combinatorial explosion problems because the context qualifiers in our KPs have a lot of possible values.

* anatomical context:
  * multiomics apis (drug response): Guangrong has previously told me that some operations are affected, and include 10s-20s of possible tissue/anatomical-context values
  * also in pending apis: ebi gene2pheno
* species context: affects lots of apis
  * core biothings: MyChem chembl.drug_mechanism and drugcentral.bioactivity info, MyGene panther, a little MyDisease disgenet
  * pending biothings: bindingdb, mgi gene2pheno
  * external: ctd, biolink/monarch
* population context:
  * multiomics apis based on clinical data: ehr risk, wellness (clinical trials too?)
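
To show how quickly this multiplies, here is a small counting sketch. The dimension names and every value in it are placeholders chosen for illustration (the tissue list just assumes 15 values); they are not real counts from any KP.

```python
from math import prod


def operation_count(values_per_dimension):
    """Fully enumerated operations = product of the value counts."""
    return prod(len(values) for values in values_per_dimension.values())


# Placeholder values only; not real data from any API.
dimensions = {
    "input_namespace": ["NCBIGene", "ENSEMBL"],
    "output_namespace": ["MONDO"],
    "anatomical_context": [f"tissue-{i}" for i in range(15)],
    "species_context": ["species-a", "species-b"],
}

total = operation_count(dimensions)
print(total)  # 2 * 1 * 15 * 2 = 60
```

Adding one more context qualifier with N values multiplies the total by N, which is why writing each combination as its own operation doesn't scale.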

My source field thinking

There are theoretically some operations that would mainly differ by source (and how that affects sub-query info like post_filter/filter, jmespath, JQ...). It would be nice if we could set the source info to field values that are post-processed by BTE... I'm not sure of the scope of this issue though:

* core biothings apis: mygene, mydisease disgenet
* external apis: biolink/monarch

This may also be complicated because some api hits will have multiple source values / fields?
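
A minimal sketch of what "source from a post-processed field value" could look like, including the multiple-values complication. The hit shape, the `source` field name, and the `edges_from_hit` helper are all hypothetical; no real API response is being described.

```python
def edges_from_hit(hit, source_field="source"):
    """Emit one edge per source value found in a (hypothetical) API hit."""
    value = hit.get(source_field)
    if value is None:
        sources = []
    elif isinstance(value, str):
        sources = [value]          # single source value
    else:
        sources = list(value)      # multiple source values in one hit
    return [
        {"subject": hit["subject"], "object": hit["object"], "source": source}
        for source in sources
    ]


# Made-up hits: one with a single source, one with two.
single = {"subject": "gene-1", "object": "disease-1", "source": "db-1"}
multiple = {"subject": "gene-1", "object": "disease-2", "source": ["db-1", "db-2"]}

print(edges_from_hit(single))
print(edges_from_hit(multiple))
```

Emitting one edge per source value is only one possible choice; keeping a single edge with a list of sources would be another, and hits without any source value would need their own rule.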

(ref for this issue: previous discussion notes in https://github.com/biothings/biothings_explorer/issues/656)

biothings / biothings_explorer

x-bte refactoring: what is 1 x-bte operation (unit of annotation)? #752
