biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0
10 stars 11 forks source link

Pathfinder: Rework templating system #832

Closed tokebe closed 7 hours ago

tokebe commented 4 months ago

Parent issue: #794

In order to better cover the space of possible pathfinder queries (in terms of pinned-node pairings), we need to revisit how Pathfinder acquires templates. Instead of manually-built TRAPI templates, we should instead generate templates on-the-fly based on the node categories presented in the pathfinder query.

Essentially, within the existing framework of the current prototype, we replace the "select templates" step as follows:

Assume two lookup tables:

These tables would be defined as yaml files (categoryTable.yaml and predicateTable.yaml) in the query_graph_handler data folder, with the following structure:

category:
  category:
    - category/predicate
    - category/predicate
   category:
     - category/predicate
     - category/predicate
category:
  category:
    - category/predicate
# and so on

These tables would then be used to generate templates as specified in these slides

At which point behavior returns to the current implementation of the prototype, with one exception:

When generating final results from the existing template results (see parent issue, current prototype), skip constructing a result for any intermediate node which is not equal to or a descendant of the unpinned node category (as a computational shortcut, if the unpinned node category is biolink:NamedThing, then no need to check, just proceed with making the result).

tokebe commented 4 months ago

Work on this will first require some example tables for testing, assigning @colleenXu to build these:

colleenXu commented 4 months ago

Example tables are now in pathfinder-newsystem branch's data folder!

Note: I made these tables thinking of how to construct the existing Pathfinder MVP templates using templates 1 + 2, and not really 3.

Some things to account for @rjawesome @tokebe:

rjawesome commented 4 months ago

A few questions about the slides 1. image Does this mean the edge should have no predicates (ie. "predicates": []) which would automatically make the query fail, or does this mean the edge should not have the predicates key (ie. basically have a predicate of "biolink:related_to")?

2 . If two adjacent nodes (ie. n0 & A) have multiple categories, then should all predicates from all category pairs be merged, or should a new query graph be generated for each category pair (with the predicates in a particular query graph determined by its category pair)?

3 (similar to 2). If n0 & A have multiple categories, then should all possible categories for node B be merged, or should a new query graph be generated for each category pair of n0 & A (with the categories for B in a particular query graph determined by its category pair)?

tokebe commented 4 months ago
  1. The latter, just remove the predicate key (ignore my earlier edit, I misread your question).
  2. Initially, we were thinking it would just be merged, so one query graph where an edge might have multiple predicates. I don't think this should cause problems with result transformation down the line -- if it does, we should review and possibly end up making separate query graphs as you say.
  3. Merged. If it causes problems, as with 2, we'll review before deciding what to do instead.
colleenXu commented 4 months ago

@rjawesome

After today's discussion with Jackson, we decided to drop "potential qualifier support" for now (it would maybe involve separate QGraph for a predicate+qualifier-set combo).

So I pushed a commit to adjust the predicateTable.yaml format, making it simpler. Can you pull and adjust your code to account for this format change?

rjawesome commented 4 months ago

Done

colleenXu commented 2 months ago

Updates

(1) Lookup tables were adjusted right before Sept Translator Relay https://github.com/biothings/bte_trapi_query_graph_handler/pull/215.

(2) Hotfixes (direct commits) made during Sept Translator Relay to fix TRAPI validation errors for Pathfinder responses

colleenXu commented 2 months ago

Current discussion topics

  1. Rerun examples/reanalyze now that MyDisease disgenet data has been fixed?
    • During the testing/lookup-table adjustments, I discovered that MyDisease was missing disgenet data. This was fixed yesterday (Slack link).
    • I know that affected the imatinib-asthma example (although the use case may actually be using the allergic asthma ID only...)
  2. Currently testing/deploying https://github.com/biothings/biothings_explorer/issues/862
  3. May need some additions to lookup-tables for Cell starting-category support https://github.com/biothings/biothings_explorer/issues/864
  4. Pathfinder scoring isn't meaningful #863
tokebe commented 7 hours ago

Related changes deployed to Prod as of 11/13