biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0
10 stars 11 forks source link

Pathfinder: templates appear to not be filled out properly #893

Open colleenXu opened 1 month ago

colleenXu commented 1 month ago

What's happening

(1) The filled-out templates written in the TRAPI/console logs look strange. They're missing category/predicate assignments. Is that okay? (these logs come from https://github.com/biothings/biothings_explorer/issues/862)

From running example 5: affects all 3 filled-out templates

``` { "timestamp": "2024-10-24T07:38:05.950Z", "level": "INFO", "message": "Got 3 pathfinder query templates.", "code": null }, { "timestamp": "2024-10-24T07:38:05.950Z", "level": "INFO", "message": "Pathfinder Template A: (NCBIGene:3458) -()-> (Gene) -(regulates,regulated_by,affects,affected_by,interacts_with,occurs_together_in_literature_with,produces)-> (CHEBI:16828)", "code": null }, { "timestamp": "2024-10-24T07:38:05.950Z", "level": "INFO", "message": "Pathfinder Template B: (NCBIGene:3458) -()-> (Gene) -()-> () -()-> (CHEBI:16828)", "code": null }, { "timestamp": "2024-10-24T07:38:05.950Z", "level": "INFO", "message": "Pathfinder Template C: (NCBIGene:3458) -()-> () -()-> (Gene) -()-> (CHEBI:16828)", "code": null }, ```

From running example 3: affects only 3rd filled-out template ("C")

``` { "timestamp": "2024-10-24T07:51:24.909Z", "level": "INFO", "message": "Got 3 pathfinder query templates.", "code": null }, { "timestamp": "2024-10-24T07:51:24.909Z", "level": "INFO", "message": "Pathfinder Template A: (MONDO:0005011) -(condition_associated_with_gene,has_biomarker,affected_by,contribution_from)-> (SequenceVariant,Gene) -(gene_associated_with_condition,biomarker_for,affects,contributes_to)-> (MONDO:0005180)", "code": null }, { "timestamp": "2024-10-24T07:51:24.909Z", "level": "INFO", "message": "Pathfinder Template B: (MONDO:0005011) -(condition_associated_with_gene,has_biomarker,affected_by,contribution_from)-> (SequenceVariant,Gene) -(affects,produced_by,located_in,part_of,interacts_with)-> (Cell) -(location_of,affected_by,interacts_with)-> (MONDO:0005180)", "code": null }, { "timestamp": "2024-10-24T07:51:24.909Z", "level": "INFO", "message": "Pathfinder Template C: (MONDO:0005011) -()-> () -()-> (SequenceVariant,Gene) -()-> (MONDO:0005180)", "code": null }, ```

(2) Some of those filled-out templates terminate early because no matching MetaKG edges were found.

Logs

Example 5: * ` 2024-10-24T07:38:11.514Z WARNING: [Template-2]: Query Edges [(reversed b_obj), (reversed un_b)] have no MetaKG edges. Your query terminates.` * `2024-10-24T07:38:10.841Z WARNING: [Template-3]: Query Edges [(sub_c), (c_un)] have no MetaKG edges. Your query terminates.` Example 3: * `2024-10-24T07:51:30.533Z WARNING: [Template-3]: Query Edges [(sub_c), (c_un)] have no MetaKG edges. Your query terminates.`

(3) The templates that terminate early have odd console logs, where the input_type or output_type of a QEdge to execute (qXEdge? APIEdge?) is an empty array.

From Example 5, 2nd template

``` bte:biothings-explorer-trapi:edge-manager (3) Edge manager is managing 3 qEdges. +6ms bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 3 +0ms bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'b_obj' WITH entity count...(1) +0ms bte:biothings-explorer-trapi:edge-manager Checking entity max : (1)--(0) +0ms bte:biothings-explorer-trapi:edge-manager (5) Checking direction of edge with one set of entities... +0ms bte:biothings-explorer-trapi:edge-manager 'sub_un' : (1) --> (0) +0ms bte:biothings-explorer-trapi:edge-manager 'un_b' : (0) --> (0) +0ms bte:biothings-explorer-trapi:edge-manager 'b_obj' : (0) <-- (1) +0ms bte:biothings-explorer-trapi:qedge2btedge Input node is creativeQueryObject +139ms bte:biothings-explorer-trapi:qedge2btedge Output node is nb +0ms bte:biothings-explorer-trapi:qedge2btedge KG Filters: { bte:biothings-explorer-trapi:qedge2btedge "input_type": [ bte:biothings-explorer-trapi:qedge2btedge "SmallMolecule", bte:biothings-explorer-trapi:qedge2btedge "MolecularEntity", bte:biothings-explorer-trapi:qedge2btedge "NucleicAcidEntity", bte:biothings-explorer-trapi:qedge2btedge "ChemicalMixture", bte:biothings-explorer-trapi:qedge2btedge "MolecularMixture", bte:biothings-explorer-trapi:qedge2btedge "Drug", bte:biothings-explorer-trapi:qedge2btedge "ComplexMolecularMixture", bte:biothings-explorer-trapi:qedge2btedge "ProcessedMaterial", bte:biothings-explorer-trapi:qedge2btedge "Food", bte:biothings-explorer-trapi:qedge2btedge "EnvironmentalFoodContaminant", bte:biothings-explorer-trapi:qedge2btedge "FoodAdditive", bte:biothings-explorer-trapi:qedge2btedge "ChemicalEntity" bte:biothings-explorer-trapi:qedge2btedge ], bte:biothings-explorer-trapi:qedge2btedge "output_type": [] bte:biothings-explorer-trapi:qedge2btedge } +0ms bte:biothings-explorer-trapi:qedge2btedge No smartapi edge found for b_obj +136ms bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 2 +136ms bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'un_b' WITH entity count...(1) +0ms bte:biothings-explorer-trapi:edge-manager Checking entity max : (0)--(0) +0ms bte:biothings-explorer-trapi:edge-manager (5) Checking direction of edge with one set of entities... +0ms bte:biothings-explorer-trapi:edge-manager 'sub_un' : (1) --> (0) +0ms bte:biothings-explorer-trapi:edge-manager 'un_b' : (0) <-- (1) +0ms bte:biothings-explorer-trapi:edge-manager 'b_obj' : (1) <-- (1) +0ms bte:biothings-explorer-trapi:qedge2btedge Input node is nb +0ms bte:biothings-explorer-trapi:qedge2btedge Output node is un +0ms bte:biothings-explorer-trapi:qedge2btedge KG Filters: { bte:biothings-explorer-trapi:qedge2btedge "input_type": [], bte:biothings-explorer-trapi:qedge2btedge "output_type": [ bte:biothings-explorer-trapi:qedge2btedge "Gene", bte:biothings-explorer-trapi:qedge2btedge "ProteinIsoform", bte:biothings-explorer-trapi:qedge2btedge "Protein" bte:biothings-explorer-trapi:qedge2btedge ] bte:biothings-explorer-trapi:qedge2btedge } +0ms bte:biothings-explorer-trapi:qedge2btedge No smartapi edge found for un_b +130ms bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 1 +130ms bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'sub_un' WITH entity count...(1) +0ms bte:biothings-explorer-trapi:edge-manager Checking entity max : (0)--(1) +0ms bte:biothings-explorer-trapi:QEdge (8) Choosing lower entity count in edge... +266ms bte:biothings-explorer-trapi:QNode (8) Node "un" holding undefined aside. +266ms bte:biothings-explorer-trapi:QEdge (8) Sub - Obj were same but chose subject (1) +0ms bte:biothings-explorer-trapi:edge-manager 'sub_un' : (1) --> (1) +0ms bte:biothings-explorer-trapi:edge-manager 'un_b' : (1) <-- (1) +0ms bte:biothings-explorer-trapi:edge-manager 'b_obj' : (1) <-- (1) +0ms bte:biothings-explorer-trapi:qedge2btedge Input node is creativeQuerySubject +0ms bte:biothings-explorer-trapi:qedge2btedge Output node is un +0ms bte:biothings-explorer-trapi:qedge2btedge KG Filters: { bte:biothings-explorer-trapi:qedge2btedge "input_type": [ bte:biothings-explorer-trapi:qedge2btedge "Gene", bte:biothings-explorer-trapi:qedge2btedge "ProteinIsoform", bte:biothings-explorer-trapi:qedge2btedge "Protein" bte:biothings-explorer-trapi:qedge2btedge ], bte:biothings-explorer-trapi:qedge2btedge "output_type": [ bte:biothings-explorer-trapi:qedge2btedge "Gene", bte:biothings-explorer-trapi:qedge2btedge "ProteinIsoform", bte:biothings-explorer-trapi:qedge2btedge "Protein" bte:biothings-explorer-trapi:qedge2btedge ] bte:biothings-explorer-trapi:qedge2btedge } +0ms bte:biothings-explorer-trapi:main Query Edges [(reversed b_obj), (reversed un_b)] have no MetaKG edges. Your query terminates. +399ms ```

From Example 5, 3rd template

``` bte:biothings-explorer-trapi:edge-manager (3) Edge manager is managing 3 qEdges. +0ms bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 3 +0ms bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'un_obj' WITH entity count...(1) +0ms bte:biothings-explorer-trapi:edge-manager Checking entity max : (1)--(0) +0ms bte:biothings-explorer-trapi:edge-manager (5) Checking direction of edge with one set of entities... +0ms bte:biothings-explorer-trapi:edge-manager 'sub_c' : (1) --> (0) +1ms bte:biothings-explorer-trapi:edge-manager 'c_un' : (0) --> (0) +0ms bte:biothings-explorer-trapi:edge-manager 'un_obj' : (0) <-- (1) +0ms bte:biothings-explorer-trapi:qedge2btedge Input node is creativeQueryObject +0ms bte:biothings-explorer-trapi:qedge2btedge Output node is un +0ms bte:biothings-explorer-trapi:qedge2btedge KG Filters: { bte:biothings-explorer-trapi:qedge2btedge "input_type": [ bte:biothings-explorer-trapi:qedge2btedge "SmallMolecule", bte:biothings-explorer-trapi:qedge2btedge "MolecularEntity", bte:biothings-explorer-trapi:qedge2btedge "NucleicAcidEntity", bte:biothings-explorer-trapi:qedge2btedge "ChemicalMixture", bte:biothings-explorer-trapi:qedge2btedge "MolecularMixture", bte:biothings-explorer-trapi:qedge2btedge "Drug", bte:biothings-explorer-trapi:qedge2btedge "ComplexMolecularMixture", bte:biothings-explorer-trapi:qedge2btedge "ProcessedMaterial", bte:biothings-explorer-trapi:qedge2btedge "Food", bte:biothings-explorer-trapi:qedge2btedge "EnvironmentalFoodContaminant", bte:biothings-explorer-trapi:qedge2btedge "FoodAdditive", bte:biothings-explorer-trapi:qedge2btedge "ChemicalEntity" bte:biothings-explorer-trapi:qedge2btedge ], bte:biothings-explorer-trapi:qedge2btedge "output_type": [ bte:biothings-explorer-trapi:qedge2btedge "Gene", bte:biothings-explorer-trapi:qedge2btedge "ProteinIsoform", bte:biothings-explorer-trapi:qedge2btedge "Protein" bte:biothings-explorer-trapi:qedge2btedge ] bte:biothings-explorer-trapi:qedge2btedge } +0ms bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 2 +159ms bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'sub_c' WITH entity count...(1) +0ms bte:biothings-explorer-trapi:edge-manager Checking entity max : (0)--(1) +0ms bte:biothings-explorer-trapi:edge-manager 'sub_c' : (1) --> (0) +0ms bte:biothings-explorer-trapi:edge-manager 'c_un' : (0) --> (322) +0ms bte:biothings-explorer-trapi:edge-manager 'un_obj' : (322) <-- (1) +0ms bte:biothings-explorer-trapi:qedge2btedge Input node is creativeQuerySubject +159ms bte:biothings-explorer-trapi:qedge2btedge Output node is nc +0ms bte:biothings-explorer-trapi:qedge2btedge KG Filters: { bte:biothings-explorer-trapi:qedge2btedge "input_type": [ bte:biothings-explorer-trapi:qedge2btedge "Gene", bte:biothings-explorer-trapi:qedge2btedge "ProteinIsoform", bte:biothings-explorer-trapi:qedge2btedge "Protein" bte:biothings-explorer-trapi:qedge2btedge ], bte:biothings-explorer-trapi:qedge2btedge "output_type": [] bte:biothings-explorer-trapi:qedge2btedge } +1ms bte:biothings-explorer-trapi:qedge2btedge No smartapi edge found for sub_c +142ms bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 1 +143ms bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'c_un' WITH entity count...(1) +0ms bte:biothings-explorer-trapi:edge-manager Checking entity max : (0)--(0) +0ms bte:biothings-explorer-trapi:QEdge (8) Choosing lower entity count in edge... +304ms bte:biothings-explorer-trapi:QNode (8) Node "un" holding undefined aside. +304ms bte:biothings-explorer-trapi:QEdge (8) Chose lower entity value in subject (1) +0ms bte:biothings-explorer-trapi:edge-manager 'sub_c' : (1) --> (1) +0ms bte:biothings-explorer-trapi:edge-manager 'c_un' : (1) --> (322) +0ms bte:biothings-explorer-trapi:edge-manager 'un_obj' : (322) <-- (1) +0ms bte:biothings-explorer-trapi:qedge2btedge Input node is nc +0ms bte:biothings-explorer-trapi:qedge2btedge Output node is un +0ms bte:biothings-explorer-trapi:qedge2btedge KG Filters: { bte:biothings-explorer-trapi:qedge2btedge "input_type": [], bte:biothings-explorer-trapi:qedge2btedge "output_type": [ bte:biothings-explorer-trapi:qedge2btedge "Gene", bte:biothings-explorer-trapi:qedge2btedge "ProteinIsoform", bte:biothings-explorer-trapi:qedge2btedge "Protein" bte:biothings-explorer-trapi:qedge2btedge ] bte:biothings-explorer-trapi:qedge2btedge } +0ms bte:biothings-explorer-trapi:qedge2btedge No smartapi edge found for c_un +130ms bte:biothings-explorer-trapi:main Query Edges [(sub_c), (c_un)] have no MetaKG edges. Your query terminates. +434ms ```

From Example 3, 3rd template

``` bte:biothings-explorer-trapi:edge-manager (3) Edge manager is managing 3 qEdges. +6ms bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 3 +0ms bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'sub_c' WITH entity count...(11) +0ms bte:biothings-explorer-trapi:edge-manager Checking entity max : (0)--(11) +0ms bte:biothings-explorer-trapi:edge-manager 'sub_c' : (11) --> (0) +0ms bte:biothings-explorer-trapi:edge-manager 'c_un' : (0) --> (0) +0ms bte:biothings-explorer-trapi:edge-manager 'un_obj' : (0) <-- (28) +0ms bte:biothings-explorer-trapi:qedge2btedge Input node is creativeQuerySubject +131ms bte:biothings-explorer-trapi:qedge2btedge Output node is nc +0ms bte:biothings-explorer-trapi:qedge2btedge KG Filters: { bte:biothings-explorer-trapi:qedge2btedge "input_type": [ bte:biothings-explorer-trapi:qedge2btedge "Disease", bte:biothings-explorer-trapi:qedge2btedge "PhenotypicFeature", bte:biothings-explorer-trapi:qedge2btedge "BehavioralFeature", bte:biothings-explorer-trapi:qedge2btedge "ClinicalFinding", bte:biothings-explorer-trapi:qedge2btedge "DiseaseOrPhenotypicFeature" bte:biothings-explorer-trapi:qedge2btedge ], bte:biothings-explorer-trapi:qedge2btedge "output_type": [] bte:biothings-explorer-trapi:qedge2btedge } +0ms bte:biothings-explorer-trapi:qedge2btedge No smartapi edge found for sub_c +131ms bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 2 +131ms bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'c_un' WITH entity count...(1) +0ms bte:biothings-explorer-trapi:edge-manager Checking entity max : (0)--(0) +0ms bte:biothings-explorer-trapi:edge-manager 'sub_c' : (1) --> (1) +0ms bte:biothings-explorer-trapi:edge-manager 'c_un' : (1) --> (0) +0ms bte:biothings-explorer-trapi:edge-manager 'un_obj' : (0) <-- (28) +0ms bte:biothings-explorer-trapi:qedge2btedge Input node is nc +0ms bte:biothings-explorer-trapi:qedge2btedge Output node is un +0ms bte:biothings-explorer-trapi:qedge2btedge KG Filters: { bte:biothings-explorer-trapi:qedge2btedge "input_type": [], bte:biothings-explorer-trapi:qedge2btedge "output_type": [ bte:biothings-explorer-trapi:qedge2btedge "Snv", bte:biothings-explorer-trapi:qedge2btedge "SequenceVariant", bte:biothings-explorer-trapi:qedge2btedge "Gene", bte:biothings-explorer-trapi:qedge2btedge "ProteinIsoform", bte:biothings-explorer-trapi:qedge2btedge "Protein" bte:biothings-explorer-trapi:qedge2btedge ] bte:biothings-explorer-trapi:qedge2btedge } +0ms bte:biothings-explorer-trapi:qedge2btedge No smartapi edge found for c_un +125ms bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 1 +125ms bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'un_obj' WITH entity count...(1) +0ms bte:biothings-explorer-trapi:edge-manager Checking entity max : (28)--(0) +1ms bte:biothings-explorer-trapi:QEdge (8) Choosing lower entity count in edge... +258ms bte:biothings-explorer-trapi:QNode (8) Node "creativeQueryObject" holding ["MONDO:0005180","MONDO:0008199","MONDO:0009830","MONDO:0010796","MONDO:0013167","MONDO:0017279","MONDO:0957576","MONDO:0008200","MONDO:0011562","MONDO:0011764","MONDO:0013060","MONDO:0013625","MONDO:0014604","MONDO:0014742","MONDO:0008193","MONDO:0000828","MONDO:0010360","MONDO:0010820","MONDO:0011220","MONDO:0011613","MONDO:0011658","MONDO:0011737","MONDO:0014233","MONDO:0014796","MONDO:0011706","MONDO:0014231","MONDO:0800369","MONDO:0017809"] aside. +258ms bte:biothings-explorer-trapi:QEdge (8) Chose lower entity value in subject (1) +0ms bte:biothings-explorer-trapi:edge-manager 'sub_c' : (1) --> (1) +0ms bte:biothings-explorer-trapi:edge-manager 'c_un' : (1) --> (1) +0ms bte:biothings-explorer-trapi:edge-manager 'un_obj' : (1) --> (28) +0ms bte:biothings-explorer-trapi:qedge2btedge Input node is un +1ms bte:biothings-explorer-trapi:qedge2btedge Output node is creativeQueryObject +0ms bte:biothings-explorer-trapi:qedge2btedge KG Filters: { bte:biothings-explorer-trapi:qedge2btedge "input_type": [ bte:biothings-explorer-trapi:qedge2btedge "Snv", bte:biothings-explorer-trapi:qedge2btedge "SequenceVariant", bte:biothings-explorer-trapi:qedge2btedge "Gene", bte:biothings-explorer-trapi:qedge2btedge "ProteinIsoform", bte:biothings-explorer-trapi:qedge2btedge "Protein" bte:biothings-explorer-trapi:qedge2btedge ], bte:biothings-explorer-trapi:qedge2btedge "output_type": [ bte:biothings-explorer-trapi:qedge2btedge "Disease", bte:biothings-explorer-trapi:qedge2btedge "PhenotypicFeature", bte:biothings-explorer-trapi:qedge2btedge "BehavioralFeature", bte:biothings-explorer-trapi:qedge2btedge "ClinicalFinding", bte:biothings-explorer-trapi:qedge2btedge "DiseaseOrPhenotypicFeature" bte:biothings-explorer-trapi:qedge2btedge ] bte:biothings-explorer-trapi:qedge2btedge } +0ms bte:biothings-explorer-trapi:main Query Edges [(sub_c), (c_un)] have no MetaKG edges. Your query terminates. +383ms ```

Data

Commentary

This is likely happening in CI + Test, since I found examples of it happening Friday 10/18 and it's happening in my local (all main branches) right now.

I'm not sure if this is "expected" behavior or not. I lean towards unexpected, because I think the category/predicate mappings do exist but aren't being used in some templates. Some I noticed:

rjawesome commented 1 month ago

@colleenXu In Example 5 (Template A):

In Example 5 (Template B)

In Example 5 (Template C)

In Example 3 (Template C):

Maybe some sort of gene/protein conflation needs to be implemented in Pathfinder? Or should there be different behavior when no categories are found in the category table?

tokebe commented 4 weeks ago

Odd, I was under the impression that Gene/Protein conflation would happen prior to pathfinder handling, so Pathfinder template generation would have both to work with from the get go...

If that's not the case, we'll want an extra Gene/Protein conflation pass prior to template building in Pathfinder (Or, alternatively, Gene/Protein expansion when generating the in-memory table from the hard table)

If no categories are found, we'll want NamedThing, though again I was expecting that the internal query execution of a template would handle injecting that...

colleenXu commented 4 weeks ago

From my POV, there's several things to investigate/untangle:


Helpful to look back at slide deck and info in https://github.com/biothings/biothings_explorer/issues/832

colleenXu commented 4 weeks ago

Andrew told me that we don't have to rush to fix this. It's fine to not get this done by the end of this phase.