RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

ARAX finds answers for inferred treats query, but returns 0 results #2397

Closed: amykglen closed this issue 1 month ago

amykglen commented 1 month ago

https://arax.ncats.io/beta/?r=309801

this query I just ran on /beta (an 'Example 2' query for disease MONDO:0016262) finds answers from both xDTD and KG2, but returns no results.

the logs show plenty of answers in the KG after expand, but by the time resultify is doing its work, they're gone:

   2024-10-08T00:40:18.131552 INFO:    After Expand, the KG has 242 nodes and 968 edges (creative_DTD_qedge_0: 566, creative_DTD_qedge_1: 350, creative_DTD_qedge_2: 16, creative_DTD_qnode_0: 199, creative_DTD_qnode_1: 17, on: 1, sn: 38, t_edge: 36)  [12.91 s]
   2024-10-08T00:40:18.133350 INFO:    Processing action 'overlay' with parameters {'action': 'compute_ngd', 'virtual_relation_label': 'N1', 'subject_qnode_key': 'on', 'object_qnode_key': 'sn'}  [12.912 s]
   2024-10-08T00:40:18.133432 DEBUG:    Applying Overlay to Message with parameters {'action': 'compute_ngd', 'virtual_relation_label': 'N1', 'subject_qnode_key': 'on', 'object_qnode_key': 'sn'}  [12.912 s]
   2024-10-08T00:40:18.143580 DEBUG:    Computing NGD  [12.922 s]
   2024-10-08T00:40:18.143622 INFO:    Computing the normalized Google distance: weighting edges based on subject/object node co-occurrence frequency in PubMed abstracts  [12.922 s]
   2024-10-08T00:40:18.143654 DEBUG:    Narrowing down on--sn node pairs to overlay  [12.922 s]
   2024-10-08T00:40:18.199889 DEBUG:    Identified 36 node pairs to overlay (with help of resultify)  [12.978 s]
   2024-10-08T00:40:18.200880 DEBUG:    Canonicalizing curies of relevant nodes using NodeSynonymizer  [12.979 s]
   2024-10-08T00:40:18.237983 DEBUG:    Extracting PMID lists from sqlite database for relevant nodes  [13.016 s]
   2024-10-08T00:40:18.287771 DEBUG:    Looping through 36 node pairs and calculating NGD values  [13.066 s]
   2024-10-08T00:40:18.395513 INFO:    NGD values successfully added to edges  [13.174 s]
   2024-10-08T00:40:18.395856 DEBUG:    Decorating edges with EPC info from KG2c  [13.174 s]
   2024-10-08T00:40:18.396486 DEBUG:    Identified 36 edges to decorate  [13.175 s]
   2024-10-08T00:40:18.396995 DEBUG:    Looking up EPC edge info in KG2c sqlite to decorate NGD edges  [13.175 s]
   2024-10-08T00:40:18.398717 DEBUG:    Got 0 rows back from KG2c sqlite  [13.177 s]
   2024-10-08T00:40:18.398759 DEBUG:    Adding attributes to NGD edges in the KG  [13.177 s]
   2024-10-08T00:40:18.403050 DEBUG:    Query graph is {'edges': {'N1': {'attribute_constraints': [], 'exclude': None, 'knowledge_type': None, 'object': 'sn', 'option_group_id': None, 'predicates': ['biolink:occurs_together_in_literature_with'], 'qualifier_constraints': [], 'subject': 'on'}, 'creative_DTD_qedge_0': {'attribute_constraints': [], 'exclude': False, 'knowledge_type': None, 'object': 'creative_DTD_qnode_0', 'option_group_id': 'creative_DTD_option_group_0', 'predicates': None, 'qualifier_constraints': [], 'subject': 'sn'}, 'creative_DTD_qedge_1': {'attribute_constraints': [], 'exclude': False, 'knowledge_type': None, 'object': 'creative_DTD_qnode_1', 'option_group_id': 'creative_DTD_option_group_0', 'predicates': None, 'qualifier_constraints': [], 'subject': 'creative_DTD_qnode_0'}, 'creative_DTD_qedge_2': {'attribute_constraints': [], 'exclude': False, 'knowledge_type': None, 'object': 'on', 'option_group_id': 'creative_DTD_option_group_0', 'predicates': None, 'qualifier_constraints': [], 'subject': 'creative_DTD_qnode_1'}, 't_edge': {'attribute_constraints': [], 'exclude': None, 'knowledge_type': 'inferred', 'object': 'on', 'option_group_id': None, 'predicates': ['biolink:treats'], 'qualifier_constraints': [], 'subject': 'sn'}}, 'nodes': {'creative_DTD_qnode_0': {'categories': None, 'constraints': [], 'ids': None, 'is_set': True, 'option_group_id': 'creative_DTD_option_group_0', 'set_id': None, 'set_interpretation': 'BATCH'}, 'creative_DTD_qnode_1': {'categories': None, 'constraints': [], 'ids': None, 'is_set': True, 'option_group_id': 'creative_DTD_option_group_0', 'set_id': None, 'set_interpretation': 'BATCH'}, 'on': {'categories': ['biolink:Disease'], 'constraints': [], 'ids': ['MONDO:0016262'], 'is_set': False, 'option_group_id': None, 'set_id': None, 'set_interpretation': 'BATCH'}, 'sn': {'categories': ['biolink:Drug', 'biolink:SmallMolecule'], 'constraints': [], 'ids': None, 'is_set': False, 'option_group_id': None, 'set_id': None, 'set_interpretation': 'BATCH'}}}  [13.182 s]
   2024-10-08T00:40:18.403146 DEBUG:    Number of nodes in KG is 242  [13.182 s]
   2024-10-08T00:40:18.403504 DEBUG:    Number of nodes in KG by type is Counter({'biolink:SmallMolecule': 115, 'biolink:Gene': 101, 'biolink:Drug': 9, 'biolink:PhysiologicalProcess': 4, 'biolink:ChemicalEntity': 4, 'biolink:Protein': 3, 'biolink:Disease': 2, 'biolink:Cell': 2, 'biolink:CellularComponent': 1, 'biolink:MolecularActivity': 1})  [13.182 s]
   2024-10-08T00:40:18.403540 DEBUG:    Number of edges in KG is 1004  [13.182 s]
   2024-10-08T00:40:18.404229 DEBUG:    Number of edges in KG by type is Counter({'biolink:affects': 393, 'biolink:interacts_with': 343, 'biolink:physically_interacts_with': 96, 'biolink:treats': 36, 'biolink:occurs_together_in_literature_with': 36, 'biolink:colocalizes_with': 26, 'biolink:directly_physically_interacts_with': 21, 'biolink:gene_associated_with_condition': 16, 'biolink:located_in': 10, 'biolink:disrupts': 9, 'biolink:causes': 5, 'biolink:has_participant': 5, 'biolink:has_part': 4, 'biolink:subclass_of': 2, 'biolink:occurs_in': 1, 'biolink:produces': 1})  [13.183 s]
   2024-10-08T00:40:18.404621 DEBUG:    Number of edges in KG with attributes is 1004  [13.183 s]
   2024-10-08T00:40:18.408066 DEBUG:    Number of edges in KG by attribute Counter({None: 3882, 'defined_datetime': 1004, 'probability_treats': 36, 'normalized_google_distance': 36, 'virtual_relation_label': 36, 'publications': 5})  [13.187 s]
   2024-10-08T00:40:18.408125 INFO:    Processing action 'filter_kg' with parameters {'action': 'remove_general_concept_nodes', 'perform_action': 'True'}  [13.187 s]
   2024-10-08T00:40:18.414003 DEBUG:    Removing Nodes  [13.193 s]
   2024-10-08T00:40:18.414045 INFO:    Removing nodes from the knowledge graph which are general concepts  [13.193 s]
   2024-10-08T00:40:24.959812 INFO:    Removed 5 nodes from the knowledge graph which are general concepts  [19.738 s]
   2024-10-08T00:40:24.961573 DEBUG:    Removing orphaned nodes  [19.74 s]
   2024-10-08T00:40:24.961615 INFO:    Removing orphaned nodes  [19.74 s]
   2024-10-08T00:40:24.962290 DEBUG:    Identified 0 orphan nodes to remove  [19.741 s]
   2024-10-08T00:40:24.962322 INFO:    Nodes successfully removed  [19.741 s]
   2024-10-08T00:40:24.962351 INFO:    Nodes successfully removed  [19.741 s]
   2024-10-08T00:40:24.962437 DEBUG:    Applying Overlay to Message with parameters {'action': 'remove_general_concept_nodes', 'perform_action': True}  [19.741 s]
   2024-10-08T00:40:24.964448 DEBUG:    Query graph is {'edges': {'N1': {'attribute_constraints': [], 'exclude': None, 'knowledge_type': None, 'object': 'sn', 'option_group_id': None, 'predicates': ['biolink:occurs_together_in_literature_with'], 'qualifier_constraints': [], 'subject': 'on'}, 'creative_DTD_qedge_0': {'attribute_constraints': [], 'exclude': False, 'knowledge_type': None, 'object': 'creative_DTD_qnode_0', 'option_group_id': 'creative_DTD_option_group_0', 'predicates': None, 'qualifier_constraints': [], 'subject': 'sn'}, 'creative_DTD_qedge_1': {'attribute_constraints': [], 'exclude': False, 'knowledge_type': None, 'object': 'creative_DTD_qnode_1', 'option_group_id': 'creative_DTD_option_group_0', 'predicates': None, 'qualifier_constraints': [], 'subject': 'creative_DTD_qnode_0'}, 'creative_DTD_qedge_2': {'attribute_constraints': [], 'exclude': False, 'knowledge_type': None, 'object': 'on', 'option_group_id': 'creative_DTD_option_group_0', 'predicates': None, 'qualifier_constraints': [], 'subject': 'creative_DTD_qnode_1'}, 't_edge': {'attribute_constraints': [], 'exclude': None, 'knowledge_type': 'inferred', 'object': 'on', 'option_group_id': None, 'predicates': ['biolink:treats'], 'qualifier_constraints': [], 'subject': 'sn'}}, 'nodes': {'creative_DTD_qnode_0': {'categories': None, 'constraints': [], 'ids': None, 'is_set': True, 'option_group_id': 'creative_DTD_option_group_0', 'set_id': None, 'set_interpretation': 'BATCH'}, 'creative_DTD_qnode_1': {'categories': None, 'constraints': [], 'ids': None, 'is_set': True, 'option_group_id': 'creative_DTD_option_group_0', 'set_id': None, 'set_interpretation': 'BATCH'}, 'on': {'categories': ['biolink:Disease'], 'constraints': [], 'ids': ['MONDO:0016262'], 'is_set': False, 'option_group_id': None, 'set_id': None, 'set_interpretation': 'BATCH'}, 'sn': {'categories': ['biolink:Drug', 'biolink:SmallMolecule'], 'constraints': [], 'ids': None, 'is_set': False, 'option_group_id': None, 'set_id': None, 'set_interpretation': 'BATCH'}}}  [19.743 s]
   2024-10-08T00:40:24.964510 DEBUG:    Number of nodes in KG is 237  [19.743 s]
   2024-10-08T00:40:24.964739 DEBUG:    Number of nodes in KG by type is Counter({'biolink:SmallMolecule': 113, 'biolink:Gene': 100, 'biolink:Drug': 9, 'biolink:PhysiologicalProcess': 4, 'biolink:ChemicalEntity': 4, 'biolink:Protein': 2, 'biolink:Cell': 2, 'biolink:Disease': 1, 'biolink:CellularComponent': 1, 'biolink:MolecularActivity': 1})  [19.743 s]
   2024-10-08T00:40:24.964773 DEBUG:    Number of edges in KG is 890  [19.743 s]
   2024-10-08T00:40:24.965196 DEBUG:    Number of edges in KG by type is Counter({'biolink:affects': 380, 'biolink:interacts_with': 331, 'biolink:physically_interacts_with': 95, 'biolink:colocalizes_with': 26, 'biolink:directly_physically_interacts_with': 21, 'biolink:located_in': 10, 'biolink:disrupts': 9, 'biolink:causes': 5, 'biolink:has_participant': 5, 'biolink:has_part': 3, 'biolink:gene_associated_with_condition': 2, 'biolink:subclass_of': 1, 'biolink:occurs_in': 1, 'biolink:produces': 1})  [19.744 s]
   2024-10-08T00:40:24.965542 DEBUG:    Number of edges in KG with attributes is 890  [19.744 s]
   2024-10-08T00:40:24.972644 DEBUG:    Number of edges in KG by attribute Counter({None: 3646, 'defined_datetime': 890, 'metatype:Datetime': 890, 'EDAM-DATA:1772': 890, 'biolink:original_predicate': 776, 'biolink:knowledge_level': 776, 'biolink:agent_type': 776, 'biolink:publications': 377, 'bts:sentence': 51})  [19.751 s]
   2024-10-08T00:40:24.972710 INFO:    Processing action 'resultify' with parameters {'': 'true'}  [19.751 s]
   2024-10-08T00:40:24.972750 DEBUG:    Applying Resultifier to Message with parameters {'': 'true'}  [19.751 s]
   2024-10-08T00:40:24.972780 INFO:    Clearing previous results and computing a new set of results  [19.751 s]
   2024-10-08T00:40:24.974005 DEBUG:    Expanded qedges are {'N1', 'creative_DTD_qedge_2', 't_edge', 'creative_DTD_qedge_1', 'creative_DTD_qedge_0'}, expanded qnodes are {'creative_DTD_qnode_0', 'sn', 'creative_DTD_qnode_1', 'on'}; will resultify only this sub-QG  [19.753 s]
   2024-10-08T00:40:24.974063 DEBUG:    Non-kryptonite qedges are {'N1', 'creative_DTD_qedge_2', 't_edge', 'creative_DTD_qedge_1', 'creative_DTD_qedge_0'}, non-kryptonite qnodes are {'creative_DTD_qnode_0', 'sn', 'creative_DTD_qnode_1', 'on'}.  [19.753 s]
   2024-10-08T00:40:24.976898 DEBUG:    Grabbing only required portion of QG  [19.755 s]
   2024-10-08T00:40:24.976958 DEBUG:    Required qnodes are {'sn', 'on'}, required qedges are {'N1', 't_edge'}  [19.755 s]
   2024-10-08T00:40:24.977016 DEBUG:    KG does not fulfill the (required portion of the) QG. Unfulfilled qnodes: {'on'}. Unfulfilled qedges: {'N1', 't_edge'}.
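The two "Number of edges in KG by type" lines bracket the filter_kg (remove_general_concept_nodes) step. Diffing them makes the loss concrete. The sketch below is just Python's collections.Counter applied to counts copied by hand from those log lines; it is not ARAX code. It shows that every biolink:treats edge (t_edge) and every biolink:occurs_together_in_literature_with edge (the N1 NGD edges) disappears in that step, which lines up with resultify reporting N1 and t_edge as unfulfilled.

```python
from collections import Counter

# Edge counts by predicate, copied from the two "Number of edges in KG by type"
# log lines (before and after the filter_kg step).
before = Counter({
    "biolink:affects": 393, "biolink:interacts_with": 343,
    "biolink:physically_interacts_with": 96, "biolink:treats": 36,
    "biolink:occurs_together_in_literature_with": 36, "biolink:colocalizes_with": 26,
    "biolink:directly_physically_interacts_with": 21,
    "biolink:gene_associated_with_condition": 16, "biolink:located_in": 10,
    "biolink:disrupts": 9, "biolink:causes": 5, "biolink:has_participant": 5,
    "biolink:has_part": 4, "biolink:subclass_of": 2, "biolink:occurs_in": 1,
    "biolink:produces": 1,
})
after = Counter({
    "biolink:affects": 380, "biolink:interacts_with": 331,
    "biolink:physically_interacts_with": 95, "biolink:colocalizes_with": 26,
    "biolink:directly_physically_interacts_with": 21, "biolink:located_in": 10,
    "biolink:disrupts": 9, "biolink:causes": 5, "biolink:has_participant": 5,
    "biolink:has_part": 3, "biolink:gene_associated_with_condition": 2,
    "biolink:subclass_of": 1, "biolink:occurs_in": 1, "biolink:produces": 1,
})

# Counter subtraction keeps only predicates whose count went down.
removed = before - after
for predicate, n in removed.most_common():
    # Missing keys in a Counter read as 0, so fully removed predicates show "0 left".
    print(f"{predicate}: -{n} ({after[predicate]} left)")
print("total removed:", sum(removed.values()))  # total removed: 114 (1004 - 890)
```

The same diff on the "by type" node counts shows the single lost biolink:Disease node, which is the MONDO:0016262 node bound to 'on'.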

looks like it will take some debugging to figure out where they're going...

possibly related to #2395 and #2396?
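The last few resultify log lines describe a "required portion of the QG" check. Judging only from the logged sets, the required qnodes/qedges are exactly those whose option_group_id is None. Below is a hypothetical sketch of such a check under that assumption. It is not ARAX's resultify implementation: kryptonite (exclude) handling is omitted, the KG is modeled as plain dicts mapping KG ids to the qnode/qedge keys they fulfill, and the ids drug_A, gene_B and edge_1 are made-up placeholders.

```python
from typing import Dict, Set, Tuple


def required_keys(qg: dict) -> Tuple[Set[str], Set[str]]:
    """Assumed rule: a qnode/qedge is required when it is not in an option group."""
    qnodes = {k for k, v in qg["nodes"].items() if v.get("option_group_id") is None}
    qedges = {k for k, v in qg["edges"].items() if v.get("option_group_id") is None}
    return qnodes, qedges


def unfulfilled(qg: dict,
                kg_node_bindings: Dict[str, Set[str]],
                kg_edge_bindings: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return the required qnode/qedge keys that no KG node/edge fulfills."""
    req_nodes, req_edges = required_keys(qg)
    fulfilled_nodes = set().union(*kg_node_bindings.values())
    fulfilled_edges = set().union(*kg_edge_bindings.values())
    return req_nodes - fulfilled_nodes, req_edges - fulfilled_edges


# Option-group layout copied from the logged query graph (other fields omitted).
og = "creative_DTD_option_group_0"
qg = {
    "nodes": {"on": {"option_group_id": None}, "sn": {"option_group_id": None},
              "creative_DTD_qnode_0": {"option_group_id": og},
              "creative_DTD_qnode_1": {"option_group_id": og}},
    "edges": {"N1": {"option_group_id": None}, "t_edge": {"option_group_id": None},
              "creative_DTD_qedge_0": {"option_group_id": og},
              "creative_DTD_qedge_1": {"option_group_id": og},
              "creative_DTD_qedge_2": {"option_group_id": og}},
}

# Placeholder KG after filtering: the only 'on' node (MONDO:0016262) and every
# edge touching it are gone.
kg_nodes = {"drug_A": {"sn"}, "gene_B": {"creative_DTD_qnode_0"}}
kg_edges = {"edge_1": {"creative_DTD_qedge_0"}}

missing_nodes, missing_edges = unfulfilled(qg, kg_nodes, kg_edges)
print(sorted(missing_nodes), sorted(missing_edges))  # ['on'] ['N1', 't_edge']
```

Under this reading, losing the single node bound to 'on' is enough to empty the result set. Every required qedge touches 'on', so nothing in the option group can compensate.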

kvnthomas98 commented 1 month ago

I'll have a look

kvnthomas98 commented 1 month ago

Hi @amykglen ,

After xDTD and the KPs are called, the node MONDO:0016262 has two sets of attributes with attribute_type_id biolink:synonym. The first one seems right, but the second one seems wrong. I have confirmed that the attributes are correct right after xDTD returns results; they only become incorrect after Expand calls the other KPs. Because one of the wrong synonyms is "immunological adjuvant", which is on our blocklist, this node gets removed from the KG during the remove_general_concept_nodes step, and as a consequence we lose all results.

(Pdb) n1 = message.knowledge_graph.nodes["MONDO:0016262"]
(Pdb) n1.attributes[3]
{'attribute_source': None,
 'attribute_type_id': 'biolink:synonym',
 'attributes': None,
 'description': 'Names of all nodes in this synonym set in RTX-KG2.',
 'original_attribute_name': None,
 'value': ['uterine leiomyosarcoma',
           'Uterine leiomyosarcoma',
           'leiomyosarcoma of the corpus uteri',
           'uterus leiomyosarcoma',
           'Leiomyosarcoma of the corpus uteri',
           'Uterine Corpus Leiomyosarcoma'],
 'value_type_id': 'metatype:String',
 'value_url': None}
(Pdb) n1.attributes[6]
{'attribute_source': None,
 'attribute_type_id': 'biolink:synonym',
 'attributes': None,
 'description': 'Names of all nodes in this synonym set in RTX-KG2.',
 'original_attribute_name': None,
 'value': ['Immunologic Adjuvants',
           'Adjuvants, Immunologic',
           'immunological adjuvant'],
 'value_type_id': 'metatype:String',
 'value_url': None}
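To make the failure mode above concrete, here is a hypothetical sketch of a synonym-based blocklist check. The matching rule (case-insensitive exact match against any value of any biolink:synonym attribute) and the one-entry blocklist are assumptions for illustration; ARAX's actual general-concept filter may match differently. The synonym values are copied from the pdb session above.

```python
from typing import Iterable, List, Set


def node_synonyms(attributes: Iterable[dict]) -> List[str]:
    """Collect every value from every biolink:synonym attribute on a node."""
    names: List[str] = []
    for attr in attributes:
        if attr.get("attribute_type_id") == "biolink:synonym":
            names.extend(attr.get("value") or [])
    return names


def hits_blocklist(attributes: Iterable[dict], blocklist: Set[str]) -> Set[str]:
    """Return the (lowercased) synonyms that match a blocklist entry."""
    lowered = {name.lower() for name in node_synonyms(attributes)}
    return lowered & {term.lower() for term in blocklist}


# Synonym values copied from attributes[3] and attributes[6] above; other fields omitted.
good = {"attribute_type_id": "biolink:synonym",
        "value": ["uterine leiomyosarcoma", "Uterine leiomyosarcoma",
                  "leiomyosarcoma of the corpus uteri", "uterus leiomyosarcoma",
                  "Leiomyosarcoma of the corpus uteri", "Uterine Corpus Leiomyosarcoma"]}
bad = {"attribute_type_id": "biolink:synonym",
       "value": ["Immunologic Adjuvants", "Adjuvants, Immunologic", "immunological adjuvant"]}

blocklist = {"immunological adjuvant"}  # assumed single entry, per the comment above
print(hits_blocklist([good], blocklist))       # set()
print(hits_blocklist([good, bad], blocklist))  # {'immunological adjuvant'}
```

With only the correct synonym set, the node passes. One wrongly attached attribute is enough to flag it, because the check looks at all synonym attributes together.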

amykglen commented 1 month ago

huh! well that's odd. thanks for the debugging. tomorrow I'll look into why this node is being decorated with the wrong synonyms.
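A hypothetical diagnostic that could help track down this kind of decoration problem is to flag every KG node carrying more than one biolink:synonym attribute, the symptom seen on MONDO:0016262 in the pdb session above. The plain-dict node layout and the placeholder node drug_A are assumptions for illustration, not ARAX's or Plover's actual data model. A second synonym attribute is only a lead worth inspecting, not proof of a bug.

```python
from typing import Dict


def nodes_with_multiple_synonym_sets(nodes: Dict[str, dict]) -> Dict[str, int]:
    """Map node id -> number of biolink:synonym attributes, keeping only counts > 1."""
    flagged: Dict[str, int] = {}
    for node_id, node in nodes.items():
        count = sum(1 for attr in node.get("attributes") or []
                    if attr.get("attribute_type_id") == "biolink:synonym")
        if count > 1:
            flagged[node_id] = count
    return flagged


kg_nodes = {
    # Abbreviated from the pdb session: one correct and one wrong synonym set.
    "MONDO:0016262": {"attributes": [
        {"attribute_type_id": "biolink:synonym", "value": ["uterine leiomyosarcoma"]},
        {"attribute_type_id": "biolink:synonym", "value": ["immunological adjuvant"]},
    ]},
    # Made-up placeholder node with a single synonym set.
    "drug_A": {"attributes": [
        {"attribute_type_id": "biolink:synonym", "value": ["example drug"]},
    ]},
}
print(nodes_with_multiple_synonym_sets(kg_nodes))  # {'MONDO:0016262': 2}
```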

amykglen commented 1 month ago

ohh, I realized what's going on - it's a bug in Plover that I'm fixing now. will deploy tonight and confirm when resolved.

amykglen commented 1 month ago

alright, fixed on CI! https://arax.ci.transltr.io/?r=309955

thanks @kvnthomas98!!