RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License
33 stars 21 forks source link

Overlay error with 'predict_drug_treats_disease' in resultify test 'test_issue680' #832

Closed amykglen closed 4 years ago

amykglen commented 4 years ago

Noticed that test_issue680 in test_ARAX_resultify.py is failing in the master branch today, with an error coming from Overlay:

Supplied value n00 is not permitted. In action {'predict_drug_treats_disease'}, allowable values to target_qnode_id are: []

This test was working last week (at least as of Thursday June 11), and the test itself hasn't changed since then. Not sure what's up.

_______________________________________________ TestARAXResultify.test_issue680 _______________________________________________

self = <test_ARAX_resultify.TestARAXResultify testMethod=test_issue680>

    def test_issue680(self):
        query = {"previous_message_processing_plan": {"processing_actions": [
            "create_message",
            "add_qnode(curie=DOID:14330, id=n00)",
            "add_qnode(type=protein, is_set=true, id=n01)",
            "add_qnode(type=chemical_substance, id=n02)",
            "add_qedge(source_id=n00, target_id=n01, id=e00)",
            "add_qedge(source_id=n01, target_id=n02, id=e01, type=physically_interacts_with)",
            "expand(edge_id=[e00,e01], kp=ARAX/KG1)",
            "overlay(action=compute_jaccard, start_node_id=n00, intermediate_node_id=n01, end_node_id=n02, virtual_relation_label=J1)",
            "filter_kg(action=remove_edges_by_attribute, edge_attribute=jaccard_index, direction=below, threshold=.2, remove_connected_nodes=t, qnode_id=n02)",
            "filter_kg(action=remove_edges_by_property, edge_property=provided_by, property_value=Pharos)",
            "overlay(action=predict_drug_treats_disease, source_qnode_id=n02, target_qnode_id=n00, virtual_relation_label=P1)",
            "resultify(ignore_edge_direction=true, debug=true)",
            "filter_results(action=limit_number_of_results, max_results=1)",
            "return(message=true, store=false)",
        ]}}
        [response, message] = _do_arax_query(query)
>       assert response.status == 'OK'
E       AssertionError: assert 'ERROR' == 'OK'
E         - OK
E         + ERROR

test_ARAX_resultify.py:946: AssertionError
---------------------------------------------------- Captured stdout call -----------------------------------------------------
Response:
  status: ERROR
  n_errors: 1  n_warnings: 0  n_messages: 103
  error_code: UnknownValue   message: Supplied value n00 is not permitted. In action {'predict_drug_treats_disease'}, allowable values to target_qnode_id are: []
  - 2020-06-16 11:17:03.167281 INFO: ARAXQuery launching on incoming Message
  - 2020-06-16 11:17:03.167302 INFO: Examine input query for needed information for dispatch
  - 2020-06-16 11:17:03.167310 INFO: Found input processing plan. Sending to the ProcessingPlanExecutor
  - 2020-06-16 11:17:03.167316 DEBUG: Entering executeProcessingPlan
  - 2020-06-16 11:17:03.169535 DEBUG: No starting messages were referenced. Will start with a blank template Message
  - 2020-06-16 11:17:03.169957 DEBUG: Found processing_actions
  - 2020-06-16 11:17:03.169972 INFO: Parsing input actions list
  - 2020-06-16 11:17:03.169980 DEBUG: Parsing action: create_message
  - 2020-06-16 11:17:03.169998 DEBUG: Parsing action: add_qnode(curie=DOID:14330, id=n00)
  - 2020-06-16 11:17:03.170031 DEBUG: Parsing action: add_qnode(type=protein, is_set=true, id=n01)
  - 2020-06-16 11:17:03.170061 DEBUG: Parsing action: add_qnode(type=chemical_substance, id=n02)
  - 2020-06-16 11:17:03.170085 DEBUG: Parsing action: add_qedge(source_id=n00, target_id=n01, id=e00)
  - 2020-06-16 11:17:03.170112 DEBUG: Parsing action: add_qedge(source_id=n01, target_id=n02, id=e01, type=physically_interacts_with)
  - 2020-06-16 11:17:03.170143 DEBUG: Parsing action: expand(edge_id=[e00,e01], kp=ARAX/KG1)
  - 2020-06-16 11:17:03.170173 DEBUG: Parsing action: overlay(action=compute_jaccard, start_node_id=n00, intermediate_node_id=n01, end_node_id=n02, virtual_relation_label=J1)
  - 2020-06-16 11:17:03.170209 DEBUG: Parsing action: filter_kg(action=remove_edges_by_attribute, edge_attribute=jaccard_index, direction=below, threshold=.2, remove_connected_nodes=t, qnode_id=n02)
  - 2020-06-16 11:17:03.170249 DEBUG: Parsing action: filter_kg(action=remove_edges_by_property, edge_property=provided_by, property_value=Pharos)
  - 2020-06-16 11:17:03.170277 DEBUG: Parsing action: overlay(action=predict_drug_treats_disease, source_qnode_id=n02, target_qnode_id=n00, virtual_relation_label=P1)
  - 2020-06-16 11:17:03.170308 DEBUG: Parsing action: resultify(ignore_edge_direction=true, debug=true)
  - 2020-06-16 11:17:03.170331 DEBUG: Parsing action: filter_results(action=limit_number_of_results, max_results=1)
  - 2020-06-16 11:17:03.170355 DEBUG: Parsing action: return(message=true, store=false)
  - 2020-06-16 11:17:03.170419 INFO: Processing action 'create_message' with parameters None
  - 2020-06-16 11:17:03.170429 INFO: Creating an empty template ARAX Message
  - 2020-06-16 11:17:03.170733 INFO: Processing action 'add_qnode' with parameters {'curie': 'DOID:14330', 'id': 'n00'}
  - 2020-06-16 11:17:03.170753 INFO: Adding a QueryNode to Message with parameters {'id': 'n00', 'curie': 'DOID:14330', 'name': None, 'type': None, 'is_set': None}
  - 2020-06-16 11:17:03.170977 DEBUG: Looking up CURIE DOID:14330 in KgNodeIndex
  - 2020-06-16 11:17:03.171364 INFO: Processing action 'add_qnode' with parameters {'type': 'protein', 'is_set': 'true', 'id': 'n01'}
  - 2020-06-16 11:17:03.171388 INFO: Adding a QueryNode to Message with parameters {'id': 'n01', 'curie': None, 'name': None, 'type': 'protein', 'is_set': 'true'}
  - 2020-06-16 11:17:03.172226 INFO: Processing action 'add_qnode' with parameters {'type': 'chemical_substance', 'id': 'n02'}
  - 2020-06-16 11:17:03.172278 INFO: Adding a QueryNode to Message with parameters {'id': 'n02', 'curie': None, 'name': None, 'type': 'chemical_substance', 'is_set': None}
  - 2020-06-16 11:17:03.172833 INFO: Processing action 'add_qedge' with parameters {'source_id': 'n00', 'target_id': 'n01', 'id': 'e00'}
  - 2020-06-16 11:17:03.172871 INFO: Adding a QueryEdge to Message with parameters {'id': 'e00', 'source_id': 'n00', 'target_id': 'n01', 'type': None}
  - 2020-06-16 11:17:03.172906 INFO: Processing action 'add_qedge' with parameters {'source_id': 'n01', 'target_id': 'n02', 'id': 'e01', 'type': 'physically_interacts_with'}
  - 2020-06-16 11:17:03.172923 INFO: Adding a QueryEdge to Message with parameters {'id': 'e01', 'source_id': 'n01', 'target_id': 'n02', 'type': 'physically_interacts_with'}
  - 2020-06-16 11:17:03.172946 INFO: Processing action 'expand' with parameters {'edge_id': ['e00', 'e01'], 'kp': 'ARAX/KG1'}
  - 2020-06-16 11:17:03.172971 DEBUG: Applying Expand to Message with parameters {'edge_id': ['e00', 'e01'], 'node_id': None, 'kp': 'ARAX/KG1', 'enforce_directionality': False, 'use_synonyms': True, 'synonym_handling': 'map_back', 'continue_if_no_results': False}
  - 2020-06-16 11:17:03.173411 DEBUG: Query graph for this Expand() call is: {'nodes': [{'id': 'n00', 'curie': 'DOID:14330', 'type': None, 'is_set': None}, {'id': 'n01', 'curie': None, 'type': 'protein', 'is_set': True}, {'id': 'n02', 'curie': None, 'type': 'chemical_substance', 'is_set': None}], 'edges': [{'id': 'e00', 'type': None, 'relation': None, 'source_id': 'n00', 'target_id': 'n01', 'negated': None}, {'id': 'e01', 'type': 'physically_interacts_with', 'relation': None, 'source_id': 'n01', 'target_id': 'n02', 'negated': None}]}
  - 2020-06-16 11:17:03.173456 INFO: Expanding edge e00 using ARAX/KG1
  - 2020-06-16 11:17:03.173546 DEBUG: Looking for query nodes to use curie synonyms for
  - 2020-06-16 11:17:03.174222 DEBUG: Could not find any synonyms for curie DOID:14330
  - 2020-06-16 11:17:03.174250 DEBUG: Generating cypher for edge e00 query graph
  - 2020-06-16 11:17:03.174284 INFO: Sending cypher query for edge e00 to KG1 neo4j
  - 2020-06-16 11:17:04.048431 INFO: Query for edge e00 returned results (e00: 39, n00: 1, n01: 39)
  - 2020-06-16 11:17:04.048456 DEBUG: Processing query results for edge e00
  - 2020-06-16 11:17:04.050244 DEBUG: Merging answer into Message.KnowledgeGraph
  - 2020-06-16 11:17:04.050321 DEBUG: Pruning any paths that are now dead ends
  - 2020-06-16 11:17:04.050600 INFO: Expanding edge e01 using ARAX/KG1
  - 2020-06-16 11:17:04.050698 DEBUG: Looking for query nodes to use curie synonyms for
  - 2020-06-16 11:17:04.050712 DEBUG: Generating cypher for edge e01 query graph
  - 2020-06-16 11:17:04.050761 INFO: Sending cypher query for edge e01 to KG1 neo4j
  - 2020-06-16 11:17:06.204767 INFO: Query for edge e01 returned results (e01: 3742, n01: 18, n02: 1119)
  - 2020-06-16 11:17:06.204806 DEBUG: Processing query results for edge e01
  - 2020-06-16 11:17:06.342048 DEBUG: Merging answer into Message.KnowledgeGraph
  - 2020-06-16 11:17:06.348131 DEBUG: Pruning any paths that are now dead ends
  - 2020-06-16 11:17:06.385621 INFO: After Expand, Message.KnowledgeGraph has 1138 nodes and 3760 edges
  - 2020-06-16 11:17:06.386686 INFO: Processing action 'overlay' with parameters {'action': 'compute_jaccard', 'start_node_id': 'n00', 'intermediate_node_id': 'n01', 'end_node_id': 'n02', 'virtual_relation_label': 'J1'}
  - 2020-06-16 11:17:06.386752 DEBUG: Computing Jaccard distance and adding this information as virtual edges
  - 2020-06-16 11:17:06.386760 INFO: Computing Jaccard distance and adding this information as virtual edges
  - 2020-06-16 11:17:06.386765 INFO: Getting all relevant nodes
  - 2020-06-16 11:17:06.417598 DEBUG: Applying Overlay to Message with parameters {'action': 'compute_jaccard', 'start_node_id': 'n00', 'intermediate_node_id': 'n01', 'end_node_id': 'n02', 'virtual_relation_label': 'J1'}
  - 2020-06-16 11:17:06.418514 DEBUG: Query graph is {'edges': [{'id': 'e00',
            'negated': None,
            'relation': None,
            'source_id': 'n00',
            'target_id': 'n01',
            'type': None},
           {'id': 'e01',
            'negated': None,
            'relation': None,
            'source_id': 'n01',
            'target_id': 'n02',
            'type': 'physically_interacts_with'},
           {'id': 'J1',
            'negated': None,
            'relation': 'J1',
            'source_id': 'n00',
            'target_id': 'n02',
            'type': 'has_jaccard_index_with'}],
 'nodes': [{'curie': 'DOID:14330', 'id': 'n00', 'is_set': None, 'type': None},
           {'curie': None, 'id': 'n01', 'is_set': True, 'type': 'protein'},
           {'curie': None,
            'id': 'n02',
            'is_set': None,
            'type': 'chemical_substance'}]}
  - 2020-06-16 11:17:06.418549 DEBUG: Number of nodes in KG is 1138
  - 2020-06-16 11:17:06.419764 DEBUG: Number of nodes in KG by type is Counter({'chemical_substance': 1119, 'protein': 18, 'disease': 1})
  - 2020-06-16 11:17:06.419790 DEBUG: Number of edges in KG is 4879
  - 2020-06-16 11:17:06.423208 DEBUG: Number of edges in KG by type is Counter({'physically_interacts_with': 3742, 'has_jaccard_index_with': 1119, 'gene_associated_with_condition': 18})
  - 2020-06-16 11:17:06.425597 DEBUG: Number of edges in KG with attributes is 4139
  - 2020-06-16 11:17:06.431046 DEBUG: Number of edges in KG by attribute Counter({'probability': 3020, 'jaccard_index': 1119})
  - 2020-06-16 11:17:06.431103 INFO: Processing action 'filter_kg' with parameters {'action': 'remove_edges_by_attribute', 'edge_attribute': 'jaccard_index', 'direction': 'below', 'threshold': '.2', 'remove_connected_nodes': 't', 'qnode_id': 'n02'}
  - 2020-06-16 11:17:06.438391 DEBUG: Removing Edges
  - 2020-06-16 11:17:06.438418 INFO: Removing edges from the knowledge graph with the specified attribute values
  - 2020-06-16 11:17:06.448168 DEBUG: Removing Nodes
  - 2020-06-16 11:17:06.448200 INFO: Removing connected nodes and their edges from the knowledge graph
  - 2020-06-16 11:17:06.463159 INFO: Edges successfully removed
  - 2020-06-16 11:17:06.463314 DEBUG: Applying Overlay to Message with parameters {'action': 'remove_edges_by_attribute', 'edge_attribute': 'jaccard_index', 'direction': 'below', 'threshold': 0.2, 'remove_connected_nodes': True, 'qnode_id': 'n02'}
  - 2020-06-16 11:17:06.464222 DEBUG: Query graph is {'edges': [{'id': 'e00',
            'negated': None,
            'relation': None,
            'source_id': 'n00',
            'target_id': 'n01',
            'type': None},
           {'id': 'e01',
            'negated': None,
            'relation': None,
            'source_id': 'n01',
            'target_id': 'n02',
            'type': 'physically_interacts_with'},
           {'id': 'J1',
            'negated': None,
            'relation': 'J1',
            'source_id': 'n00',
            'target_id': 'n02',
            'type': 'has_jaccard_index_with'}],
 'nodes': [{'curie': 'DOID:14330', 'id': 'n00', 'is_set': None, 'type': None},
           {'curie': None, 'id': 'n01', 'is_set': True, 'type': 'protein'},
           {'curie': None,
            'id': 'n02',
            'is_set': None,
            'type': 'chemical_substance'}]}
  - 2020-06-16 11:17:06.464247 DEBUG: Number of nodes in KG is 57
  - 2020-06-16 11:17:06.464337 DEBUG: Number of nodes in KG by type is Counter({'chemical_substance': 38, 'protein': 18, 'disease': 1})
  - 2020-06-16 11:17:06.464346 DEBUG: Number of edges in KG is 376
  - 2020-06-16 11:17:06.464787 DEBUG: Number of edges in KG by type is Counter({'physically_interacts_with': 320, 'has_jaccard_index_with': 38, 'gene_associated_with_condition': 18})
  - 2020-06-16 11:17:06.465037 DEBUG: Number of edges in KG with attributes is 292
  - 2020-06-16 11:17:06.465433 DEBUG: Number of edges in KG by attribute Counter({'probability': 254, 'jaccard_index': 38})
  - 2020-06-16 11:17:06.465457 INFO: Processing action 'filter_kg' with parameters {'action': 'remove_edges_by_property', 'edge_property': 'provided_by', 'property_value': 'Pharos'}
  - 2020-06-16 11:17:06.489232 DEBUG: Removing Edges
  - 2020-06-16 11:17:06.489258 INFO: Removing edges from the knowledge graph matching the specified property
  - 2020-06-16 11:17:06.498842 INFO: Edges successfully removed
  - 2020-06-16 11:17:06.498892 DEBUG: Applying Overlay to Message with parameters {'action': 'remove_edges_by_property', 'edge_property': 'provided_by', 'property_value': 'Pharos', 'remove_connected_nodes': False}
  - 2020-06-16 11:17:06.499640 DEBUG: Query graph is {'edges': [{'id': 'e00',
            'negated': None,
            'relation': None,
            'source_id': 'n00',
            'target_id': 'n01',
            'type': None},
           {'id': 'e01',
            'negated': None,
            'relation': None,
            'source_id': 'n01',
            'target_id': 'n02',
            'type': 'physically_interacts_with'},
           {'id': 'J1',
            'negated': None,
            'relation': 'J1',
            'source_id': 'n00',
            'target_id': 'n02',
            'type': 'has_jaccard_index_with'}],
 'nodes': [{'curie': 'DOID:14330', 'id': 'n00', 'is_set': None, 'type': None},
           {'curie': None, 'id': 'n01', 'is_set': True, 'type': 'protein'},
           {'curie': None,
            'id': 'n02',
            'is_set': None,
            'type': 'chemical_substance'}]}
  - 2020-06-16 11:17:06.499659 DEBUG: Number of nodes in KG is 57
  - 2020-06-16 11:17:06.499752 DEBUG: Number of nodes in KG by type is Counter({'chemical_substance': 38, 'protein': 18, 'disease': 1})
  - 2020-06-16 11:17:06.499763 DEBUG: Number of edges in KG is 310
  - 2020-06-16 11:17:06.500200 DEBUG: Number of edges in KG by type is Counter({'physically_interacts_with': 254, 'has_jaccard_index_with': 38, 'gene_associated_with_condition': 18})
  - 2020-06-16 11:17:06.500433 DEBUG: Number of edges in KG with attributes is 292
  - 2020-06-16 11:17:06.501142 DEBUG: Number of edges in KG by attribute Counter({'probability': 254, 'jaccard_index': 38})
  - 2020-06-16 11:17:06.501249 INFO: Processing action 'overlay' with parameters {'action': 'predict_drug_treats_disease', 'source_qnode_id': 'n02', 'target_qnode_id': 'n00', 'virtual_relation_label': 'P1'}
  - 2020-06-16 11:17:06.501320 ERROR: Supplied value n00 is not permitted. In action {'predict_drug_treats_disease'}, allowable values to target_qnode_id are: []
  - 2020-06-16 11:17:06.501339 DEBUG: Applying Overlay to Message with parameters {'action': 'predict_drug_treats_disease', 'source_qnode_id': 'n02', 'target_qnode_id': 'n00', 'virtual_relation_label': 'P1'}
  - 2020-06-16 11:17:06.502620 DEBUG: Query graph is {'edges': [{'id': 'e00',
            'negated': None,
            'relation': None,
            'source_id': 'n00',
            'target_id': 'n01',
            'type': None},
           {'id': 'e01',
            'negated': None,
            'relation': None,
            'source_id': 'n01',
            'target_id': 'n02',
            'type': 'physically_interacts_with'},
           {'id': 'J1',
            'negated': None,
            'relation': 'J1',
            'source_id': 'n00',
            'target_id': 'n02',
            'type': 'has_jaccard_index_with'}],
 'nodes': [{'curie': 'DOID:14330', 'id': 'n00', 'is_set': None, 'type': None},
           {'curie': None, 'id': 'n01', 'is_set': True, 'type': 'protein'},
           {'curie': None,
            'id': 'n02',
            'is_set': None,
            'type': 'chemical_substance'}]}
  - 2020-06-16 11:17:06.502654 DEBUG: Number of nodes in KG is 57
  - 2020-06-16 11:17:06.502766 DEBUG: Number of nodes in KG by type is Counter({'chemical_substance': 38, 'protein': 18, 'disease': 1})
  - 2020-06-16 11:17:06.502779 DEBUG: Number of edges in KG is 310
  - 2020-06-16 11:17:06.503092 DEBUG: Number of edges in KG by type is Counter({'physically_interacts_with': 254, 'has_jaccard_index_with': 38, 'gene_associated_with_condition': 18})
  - 2020-06-16 11:17:06.503245 DEBUG: Number of edges in KG with attributes is 292
  - 2020-06-16 11:17:06.503594 DEBUG: Number of edges in KG by attribute Counter({'probability': 254, 'jaccard_index': 38})

I checked to make sure the knowledge graph actually contains nodes for each qnode_id, and all seems ok in that regard:

{
   'n00':1,
   'n01':18,
   'n02':38
}
amykglen commented 4 years ago

Interesting: noticed that if I specify a type for n00 in the DSL, by changing this line:

            "add_qnode(curie=DOID:14330, id=n00)",

to this:

            "add_qnode(curie=DOID:14330, id=n00, type=disease)",

the test passes.

so it seems Overlay is reliant on qnodes having types? and the recent change made to add_qnode so that it no longer automatically adds a type to qnodes with a curie explains why this test that worked previously is now failing.

chunyuma commented 4 years ago

@amykglen, I think the recent change made to add_qnode that causes it fail. This is the same to FET. So I let the script throw an error and remind the users to add a type if they forget it.

dkoslicki commented 4 years ago

Yup, that would be the problem. The drug_treats_disease specifically looks for disease nodes, so will error out if no disease nodes are in the QG.

dkoslicki commented 4 years ago

Ok, I've fixed this, all tests passing, and merged to master. @amykglen @chunyuma What I decided to do is:

  1. Instead of having ARAX_overlay.py check if the parameters provided have the right QG node types
  2. Have the code itself (i.e. predict_drug_treats_disease.py) at run time determine if the corresponding KG nodes have the proper type (and then decorate/add edges/properties if they do have the proper type, and do nothing if they don't have the proper type).

That way you don't need to specifically specify the node type in the QG, but after the expand, the code checks the node types at that point.