RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

Incorrect node type in expand #659

Closed: dkoslicki closed this issue 4 years ago

dkoslicki commented 4 years ago

When running example 12 of ARAX_query.py (python ARAX_query.py 12):

query = { "previous_message_processing_plan": { "processing_actions": [
            "create_message",
            "add_qnode(name=DOID:14330, id=n00)",
            "add_qnode(type=protein, is_set=true, id=n01)",
            "add_qnode(type=chemical_substance, id=n02)",
            "add_qedge(source_id=n00, target_id=n01, id=e00)",
            "add_qedge(source_id=n01, target_id=n02, id=e01, type=physically_interacts_with)",
            "expand(edge_id=[e00,e01], kp=ARAX/KG1)",
            "overlay(action=compute_jaccard, start_node_id=n00, intermediate_node_id=n01, end_node_id=n02, virtual_edge_type=J1)",
            "filter_kg(action=remove_edges_by_attribute, edge_attribute=jaccard_index, direction=below, threshold=.2, remove_connected_nodes=t, qnode_id=n02)",
            "filter_kg(action=remove_edges_by_property, edge_property=provided_by, property_value=Pharos)",  # can be removed, but shows we can filter by Knowledge provider
            "resultify(ignore_edge_direction=true)",
            "filter_results(action=sort_by_edge_attribute, edge_attribute=jaccard_index, direction=descending, max_results=10)",
            "return(message=true, store=false)",
            ] } }

an empty KG is being returned by expand(). It appears that Parkinson's disease (DOID:14330) was assigned the wrong CURIE and type; the CURIE ended up in the type field:

2020-03-05 13:15:31.930149 INFO: Sending sub query graph to KG1Querier:
 {'nodes': 
[{'id': 'n00', 'curie': 241689, 'type': 'DOID:14330', 'is_set': None},  #<--------HERE
 {'id': 'n01', 'curie': None, 'type': 'protein', 'is_set': True},
 {'id': 'n02', 'curie': None, 'type': 'chemical_substance', 'is_set': None}], 
'edges': 
[{'id': 'e00', 'type': None, 'relation': None, 'source_id': 'n00', 'target_id': 'n01', 'negated': None},
 {'id': 'e01', 'type': 'physically_interacts_with', 'relation': None, 'source_id': 'n01', 'target_id': 'n02', 'negated': None}]}

This happened on the demo branch.
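
A quick way to catch this kind of mix-up before the sub query graph reaches the querier is to sanity-check each query node. The following is only a minimal sketch, not part of the ARAX code base: the helper name find_suspicious_qnodes and the loose CURIE pattern are assumptions made for illustration. It flags a node whose curie is not a "PREFIX:ID" string or whose type looks like a CURIE.

```python
import re

# Loose CURIE shape (an assumption for this sketch): prefix, colon, local id,
# e.g. "DOID:14330".
CURIE_PATTERN = re.compile(r"^[A-Za-z][\w.]*:\S+$")


def find_suspicious_qnodes(query_graph):
    """Return (node id, problem description) pairs for odd-looking query nodes.

    Hypothetical helper, not an ARAX function.
    """
    problems = []
    for node in query_graph.get("nodes", []):
        curie = node.get("curie")
        node_type = node.get("type")
        # A curie, when present, should be a string shaped like PREFIX:ID.
        if curie is not None and not (
            isinstance(curie, str) and CURIE_PATTERN.match(curie)
        ):
            problems.append((node["id"], f"curie {curie!r} is not a CURIE string"))
        # A type such as 'disease' or 'protein' should not look like a CURIE.
        if isinstance(node_type, str) and CURIE_PATTERN.match(node_type):
            problems.append((node["id"], f"type {node_type!r} looks like a CURIE"))
    return problems


# Nodes copied from the log output in this issue.
bad_graph = {"nodes": [
    {"id": "n00", "curie": 241689, "type": "DOID:14330", "is_set": None},
    {"id": "n01", "curie": None, "type": "protein", "is_set": True},
    {"id": "n02", "curie": None, "type": "chemical_substance", "is_set": None},
]}
good_graph = {"nodes": [
    {"id": "n00", "curie": "DOID:14330", "type": "disease", "is_set": None},
    {"id": "n01", "curie": None, "type": "protein", "is_set": True},
]}

for node_id, problem in find_suspicious_qnodes(bad_graph):
    print(node_id, problem)
```

Run against the failing graph above, the check reports both problems on n00 and nothing for the other nodes.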

amykglen commented 4 years ago

Huh, looking into it.

dkoslicki commented 4 years ago

Note: bug still exists at f15f090401c275e98c8ac9bf3ed7ce9023eeb59b

dkoslicki commented 4 years ago

Bug does not exist at 68b56cc06640c06d5eeb7f0f959a23b4000a04de

amykglen commented 4 years ago

Thanks. This bug is not present in the expander branch, strangely... Looking into what's going on in demo.

edeutsch commented 4 years ago

I probably botched the merge. Maybe a git merging expert should reverse what I did last night and try again.

dkoslicki commented 4 years ago

> Thanks. This bug is not present in the expander branch, strangely... Looking into what's going on in demo.

Really?

$ git checkout expander
$ git pull origin expander
$ python ARAX_query.py 12
<snip>
  - 2020-03-05 13:39:13.025534 INFO: Extracting sub query graph to expand
  - 2020-03-05 13:39:13.026496 INFO: Sending sub query graph to KG1Querier: 
{'nodes': [{'id': 'n00', 'curie': 241689, 'type': 'DOID:14330', 'is_set': None}, #<----HERE
{'id': 'n01', 'curie': None, 'type': 'protein', 'is_set': True}, 
{'id': 'n02', 'curie': None, 'type': 'chemical_substance', 'is_set': True}], 
'edges': [{'id': 'e00', 'type': None, 'relation': None, 'source_id': 'n00', 'target_id': 'n01', 'negated': None}, 
{'id': 'e01', 'type': 'physically_interacts_with', 'relation': None, 'source_id': 'n01', 'target_id': 'n02', 'negated': None}]}

dkoslicki commented 4 years ago

@amykglen Are you using the newest KGNodeIndex per these instructions, built with the updated dumpdata.py code from the overlay branch that now dumps both KG1 and KG2 node CURIEs, names, and types?

amykglen commented 4 years ago

I think so - I updated my KGNodeIndex the other day following your directions on #641... and it works with both KG2 and KG1 curies.

amykglen commented 4 years ago

And yes, when I do just what you did:

$ git checkout expander
$ git pull origin expander
$ python ARAX_query.py 12

everything seems to run fine:

null
Response:
  status: OK
  n_errors: 0  n_warnings: 0  n_messages: 80
  - 2020-03-05 12:50:43.021854 INFO: ARAXQuery launching
  - 2020-03-05 12:50:43.021878 INFO: Examine input query for needed information for dispatch
  - 2020-03-05 12:50:43.021886 INFO: Found input processing plan. Sending to the ProcessingPlanExecutor
  - 2020-03-05 12:50:43.021893 DEBUG: Entering executeProcessingPlan
  - 2020-03-05 12:50:43.175892 DEBUG: No starting messages were referenced. Will start with a blank template Message
  - 2020-03-05 12:50:43.176766 DEBUG: Found processing_actions
  - 2020-03-05 12:50:43.176792 INFO: Parsing input actions list
  - 2020-03-05 12:50:43.176803 DEBUG: Parsing action: create_message
  - 2020-03-05 12:50:43.177146 DEBUG: Parsing action: add_qnode(name=DOID:14330, id=n00)
  - 2020-03-05 12:50:43.177700 DEBUG: Parsing action: add_qnode(type=protein, is_set=true, id=n01)
  - 2020-03-05 12:50:43.177739 DEBUG: Parsing action: add_qnode(type=chemical_substance, is_set=true, id=n02)
  - 2020-03-05 12:50:43.177770 DEBUG: Parsing action: add_qedge(source_id=n00, target_id=n01, id=e00)
  - 2020-03-05 12:50:43.177798 DEBUG: Parsing action: add_qedge(source_id=n01, target_id=n02, id=e01, type=physically_interacts_with)
  - 2020-03-05 12:50:43.177830 DEBUG: Parsing action: expand(edge_id=[e00,e01])
  - 2020-03-05 12:50:43.178398 DEBUG: Parsing action: overlay(action=compute_jaccard, start_node_id=n00, intermediate_node_id=n01, end_node_id=n02, virtual_edge_type=J1)
  - 2020-03-05 12:50:43.178454 DEBUG: Parsing action: filter_kg(action=remove_edges_by_attribute, edge_attribute=jaccard_index, direction=below, threshold=.2, remove_connected_nodes=t, qnode_id=n02)
  - 2020-03-05 12:50:43.178545 DEBUG: Parsing action: filter_kg(action=remove_edges_by_property, edge_property=provided_by, property_value=Pharos)
  - 2020-03-05 12:50:43.178601 DEBUG: Parsing action: resultify(ignore_edge_direction=true, force_isset_false=[n02])
  - 2020-03-05 12:50:43.178633 DEBUG: Parsing action: return(message=true, store=false)
  - 2020-03-05 12:50:43.183072 DEBUG: Considering action 'create_message' with parameters None
  - 2020-03-05 12:50:43.183103 INFO: Creating an empty template ARAX Message
  - 2020-03-05 12:50:43.183529 DEBUG: Considering action 'add_qnode' with parameters {'name': 'DOID:14330', 'id': 'n00'}
  - 2020-03-05 12:50:43.183554 INFO: Adding a QueryNode to Message with parameters {'id': 'n00', 'curie': None, 'name': 'DOID:14330', 'type': None, 'is_set': None}
  - 2020-03-05 12:50:43.183898 DEBUG: Looking up CURIE DOID:14330 in KgNodeIndex
  - 2020-03-05 12:50:43.195889 DEBUG: Considering action 'add_qnode' with parameters {'type': 'protein', 'is_set': 'true', 'id': 'n01'}
  - 2020-03-05 12:50:43.195941 INFO: Adding a QueryNode to Message with parameters {'id': 'n01', 'curie': None, 'name': None, 'type': 'protein', 'is_set': 'true'}
  - 2020-03-05 12:50:43.196295 DEBUG: Considering action 'add_qnode' with parameters {'type': 'chemical_substance', 'is_set': 'true', 'id': 'n02'}
  - 2020-03-05 12:50:43.196321 INFO: Adding a QueryNode to Message with parameters {'id': 'n02', 'curie': None, 'name': None, 'type': 'chemical_substance', 'is_set': 'true'}
  - 2020-03-05 12:50:43.196550 DEBUG: Considering action 'add_qedge' with parameters {'source_id': 'n00', 'target_id': 'n01', 'id': 'e00'}
  - 2020-03-05 12:50:43.196575 INFO: Adding a QueryEdge to Message with parameters {'id': 'e00', 'source_id': 'n00', 'target_id': 'n01', 'type': None}
  - 2020-03-05 12:50:43.196616 DEBUG: Considering action 'add_qedge' with parameters {'source_id': 'n01', 'target_id': 'n02', 'id': 'e01', 'type': 'physically_interacts_with'}
  - 2020-03-05 12:50:43.196632 INFO: Adding a QueryEdge to Message with parameters {'id': 'e01', 'source_id': 'n01', 'target_id': 'n02', 'type': 'physically_interacts_with'}
  - 2020-03-05 12:50:43.196657 DEBUG: Considering action 'expand' with parameters {'edge_id': ['e00', 'e01']}
  - 2020-03-05 12:50:43.196674 DEBUG: Applying Expand to Message with parameters {'edge_id': ['e00', 'e01'], 'kp': None}
  - 2020-03-05 12:50:43.196689 INFO: Extracting sub query graph to expand
  - 2020-03-05 12:50:43.213472 INFO: Sending sub query graph to KG1Querier: {'nodes': [{'id': 'n00', 'curie': 'DOID:14330', 'type': 'disease', 'is_set': None}, {'id': 'n01', 'curie': None, 'type': 'protein', 'is_set': True}, {'id': 'n02', 'curie': None, 'type': 'chemical_substance', 'is_set': True}], 'edges': [{'id': 'e00', 'type': None, 'relation': None, 'source_id': 'n00', 'target_id': 'n01', 'negated': None}, {'id': 'e01', 'type': 'physically_interacts_with', 'relation': None, 'source_id': 'n01', 'target_id': 'n02', 'negated': None}]}
  - 2020-03-05 12:50:44.787363 INFO: QueryGraphReasoner returned 1871 results (1138 nodes, 1889 edges)
  - 2020-03-05 12:50:44.798857 INFO: Merging answer knowledge graph into Message.KnowledgeGraph
  - 2020-03-05 12:50:44.800954 INFO: After Expand, Message.KnowledgeGraph has 1138 nodes and 1889 edges
  - 2020-03-05 12:50:44.801170 DEBUG: Considering action 'overlay' with parameters {'action': 'compute_jaccard', 'start_node_id': 'n00', 'intermediate_node_id': 'n01', 'end_node_id': 'n02', 'virtual_edge_type': 'J1'}
  - 2020-03-05 12:50:44.819562 DEBUG: Computing Jaccard distance and adding this information as virtual edges
  - 2020-03-05 12:50:44.819605 INFO: Computing Jaccard distance and adding this information as virtual edges
  - 2020-03-05 12:50:44.819618 INFO: Getting all relevant nodes
  - 2020-03-05 12:50:44.842926 DEBUG: Applying Overlay to Message with parameters {'action': 'compute_jaccard', 'start_node_id': 'n00', 'intermediate_node_id': 'n01', 'end_node_id': 'n02', 'virtual_edge_type': 'J1'}
  - 2020-03-05 12:50:44.844679 DEBUG: Query graph is {'edges': [{'id': 'e00',
            'negated': None,
            'relation': None,
            'source_id': 'n00',
            'target_id': 'n01',
            'type': None},
           {'id': 'e01',
            'negated': None,
            'relation': None,
            'source_id': 'n01',
            'target_id': 'n02',
            'type': 'physically_interacts_with'},
           {'id': 'J1',
            'negated': None,
            'relation': 'jaccard_index',
            'source_id': 'n00',
            'target_id': 'n02',
            'type': 'J1'}],
 'nodes': [{'curie': 'DOID:14330',
            'id': 'n00',
            'is_set': None,
            'type': 'disease'},
           {'curie': None, 'id': 'n01', 'is_set': True, 'type': 'protein'},
           {'curie': None,
            'id': 'n02',
            'is_set': True,
            'type': 'chemical_substance'}]}
  - 2020-03-05 12:50:44.844717 DEBUG: Number of nodes in KG is 1138
  - 2020-03-05 12:50:44.845701 DEBUG: Number of nodes in KG by type is Counter({'chemical_substance': 1119, 'protein': 18, 'disease': 1})
  - 2020-03-05 12:50:44.845714 DEBUG: Number of edges in KG is 3008
  - 2020-03-05 12:50:44.847324 DEBUG: Number of edges in KG by type is Counter({'physically_interacts_with': 1871, 'J1': 1119, 'gene_associated_with_condition': 18})
  - 2020-03-05 12:50:44.848155 DEBUG: Number of edges in KG with attributes is 1119
  - 2020-03-05 12:50:44.849620 DEBUG: Number of edges in KG by attribute Counter({'jaccard_index': 1119})
  - 2020-03-05 12:50:44.849648 DEBUG: Considering action 'filter_kg' with parameters {'action': 'remove_edges_by_attribute', 'edge_attribute': 'jaccard_index', 'direction': 'below', 'threshold': '.2', 'remove_connected_nodes': 't', 'qnode_id': 'n02'}
  - 2020-03-05 12:50:44.867155 DEBUG: Removing Edges
  - 2020-03-05 12:50:44.867195 INFO: Removing edges from the knowledge graph with the specified attribute values
  - 2020-03-05 12:50:44.874328 DEBUG: Removing Nodes
  - 2020-03-05 12:50:44.874349 INFO: Removing connected nodes and their edges from the knowledge graph
  - 2020-03-05 12:50:44.883423 INFO: Edges successfully removed
  - 2020-03-05 12:50:44.883626 DEBUG: Applying Overlay to Message with parameters {'action': 'remove_edges_by_attribute', 'edge_attribute': 'jaccard_index', 'direction': 'below', 'threshold': 0.2, 'remove_connected_nodes': True, 'qnode_id': 'n02'}
  - 2020-03-05 12:50:44.884489 DEBUG: Query graph is {'edges': [{'id': 'e00',
            'negated': None,
            'relation': None,
            'source_id': 'n00',
            'target_id': 'n01',
            'type': None},
           {'id': 'e01',
            'negated': None,
            'relation': None,
            'source_id': 'n01',
            'target_id': 'n02',
            'type': 'physically_interacts_with'},
           {'id': 'J1',
            'negated': None,
            'relation': 'jaccard_index',
            'source_id': 'n00',
            'target_id': 'n02',
            'type': 'J1'}],
 'nodes': [{'curie': 'DOID:14330',
            'id': 'n00',
            'is_set': None,
            'type': 'disease'},
           {'curie': None, 'id': 'n01', 'is_set': True, 'type': 'protein'},
           {'curie': None,
            'id': 'n02',
            'is_set': True,
            'type': 'chemical_substance'}]}
  - 2020-03-05 12:50:44.884512 DEBUG: Number of nodes in KG is 57
  - 2020-03-05 12:50:44.884597 DEBUG: Number of nodes in KG by type is Counter({'chemical_substance': 38, 'protein': 18, 'disease': 1})
  - 2020-03-05 12:50:44.884606 DEBUG: Number of edges in KG is 216
  - 2020-03-05 12:50:44.884770 DEBUG: Number of edges in KG by type is Counter({'physically_interacts_with': 160, 'J1': 38, 'gene_associated_with_condition': 18})
  - 2020-03-05 12:50:44.884848 DEBUG: Number of edges in KG with attributes is 38
  - 2020-03-05 12:50:44.884945 DEBUG: Number of edges in KG by attribute Counter({'jaccard_index': 38})
  - 2020-03-05 12:50:44.884964 DEBUG: Considering action 'filter_kg' with parameters {'action': 'remove_edges_by_property', 'edge_property': 'provided_by', 'property_value': 'Pharos'}
  - 2020-03-05 12:50:44.893991 DEBUG: Removing Edges
  - 2020-03-05 12:50:44.894009 INFO: Removing edges from the knowledge graph matching the specified property
  - 2020-03-05 12:50:44.897938 INFO: Edges successfully removed
  - 2020-03-05 12:50:44.897959 DEBUG: Applying Overlay to Message with parameters {'action': 'remove_edges_by_property', 'edge_property': 'provided_by', 'property_value': 'Pharos', 'remove_connected_nodes': False}
  - 2020-03-05 12:50:44.898695 DEBUG: Query graph is {'edges': [{'id': 'e00',
            'negated': None,
            'relation': None,
            'source_id': 'n00',
            'target_id': 'n01',
            'type': None},
           {'id': 'e01',
            'negated': None,
            'relation': None,
            'source_id': 'n01',
            'target_id': 'n02',
            'type': 'physically_interacts_with'},
           {'id': 'J1',
            'negated': None,
            'relation': 'jaccard_index',
            'source_id': 'n00',
            'target_id': 'n02',
            'type': 'J1'}],
 'nodes': [{'curie': 'DOID:14330',
            'id': 'n00',
            'is_set': None,
            'type': 'disease'},
           {'curie': None, 'id': 'n01', 'is_set': True, 'type': 'protein'},
           {'curie': None,
            'id': 'n02',
            'is_set': True,
            'type': 'chemical_substance'}]}
  - 2020-03-05 12:50:44.898711 DEBUG: Number of nodes in KG is 57
  - 2020-03-05 12:50:44.898781 DEBUG: Number of nodes in KG by type is Counter({'chemical_substance': 38, 'protein': 18, 'disease': 1})
  - 2020-03-05 12:50:44.898790 DEBUG: Number of edges in KG is 183
  - 2020-03-05 12:50:44.898885 DEBUG: Number of edges in KG by type is Counter({'physically_interacts_with': 127, 'J1': 38, 'gene_associated_with_condition': 18})
  - 2020-03-05 12:50:44.898935 DEBUG: Number of edges in KG with attributes is 38
  - 2020-03-05 12:50:44.899018 DEBUG: Number of edges in KG by attribute Counter({'jaccard_index': 38})
  - 2020-03-05 12:50:44.899036 DEBUG: Considering action 'resultify' with parameters {'ignore_edge_direction': 'true', 'force_isset_false': ['n02']}
  - 2020-03-05 12:50:44.908624 DEBUG: Applying Resultifier to Message with parameters {'ignore_edge_direction': 'true', 'force_isset_false': ['n02']}
  - 2020-03-05 12:50:44.908654 DEBUG: Considering action 'return' with parameters {'message': 'true', 'store': 'false'}

Number of results: 38
Result qg_id's in results: {'e00', 'e01', 'J1'}
For example 15 (demo eg. 3), number of TP proteins: 0
Number of KnowledgeProviders in KG: Counter({'ChEMBL': 127, 'ARAX/RTX': 38, 'DisGeNet': 13, 'BioLink': 5})
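
To see exactly where the two runs diverge, the sub query graphs logged on each machine can be compared node by node. This is a rough sketch under assumptions: diff_qnodes is a hypothetical helper written for this comparison, not an ARAX function, and the node dictionaries are copied from the two expander-branch logs in this thread.

```python
def diff_qnodes(graph_a, graph_b):
    """Map node id -> {field: (value in a, value in b)} for fields that differ.

    Hypothetical helper; only nodes present in graph_a are examined.
    """
    nodes_b = {node["id"]: node for node in graph_b["nodes"]}
    differences = {}
    for node_a in graph_a["nodes"]:
        node_b = nodes_b.get(node_a["id"], {})
        changed = {
            key: (node_a.get(key), node_b.get(key))
            for key in sorted(set(node_a) | set(node_b))
            if node_a.get(key) != node_b.get(key)
        }
        if changed:
            differences[node_a["id"]] = changed
    return differences


# Sub query graph nodes from the failing run on the expander branch.
failing = {"nodes": [
    {"id": "n00", "curie": 241689, "type": "DOID:14330", "is_set": None},
    {"id": "n01", "curie": None, "type": "protein", "is_set": True},
    {"id": "n02", "curie": None, "type": "chemical_substance", "is_set": True},
]}
# Sub query graph nodes from the working run on the expander branch.
working = {"nodes": [
    {"id": "n00", "curie": "DOID:14330", "type": "disease", "is_set": None},
    {"id": "n01", "curie": None, "type": "protein", "is_set": True},
    {"id": "n02", "curie": None, "type": "chemical_substance", "is_set": True},
]}

print(diff_qnodes(failing, working))
```

Only n00 differs between the two runs, in its curie and type fields, which points at the name lookup for DOID:14330 rather than at the rest of the query.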

dkoslicki commented 4 years ago

Quite odd! Let me drop my index, rebuild, and see if this problem is actually just on my end.

amykglen commented 4 years ago

It also works fine for me on demo, when I check out demo and pull.

dkoslicki commented 4 years ago

My apologies @amykglen and @edeutsch, it was an issue on my end with a stale KGNodeIndex. False alarm. Closing.