RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

FET throwing error for BTE query in node-normalization branch #893

Closed amykglen closed 4 years ago

amykglen commented 4 years ago

when I run this DSL in the node-normalization branch:

add_qnode(curie=DOID:11830, type=disease, id=n00)
add_qnode(type=gene, curie=[UniProtKB:P39060, UniProtKB:O43829, UniProtKB:P20849], is_set=true, id=n01)
add_qedge(source_id=n00, target_id=n01, id=e00)
expand(edge_id=e00, kp=BTE)
overlay(action=fisher_exact_test, source_qnode_id=n00, target_qnode_id=n01, virtual_relation_label=FET1)
filter_kg(action=remove_edges_by_attribute, edge_attribute=fisher_exact_test_p-value, direction=above, threshold=0.01, remove_connected_nodes=t, qnode_id=n01)
add_qnode(type=chemical_substance, id=n02)
add_qedge(source_id=n01, target_id=n02, id=e01)
expand(edge_id=e01, kp=BTE)
overlay(action=fisher_exact_test, source_qnode_id=n01, target_qnode_id=n02, virtual_relation_label=FET2)
filter_kg(action=remove_edges_by_attribute, edge_attribute=fisher_exact_test_p-value, direction=above, threshold=0.01, remove_connected_nodes=t, qnode_id=n02)
resultify()

FET is throwing an error:

2020-07-07 12:44:48.985454 INFO:  ARAXQuery launching on incoming Message
2020-07-07 12:44:48.985483 INFO:  Examine input query for needed information for dispatch
2020-07-07 12:44:48.985491 INFO:  Found input processing plan. Sending to the ProcessingPlanExecutor
2020-07-07 12:44:48.985500 DEBUG:  Entering executeProcessingPlan
2020-07-07 12:44:49.008932 DEBUG:  No starting messages were referenced. Will start with a blank template Message
2020-07-07 12:44:49.017621 DEBUG:  Found processing_actions
2020-07-07 12:44:49.017656 INFO:  Parsing input actions list
2020-07-07 12:44:49.017672 DEBUG:  Parsing action: add_qnode(curie=DOID:11830, type=disease, id=n00)
2020-07-07 12:44:49.017831 DEBUG:  Parsing action: add_qnode(type=gene, curie=[UniProtKB:P39060, UniProtKB:O43829, UniProtKB:P20849], is_set=true, id=n01)
2020-07-07 12:44:49.017924 DEBUG:  Parsing action: add_qedge(source_id=n00, target_id=n01, id=e00)
2020-07-07 12:44:49.017975 DEBUG:  Parsing action: expand(edge_id=e00, kp=BTE)
2020-07-07 12:44:49.018000 DEBUG:  Parsing action: overlay(action=fisher_exact_test, source_qnode_id=n00, target_qnode_id=n01, virtual_relation_label=FET1)
2020-07-07 12:44:49.018033 DEBUG:  Parsing action: filter_kg(action=remove_edges_by_attribute, edge_attribute=fisher_exact_test_p-value, direction=above, threshold=0.01, remove_connected_nodes=t, qnode_id=n01)
2020-07-07 12:44:49.018074 DEBUG:  Parsing action: add_qnode(type=chemical_substance, id=n02)
2020-07-07 12:44:49.018102 DEBUG:  Parsing action: add_qedge(source_id=n01, target_id=n02, id=e01)
2020-07-07 12:44:49.018130 DEBUG:  Parsing action: expand(edge_id=e01, kp=BTE)
2020-07-07 12:44:49.018152 DEBUG:  Parsing action: overlay(action=fisher_exact_test, source_qnode_id=n01, target_qnode_id=n02, virtual_relation_label=FET2)
2020-07-07 12:44:49.018186 DEBUG:  Parsing action: filter_kg(action=remove_edges_by_attribute, edge_attribute=fisher_exact_test_p-value, direction=above, threshold=0.01, remove_connected_nodes=t, qnode_id=n02)
2020-07-07 12:44:49.018226 DEBUG:  Parsing action: resultify()
2020-07-07 12:44:49.018246 DEBUG:  Parsing action:
2020-07-07 12:44:49.018323 INFO:  Processing action 'add_qnode' with parameters {'curie': 'DOID:11830', 'type': 'disease', 'id': 'n00'}
2020-07-07 12:44:49.018352 INFO:  Adding a QueryNode to Message with parameters {'id': 'n00', 'curie': 'DOID:11830', 'name': None, 'type': 'disease', 'is_set': None}
2020-07-07 12:44:49.024757 DEBUG:  Looking up CURIE DOID:11830 in KgNodeIndex
2020-07-07 12:44:49.038802 INFO:  Processing action 'add_qnode' with parameters {'type': 'gene', 'curie': ['UniProtKB:P39060', 'UniProtKB:O43829', 'UniProtKB:P20849'], 'is_set': 'true', 'id': 'n01'}
2020-07-07 12:44:49.038853 INFO:  Adding a QueryNode to Message with parameters {'id': 'n01', 'curie': ['UniProtKB:P39060', 'UniProtKB:O43829', 'UniProtKB:P20849'], 'name': None, 'type': 'gene', 'is_set': 'true'}
2020-07-07 12:44:49.042342 DEBUG:  Looking up CURIE UniProtKB:P39060 in KgNodeIndex
2020-07-07 12:44:49.045713 DEBUG:  Looking up CURIE UniProtKB:O43829 in KgNodeIndex
2020-07-07 12:44:49.046577 DEBUG:  Looking up CURIE UniProtKB:P20849 in KgNodeIndex
2020-07-07 12:44:49.047750 INFO:  Processing action 'add_qedge' with parameters {'source_id': 'n00', 'target_id': 'n01', 'id': 'e00'}
2020-07-07 12:44:49.047790 INFO:  Adding a QueryEdge to Message with parameters {'id': 'e00', 'source_id': 'n00', 'target_id': 'n01', 'type': None}
2020-07-07 12:44:49.047824 INFO:  Processing action 'expand' with parameters {'edge_id': 'e00', 'kp': 'BTE'}
2020-07-07 12:44:49.047849 DEBUG:  Applying Expand to Message with parameters {'edge_id': 'e00', 'node_id': None, 'kp': 'BTE', 'enforce_directionality': False, 'use_synonyms': True, 'synonym_handling': 'map_back', 'continue_if_no_results': False}
2020-07-07 12:44:49.048018 DEBUG:  Query graph for this Expand() call is: {'nodes': [{'id': 'n00', 'curie': 'DOID:11830', 'type': 'disease', 'is_set': None}, {'id': 'n01', 'curie': ['UniProtKB:P39060', 'UniProtKB:O43829', 'UniProtKB:P20849'], 'type': 'gene', 'is_set': True}], 'edges': [{'id': 'e00', 'type': None, 'relation': None, 'source_id': 'n00', 'target_id': 'n01', 'negated': None}]}
2020-07-07 12:44:49.048043 INFO:  Expanding edge e00 using BTE
2020-07-07 12:44:49.048102 DEBUG:  Looking for query nodes to use curie synonyms for
2020-07-07 12:44:49.048110 DEBUG:  Getting curie synonyms for qnode n00 using the NodeSynonymizer
2020-07-07 12:44:49.065654 DEBUG:  Getting curie synonyms for qnode n01 using the NodeSynonymizer
2020-07-07 12:44:49.088067 WARNING:  BTE cannot do bidirectional queries; the query for this edge will be directed, going: n00-->n01
2020-07-07 12:44:49.231905 DEBUG:  Sending query to BTE: UMLS:C0027092-->Gene
2020-07-07 12:44:51.530273 DEBUG:  Got results back from BTE for this query (502 edges)
2020-07-07 12:44:51.742732 DEBUG:  Sending query to BTE: DOID:11830-->Gene
2020-07-07 12:44:52.032899 DEBUG:  Sending query to BTE: HP:0000545-->Gene
2020-07-07 12:44:52.327512 DEBUG:  Sending query to BTE: MESH:D009216-->Gene
2020-07-07 12:44:54.398195 DEBUG:  Got results back from BTE for this query (487 edges)
2020-07-07 12:44:54.568399 DEBUG:  Sending query to BTE: MONDO:0001384-->Gene
2020-07-07 12:44:55.585964 DEBUG:  Sending query to BTE: OMIM:MTHU036427-->Gene
2020-07-07 12:44:55.594672 INFO:  Query for edge e00 returned results (e00: 10, n00: 2, n01: 3)
2020-07-07 12:44:55.594707 DEBUG:  Deduplicating nodes
2020-07-07 12:44:55.594730 DEBUG:  Getting preferred curies for n00 nodes returned in this step
2020-07-07 12:44:55.596354 DEBUG:  Sending NodeSynonymizer a list of 2 curies
2020-07-07 12:44:55.628338 DEBUG:  Got results back from NodeSynonymizer
2020-07-07 12:44:55.628688 DEBUG:  Getting preferred curies for n01 nodes returned in this step
2020-07-07 12:44:55.628959 DEBUG:  Sending NodeSynonymizer a list of 3 curies
2020-07-07 12:44:55.663544 DEBUG:  Got results back from NodeSynonymizer
2020-07-07 12:44:55.664196 DEBUG:  After deduplication, answer KG counts are: e00: 10, n00: 1, n01: 3
2020-07-07 12:44:55.664254 DEBUG:  Removing any self-edges from the answer KG
2020-07-07 12:44:55.664362 DEBUG:  Merging answer into Message.KnowledgeGraph
2020-07-07 12:44:55.664408 DEBUG:  Pruning any paths that are now dead ends
2020-07-07 12:44:55.664621 INFO:  After Expand, Message.KnowledgeGraph has 4 nodes and 10 edges
2020-07-07 12:44:55.664664 INFO:  Processing action 'overlay' with parameters {'action': 'fisher_exact_test', 'source_qnode_id': 'n00', 'target_qnode_id': 'n01', 'virtual_relation_label': 'FET1'}
2020-07-07 12:44:55.664813 INFO:  Performing Fisher's Exact Test to add p-value to edge attribute of virtual edge
2020-07-07 12:44:55.664973 DEBUG:  1 source node with qnode id n00 and node type disease was found in message KG and used to calculate Fisher's Exact Test
2020-07-07 12:44:55.664986 DEBUG:  3 target nodes with qnode id n01 and node type gene was found in message KG and used to calculate Fisher's Exact Test
2020-07-07 12:44:55.664996 DEBUG:  BTE was used to calculate total adjacent nodes in Fisher's Exact Test
2020-07-07 12:44:55.675453 ERROR:  Fail to query adjacent nodes from BTE for ['UniProtKB:P39060', 'UniProtKB:O43829', 'UniProtKB:P20849']
2020-07-07 12:44:55.675523 DEBUG:  Applying Overlay to Message with parameters {'action': 'fisher_exact_test', 'source_qnode_id': 'n00', 'target_qnode_id': 'n01', 'virtual_relation_label': 'FET1'}
2020-07-07 12:44:55.676053 DEBUG:  Query graph is {'edges': [{'id': 'e00', 'negated': None, 'relation': None, 'source_id': 'n00', 'target_id': 'n01', 'type': None}], 'nodes': [{'curie': 'DOID:11830', 'id': 'n00', 'is_set': None, 'type': 'disease'}, {'curie': ['UniProtKB:P39060', 'UniProtKB:O43829', 'UniProtKB:P20849'], 'id': 'n01', 'is_set': True, 'type': 'gene'}]}
2020-07-07 12:44:55.676076 DEBUG:  Number of nodes in KG is 4
2020-07-07 12:44:55.676123 DEBUG:  Number of nodes in KG by type is Counter({'gene': 3, 'disease': 1})
2020-07-07 12:44:55.676132 DEBUG:  Number of edges in KG is 10
2020-07-07 12:44:55.676161 DEBUG:  Number of edges in KG by type is Counter({'related_to': 10})
2020-07-07 12:44:55.676173 DEBUG:  Number of edges in KG with attributes is 0
2020-07-07 12:44:55.676189 DEBUG:  Number of edges in KG by attribute Counter()

note: to run code in the node-normalization branch, you need to build or download the node_synonymizer.sqlite file - some directions on how to do that are here.

dkoslicki commented 4 years ago

@chunyuma I'll leave it up to you to address this issue

chunyuma commented 4 years ago

The problem "Fail to query adjacent nodes from BTE for ['UniProtKB:P39060', 'UniProtKB:O43829', 'UniProtKB:P20849']" has been solved. However, FET will still throw an error for a BTE query: "Only KG1 or KG2 is allowable to calculate the Fisher's exact test temporally". For now (temporarily) FET is restricted to KG1 and KG2, because we don't yet have a good way to get the total number of nodes of a given type in BTE.
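
To illustrate why that total count matters, here is a minimal sketch of a generic one-sided Fisher's exact test written with only the Python standard library. It is not the ARAX implementation; the function name, parameter names, and the numbers in the usage lines are hypothetical. The point is that the p-value cannot be computed without universe_size, the total number of nodes of the target type in the knowledge provider, which is the quantity that is hard to obtain from BTE.

```python
from math import comb


def fisher_exact_greater(overlap, query_size, adjacent_size, universe_size):
    """One-sided (enrichment) Fisher's exact test p-value.

    All names here are illustrative assumptions, not ARAX identifiers.

    overlap:       query nodes that are adjacent to the source node
    query_size:    number of nodes in the query set (e.g. a gene set)
    adjacent_size: all nodes of the target type adjacent to the source node
    universe_size: total number of nodes of the target type in the KP
    """
    if not (0 <= overlap <= min(query_size, adjacent_size)
            and max(query_size, adjacent_size) <= universe_size):
        raise ValueError("inconsistent counts")

    # Number of ways to draw query_size nodes from the whole universe.
    total = comb(universe_size, query_size)

    # Hypergeometric upper tail: probability of seeing an overlap at least
    # as large as the observed one purely by chance.
    max_overlap = min(query_size, adjacent_size)
    tail = sum(
        comb(adjacent_size, k) * comb(universe_size - adjacent_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: the same overlap is far more significant when the
# universe is large, so an unknown universe size leaves the p-value undefined.
p_small_universe = fisher_exact_greater(3, 3, 500, 1_000)
p_large_universe = fisher_exact_greater(3, 3, 500, 20_000)
print(p_small_universe, p_large_universe)
```

With a fixed overlap, query set, and adjacency count, the result depends entirely on the universe size, which is why a knowledge provider that cannot report per-type node totals cannot support this calculation directly.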

amykglen commented 4 years ago

ah, right. that makes sense. I suppose this issue can be closed then!

(note that David already merged node-normalization to master last night, so you will probably want to merge your changes into master.)

chunyuma commented 4 years ago

I have merged it into the master branch, so this issue can be closed.