RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

Using an Aragorn message crashes reranking #1782

Open edeutsch opened 2 years ago

edeutsch commented 2 years ago

I tried to run an Aragorn message through reranking and it didn't go so well. I used 99adc6b5-0803-4c52-972e-fe587744a7aa

and the result was:

2022-02-02T22:47:49.169545 DEBUG: [] in query_return_message
2022-02-02T22:47:49.169663 INFO: [] ARAX Query launching on incoming Query
2022-02-02T22:47:49.169739 INFO: [] Creating an empty template TRAPI Response
2022-02-02T22:47:49.210358 INFO: [] Examine input Query for needed information for dispatch
2022-02-02T22:47:49.210436 INFO: [] Validating the input query graph
2022-02-02T22:47:49.210487 DEBUG: [] Deserializing message
2022-02-02T22:47:49.233541 INFO: [] Converting workflow elements to ARAXi
2022-02-02T22:47:49.233795 INFO: [] Found input processing plan. Sending to the ProcessingPlanExecutor
2022-02-02T22:47:49.233832 DEBUG: [] Entering execute_processing_plan
2022-02-02T22:47:49.270345 DEBUG: [] A single Message is ready and in hand
2022-02-02T22:47:49.270417 DEBUG: [] Found actions
2022-02-02T22:47:49.270468 INFO: [] Parsing input actions list
2022-02-02T22:47:49.270495 DEBUG: [] Parsing action: overlay(action=compute_ngd,default_value=inf)
2022-02-02T22:47:49.271539 DEBUG: [] Parsing action: overlay(action=overlay_clinical_info,COHD_method=paired_concept_frequency)
2022-02-02T22:47:49.271605 DEBUG: [] Parsing action: overlay(action=predict_drug_treats_disease)
2022-02-02T22:47:49.271651 DEBUG: [] Parsing action: overlay(action=fisher_exact_test,virtual_relation_label=connect_knodes_fisher,subject_qnode_key=n00,object_qnode_key=n01)
2022-02-02T22:47:49.271710 DEBUG: [] Parsing action: overlay(action=fisher_exact_test,virtual_relation_label=connect_knodes_fisher,subject_qnode_key=n01,object_qnode_key=n00)
2022-02-02T22:47:49.271769 DEBUG: [] Parsing action: scoreless_resultify(ignore_edge_direction=true)
2022-02-02T22:47:49.271812 DEBUG: [] Parsing action: rank_results()
2022-02-02T22:47:49.905984 INFO: [] Recomputing QG keys (annotating nodes/edges in the KGs with their QG keys)
2022-02-02T22:47:49.906387 INFO: [] Processing action 'overlay' with parameters {'action': 'compute_ngd', 'default_value': 'inf'}
2022-02-02T22:47:49.906433 DEBUG: [] Applying Overlay to Message with parameters {'action': 'compute_ngd', 'default_value': 'inf'}
2022-02-02T22:47:49.911944 DEBUG: [] Computing NGD
2022-02-02T22:47:49.911981 INFO: [] Computing the normalized Google distance: weighting edges based on subject/object node co-occurrence frequency in PubMed abstracts
2022-02-02T22:47:49.912016 DEBUG: [] Canonicalizing curies of relevant nodes using NodeSynonymizer
2022-02-02T22:47:49.914048 DEBUG: [] Extracting PMID lists from sqlite database for relevant nodes
2022-02-02T22:47:50.208422 DEBUG: [] Looping through edges and calculating NGD values
2022-02-02T22:47:50.296351 DEBUG: [] More than 30 publications found for some edges limiting to 30...
2022-02-02T22:47:52.657707 INFO: [] NGD values successfully added to edges
2022-02-02T22:47:52.658165 DEBUG: [] Decorating edges with EPC info from KG2c
2022-02-02T22:47:52.658263 DEBUG: [] Could not identify any NGD edges to decorate
2022-02-02T22:47:52.658950 DEBUG: [] Looking up EPC edge info in KG2c sqlite
2022-02-02T22:47:52.659256 DEBUG: [] Got 0 rows back from KG2c sqlite
2022-02-02T22:47:52.659291 DEBUG: [] Adding attributes to edges in the KG
2022-02-02T22:47:52.682096 DEBUG: [] Query graph is {'edges': {'e00': {'constraints': [],
                   'exclude': None,
                   'object': 'n01',
                   'option_group_id': None,
                   'predicates': ['biolink:subclass_of'],
                   'subject': 'n00'}},
 'nodes': {'n00': {'categories': ['biolink:Disease'],
                   'constraints': [],
                   'ids': ['MONDO:0005737'],
                   'is_set': False,
                   'name': None,
                   'option_group_id': None},
           'n01': {'categories': ['biolink:Disease'],
                   'constraints': [],
                   'ids': None,
                   'is_set': False,
                   'name': None,
                   'option_group_id': None}}}
2022-02-02T22:47:52.682178 DEBUG: [] Number of nodes in KG is 21
2022-02-02T22:47:52.682289 DEBUG: [] Number of nodes in KG by type is Counter({'biolink:Disease': 21})
2022-02-02T22:47:52.682318 DEBUG: [] Number of edges in KG is 89
2022-02-02T22:47:52.682426 DEBUG: [] Number of edges in KG by type is Counter({'biolink:subclass_of': 82, 'biolink:occurs_together_in_literature_with': 7})
2022-02-02T22:47:52.682482 DEBUG: [] Number of edges in KG with attributes is 89
2022-02-02T22:47:52.682954 DEBUG: [] Number of edges in KG by attribute Counter({None: 183, 'weight': 89, 'normalized_google_distance': 89, 'ngd_publications': 89, 'biolink:original_knowledge_source': 57, 'relation': 57, 'biolink:aggregator_knowledge_source': 57, 'num_publications': 7, 'source': 1})
2022-02-02T22:47:52.682992 INFO: [] Processing action 'overlay' with parameters {'action': 'overlay_clinical_info', 'COHD_method': 'paired_concept_frequency'}
2022-02-02T22:47:52.683040 DEBUG: [] Applying Overlay to Message with parameters {'action': 'overlay_clinical_info', 'COHD_method': 'paired_concept_frequency'}
2022-02-02T22:47:53.305339 INFO: [] Converting CURIE identifiers to human readable names
2022-02-02T22:47:53.305418 DEBUG: [] Computing paired concept frequencies.
2022-02-02T22:47:53.305442 INFO: [] Overlaying paired concept frequencies utilizing Columbia Open Health Data. This calls an external knowledge provider and may take a while
2022-02-02T22:47:53.650230 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and inflammatory disease
2022-02-02T22:47:53.650385 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and disease or disorder
2022-02-02T22:47:53.650488 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and disease or disorder
2022-02-02T22:47:53.650584 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral infectious disease
2022-02-02T22:47:53.650678 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and arbovirus infection
2022-02-02T22:47:53.650771 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Hyperthermia
2022-02-02T22:47:53.650864 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral disease or post-viral disorder
2022-02-02T22:47:53.650956 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Disorder characterized by fever
2022-02-02T22:47:53.651048 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and arbovirus infection
2022-02-02T22:47:53.651138 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Filoviridae infectious disease
2022-02-02T22:47:53.651229 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Test Result
2022-02-02T22:47:53.651319 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Inflammation
2022-02-02T22:47:53.651409 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Inflammatory Response
2022-02-02T22:47:53.651501 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Filoviridae infectious disease
2022-02-02T22:47:53.651590 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral hemorrhagic fever
2022-02-02T22:47:53.651680 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral hemorrhagic fever
2022-02-02T22:47:53.651771 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Non-Neoplastic Disorder
2022-02-02T22:47:53.651861 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Ebola hemorrhagic fever
2022-02-02T22:47:53.651951 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral infectious disease
2022-02-02T22:47:53.652040 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral infectious disease
2022-02-02T22:47:53.652131 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Non-Neoplastic Disorder by Special Category
2022-02-02T22:47:53.652222 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral disease or post-viral disorder
2022-02-02T22:47:53.652313 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Filoviridae infectious disease
2022-02-02T22:47:53.652403 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Filoviridae infectious disease
2022-02-02T22:47:53.652494 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Mononegavirales infectious disease
2022-02-02T22:47:53.652584 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral disease or post-viral disorder
2022-02-02T22:47:53.652687 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Non-Neoplastic Disorder by Special Category
2022-02-02T22:47:53.652780 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Ebola hemorrhagic fever
2022-02-02T22:47:53.652870 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Filoviridae infectious disease
2022-02-02T22:47:53.652960 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and zoonosis
2022-02-02T22:47:53.653050 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and infectious disease
2022-02-02T22:47:53.653140 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Mononegavirales infectious disease
2022-02-02T22:47:53.653230 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and disease or disorder
2022-02-02T22:47:53.653319 DEBUG: [] Querying Columbia Open Health data for info about viral infectious disease and Ebola hemorrhagic fever
2022-02-02T22:47:53.653409 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and infectious disease or post-infectious disorder
2022-02-02T22:47:53.653499 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Non-Neoplastic Disorder
2022-02-02T22:47:53.653589 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Ebola hemorrhagic fever
2022-02-02T22:47:53.653708 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Filoviridae infectious disease
2022-02-02T22:47:53.653799 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral disease or post-viral disorder
2022-02-02T22:47:53.653889 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral infectious disease
2022-02-02T22:47:53.653978 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and infectious disease
2022-02-02T22:47:53.654069 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Mononegavirales infectious disease
2022-02-02T22:47:53.654159 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and inflammatory disease
2022-02-02T22:47:53.654249 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Non-Neoplastic Disorder by Special Category
2022-02-02T22:47:53.654339 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and infectious disease
2022-02-02T22:47:53.654428 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and hemorrhagic fever
2022-02-02T22:47:53.654517 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and infectious disease
2022-02-02T22:47:53.654607 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and infectious disease
2022-02-02T22:47:53.654696 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Non-Neoplastic Disorder by Special Category
2022-02-02T22:47:53.654786 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral hemorrhagic fever
2022-02-02T22:47:53.654876 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Filoviridae infectious disease
2022-02-02T22:47:53.654964 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral disease or post-viral disorder
2022-02-02T22:47:53.655054 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral infectious disease
2022-02-02T22:47:53.655143 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and inflammatory disease
2022-02-02T22:47:53.655241 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral hemorrhagic fever
2022-02-02T22:47:53.655332 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Mononegavirales infectious disease
2022-02-02T22:47:53.655421 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Filoviridae infectious disease
2022-02-02T22:47:53.655511 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral infectious disease
2022-02-02T22:47:53.655602 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and infectious disease
2022-02-02T22:47:53.655691 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and primary viral infectious disease
2022-02-02T22:47:53.655781 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and disease or disorder
2022-02-02T22:47:53.655872 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Mononegavirales infectious disease
2022-02-02T22:47:53.655961 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and inflammatory disease
2022-02-02T22:47:53.656051 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral infectious disease
2022-02-02T22:47:53.656141 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral hemorrhagic fever
2022-02-02T22:47:53.656230 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral infectious disease
2022-02-02T22:47:53.656320 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral disease or post-viral disorder
2022-02-02T22:47:53.656410 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral hemorrhagic fever
2022-02-02T22:47:53.656499 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral infectious disease
2022-02-02T22:47:53.656588 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and zoonosis
2022-02-02T22:47:53.656678 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and infectious disease
2022-02-02T22:47:53.656767 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral hemorrhagic fever
2022-02-02T22:47:53.656856 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Non-Neoplastic Disorder
2022-02-02T22:47:53.656946 DEBUG: [] Querying Columbia Open Health data for info about infectious disease and Ebola hemorrhagic fever
2022-02-02T22:47:53.657035 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Ebola hemorrhagic fever
2022-02-02T22:47:53.657124 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and disease or disorder
2022-02-02T22:47:53.657214 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and infectious disease
2022-02-02T22:47:53.657303 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Non-Neoplastic Disorder
2022-02-02T22:47:53.657393 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Ebola hemorrhagic fever
2022-02-02T22:47:53.657482 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Non-Neoplastic Disorder by Special Category
2022-02-02T22:47:53.657572 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral hemorrhagic fever
2022-02-02T22:47:53.657683 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Non-Neoplastic Disorder
2022-02-02T22:47:53.657805 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Ebola hemorrhagic fever
2022-02-02T22:47:53.657903 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Mononegavirales infectious disease
2022-02-02T22:47:53.657994 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and inflammatory disease
2022-02-02T22:47:53.658084 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and disease or disorder
2022-02-02T22:47:53.658173 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and Filoviridae infectious disease
2022-02-02T22:47:53.658262 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and inflammatory disease
2022-02-02T22:47:53.658353 DEBUG: [] Querying Columbia Open Health data for info about Ebola hemorrhagic fever and viral hemorrhagic fever
2022-02-02T22:47:53.659421 DEBUG: [] Query graph is {'edges': {'e00': {'constraints': [],
                   'exclude': None,
                   'object': 'n01',
                   'option_group_id': None,
                   'predicates': ['biolink:subclass_of'],
                   'subject': 'n00'}},
 'nodes': {'n00': {'categories': ['biolink:Disease'],
                   'constraints': [],
                   'ids': ['MONDO:0005737'],
                   'is_set': False,
                   'name': None,
                   'option_group_id': None},
           'n01': {'categories': ['biolink:Disease'],
                   'constraints': [],
                   'ids': None,
                   'is_set': False,
                   'name': None,
                   'option_group_id': None}}}
2022-02-02T22:47:53.659463 DEBUG: [] Number of nodes in KG is 21
2022-02-02T22:47:53.659530 DEBUG: [] Number of nodes in KG by type is Counter({'biolink:Disease': 21})
2022-02-02T22:47:53.659556 DEBUG: [] Number of edges in KG is 89
2022-02-02T22:47:53.659634 DEBUG: [] Number of edges in KG by type is Counter({'biolink:subclass_of': 82, 'biolink:occurs_together_in_literature_with': 7})
2022-02-02T22:47:53.659682 DEBUG: [] Number of edges in KG with attributes is 89
2022-02-02T22:47:53.660142 DEBUG: [] Number of edges in KG by attribute Counter({None: 183, 'weight': 89, 'normalized_google_distance': 89, 'ngd_publications': 89, 'paired_concept_frequency': 89, 'biolink:original_knowledge_source': 57, 'relation': 57, 'biolink:aggregator_knowledge_source': 57, 'num_publications': 7, 'source': 1})
2022-02-02T22:47:53.660175 INFO: [] Processing action 'overlay' with parameters {'action': 'predict_drug_treats_disease'}
2022-02-02T22:47:53.660208 DEBUG: [] Applying Overlay to Message with parameters {'action': 'predict_drug_treats_disease'}
2022-02-02T22:47:54.453405 DEBUG: [] The 'predict_drug_treats_disease' action uses DTD database
2022-02-02T22:47:54.454079 DEBUG: [] Computing drug disease treatment probability based on a machine learning model
2022-02-02T22:47:54.454117 INFO: [] Computing drug disease treatment probability based on a machine learning model: See [this publication](https://doi.org/10.1101/765305) for more details about how this is accomplished.
2022-02-02T22:47:54.454715 INFO: [] Drug disease treatment probability successfully added to edges
2022-02-02T22:47:54.455676 DEBUG: [] Query graph is {'edges': {'e00': {'constraints': [],
                   'exclude': None,
                   'object': 'n01',
                   'option_group_id': None,
                   'predicates': ['biolink:subclass_of'],
                   'subject': 'n00'}},
 'nodes': {'n00': {'categories': ['biolink:Disease'],
                   'constraints': [],
                   'ids': ['MONDO:0005737'],
                   'is_set': False,
                   'name': None,
                   'option_group_id': None},
           'n01': {'categories': ['biolink:Disease'],
                   'constraints': [],
                   'ids': None,
                   'is_set': False,
                   'name': None,
                   'option_group_id': None}}}
2022-02-02T22:47:54.455721 DEBUG: [] Number of nodes in KG is 21
2022-02-02T22:47:54.455788 DEBUG: [] Number of nodes in KG by type is Counter({'biolink:Disease': 21})
2022-02-02T22:47:54.455831 DEBUG: [] Number of edges in KG is 89
2022-02-02T22:47:54.455914 DEBUG: [] Number of edges in KG by type is Counter({'biolink:subclass_of': 82, 'biolink:occurs_together_in_literature_with': 7})
2022-02-02T22:47:54.455964 DEBUG: [] Number of edges in KG with attributes is 89
2022-02-02T22:47:54.456443 DEBUG: [] Number of edges in KG by attribute Counter({None: 183, 'weight': 89, 'normalized_google_distance': 89, 'ngd_publications': 89, 'paired_concept_frequency': 89, 'biolink:original_knowledge_source': 57, 'relation': 57, 'biolink:aggregator_knowledge_source': 57, 'num_publications': 7, 'source': 1})
2022-02-02T22:47:54.456480 INFO: [] Processing action 'overlay' with parameters {'action': 'fisher_exact_test', 'virtual_relation_label': 'connect_knodes_fisher', 'subject_qnode_key': 'n00', 'object_qnode_key': 'n01'}
2022-02-02T22:47:54.456514 DEBUG: [] Applying Overlay to Message with parameters {'action': 'fisher_exact_test', 'virtual_relation_label': 'connect_knodes_fisher', 'subject_qnode_key': 'n00', 'object_qnode_key': 'n01'}
2022-02-02T22:47:54.465078 INFO: [] Performing Fisher's Exact Test to add p-value to edge attribute of virtual edge
2022-02-02T22:47:54.467640 DEBUG: [] Counter({'aragorn': 82, 'biothings-explorer': 59, 'automat-ctd': 11, 'automat-pharos': 11, 'rtx-kg2': 11, 'automat-uberongraph': 11, 'automat-cord19': 11, 'automat-ontology-hierarchy': 11, 'semmeddb': 10, 'mondo': 2, 'mesh': 2, 'efo': 2, 'ncit': 2, 'mydisease-info': 2, 'automat-hetio': 2, 'spoke': 1, 'disease-ontology': 1})
2022-02-02T22:47:54.467679 WARNING: [] More than one knowledge provider were detected to be used for expanding the edges connected to both subject node with qnode key n00 and object node with qnode key n01
2022-02-02T22:47:54.467702 WARNING: [] The knowledge provider aragorn was used to calculate Fisher's exact test because it has the maximum number of edges connected to both subject node with qnode key n00 and object node with qnode key n01
2022-02-02T22:47:54.467752 WARNING: [] Most of edges between the subject node with qnode key n00 and object node with qnode key n01 are from RTX-KG2 rather than RTX-KG2. But we can't access the total number of nodes with specific node type from RTX-KG2, so RTX-KG2 was still used to calcualte Fisher's exact test.
2022-02-02T22:47:54.467775 DEBUG: [] 1 subject node with qnode key n00 and node type biolink:Disease was found in message KG and used to calculate Fisher's Exact Test
2022-02-02T22:47:54.467796 DEBUG: [] 21 object nodes with qnode key n01 and node type biolink:Disease was found in message KG and used to calculate Fisher's Exact Test
2022-02-02T22:47:54.467815 DEBUG: [] RTX-KG2 was used to calculate total object nodes in Fisher's Exact Test
2022-02-02T22:47:54.472044 WARNING: [] One object node which is MONDO:0000001 can't find its neighbors. This node will be ignored for FET calculation.
2022-02-02T22:47:54.472353 DEBUG: [] Total 92040 unique concepts with node category biolink:Disease was found in KG2c based on 'nodesynonymizer.get_total_entity_count' and this number will be used for Fisher's Exact Test
2022-02-02T22:47:54.472388 DEBUG: [] Computing Fisher's Exact Test P-value
2022-02-02T22:47:54.491553 DEBUG: [] Adding virtual edge with FET result to message KG
2022-02-02T22:47:54.492031 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.492215 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.492394 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.492553 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.492710 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.492893 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.493078 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.493235 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.493389 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.493731 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.493896 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.494061 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.494217 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.494371 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.494525 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.494678 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.494832 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.494985 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.495139 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.495292 ERROR: [UncaughtError] Error encountered when modifying results with overlay edge (subject_knode_key)-kedge_key-(object_knode_key):
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/ARAX/ARAXQuery/Overlay/overlay_utilities.py", line 108, in update_results_with_overlay_edge
    subject_nodes = [x.id for x in result.node_bindings[message.query_graph.edges[qedge_key].subject]]
KeyError: 's2'

2022-02-02T22:47:54.495333 DEBUG: [] 20 new virtual edges were added to message KG
2022-02-02T22:47:54.495355 DEBUG: [] Adding virtual edge to message QG
2022-02-02T22:47:54.495399 DEBUG: [] One virtual edge was added to message QG
2022-02-02T22:47:54.496344 DEBUG: [] Query graph is {'edges': {'connect_knodes_fisher': {'constraints': [],
                                     'exclude': None,
                                     'object': 'n01',
                                     'option_group_id': None,
                                     'predicates': ['biolink:has_fisher_exact_test_p_value_with'],
                                     'subject': 'n00'},
           'e00': {'constraints': [],
                   'exclude': None,
                   'object': 'n01',
                   'option_group_id': None,
                   'predicates': ['biolink:subclass_of'],
                   'subject': 'n00'}},
 'nodes': {'n00': {'categories': ['biolink:Disease'],
                   'constraints': [],
                   'ids': ['MONDO:0005737'],
                   'is_set': False,
                   'name': None,
                   'option_group_id': None},
           'n01': {'categories': ['biolink:Disease'],
                   'constraints': [],
                   'ids': None,
                   'is_set': False,
                   'name': None,
                   'option_group_id': None}}}
2022-02-02T22:47:54.496383 DEBUG: [] Number of nodes in KG is 21
2022-02-02T22:47:54.496455 DEBUG: [] Number of nodes in KG by type is Counter({'biolink:Disease': 21})
2022-02-02T22:47:54.496481 DEBUG: [] Number of edges in KG is 109
2022-02-02T22:47:54.496587 DEBUG: [] Number of edges in KG by type is Counter({'biolink:subclass_of': 82, 'biolink:has_fisher_exact_test_p_value_with': 20, 'biolink:occurs_together_in_literature_with': 7})
2022-02-02T22:47:54.496647 DEBUG: [] Number of edges in KG with attributes is 109
2022-02-02T22:47:54.497173 DEBUG: [] Number of edges in KG by attribute Counter({None: 203, 'weight': 89, 'normalized_google_distance': 89, 'ngd_publications': 89, 'paired_concept_frequency': 89, 'biolink:original_knowledge_source': 57, 'relation': 57, 'biolink:aggregator_knowledge_source': 57, 'fisher_exact_test_p-value': 20, 'virtual_relation_label': 20, 'defined_datetime': 20, 'provided_by': 20, 'num_publications': 7, 'source': 1})
INFO:werkzeug:127.0.0.1 - - [02/Feb/2022 22:47:54] "GET /beta/api/arax/v1.2/meta_knowledge_graph?format=simple HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [02/Feb/2022 22:47:54] "POST /beta/api/arax/v1.2/query HTTP/1.1" 400 -

Ideas?

edeutsch commented 2 years ago

Thanks to Finn and Amy, this now seems resolved in master:

Retrieved a message with 34 results:
  - 2.679   ?
  - 2.608   ?
  - 2.605   ?
  - 2.129   ?
  - 1.657   ?
  - 1.524   ?
  - 1.340   ?
  - 1.340   ?
  - 1.340   ?
  - 1.340   ?
  - 1.340   ?
  - 1.268   ?
  - 1.268   ?
  - 1.265   ?
  - 1.265   ?
  - 1.265   ?
  - 1.020   ?
  - 0.790   ?
  - 0.790   ?
  - 0.536   ?
  - 0.332   ?
  - 0.317   ?
  - 0.317   ?
  - 0.268   ?
  - 0.268   ?
  - 0.268   ?
  - 0.268   ?
  - 0.268   ?
  - 0.268   ?
  - 0.268   ?
  - 0.268   ?
  - 0.268   ?
  - 0.268   ?
  - 0.268   ?

After score removal:
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?
  -None ?

Create a new request with the previous message and a workflow to rerank (overlay_connect_knodes,complete_results,score)
Results (19):
  - 1.000   Ebola hemorrhagic fever
  - 0.947   viral hemorrhagic fever
  - 0.895   Filoviridae infectious disease
  - 0.807   hemorrhagic fever
  - 0.684   infectious disease or post-infectious disorder
  - 0.649   viral infectious disease
  - 0.632   viral disease or post-viral disorder
  - 0.632   zoonosis
  - 0.614   arbovirus infection
  - 0.579   Non-Neoplastic Disorder by Special Category
  - 0.526   Mononegavirales infectious disease
  - 0.474   Non-Neoplastic Disorder
  - 0.421   primary viral infectious disease
  - 0.316   Disorder characterized by fever
  - 0.298   Test Result
  - 0.211   Inflammatory Response
  - 0.158   Hyperthermia
  - 0.088   infectious disease
  - 0.070   Inflammation
Data: https://arax.ncats.io/api/arax/v1.2/response/36278
GUI: https://arax.ncats.io/beta/?r=36278

Aragorn result: https://arax.ncats.io/beta/?r=99adc6b5-0803-4c52-972e-fe587744a7aa
ARAX reranked result: https://arax.ncats.io/beta/?r=36278

finnagin commented 2 years ago

We will want to revisit this after the relay to see if we want to undo the hacks we needed to implement to get this to work.

edeutsch commented 2 years ago

Solved for now (in /beta, not production), BUT we should probably revisit this when there is a Translator-wide decision on whether edge_bindings to non-existent qedges are legal.
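
Until that decision is made, one way to spot the situation up front is to compare the binding keys in each result against the keys in the query graph. The following is a minimal sketch, not the fix that went into master. It assumes a TRAPI-style message already parsed into plain dicts; `find_unknown_binding_keys` and the sample message are hypothetical and exist only for illustration. The key `s2` is borrowed from the log above, and whether it was a node key or an edge key in the real message is not established here.

```python
def find_unknown_binding_keys(message):
    """Return (qnode_keys, qedge_keys) that results bind to but the query graph lacks."""
    query_graph = message.get("query_graph") or {}
    qnode_keys = set(query_graph.get("nodes") or {})
    qedge_keys = set(query_graph.get("edges") or {})

    unknown_qnodes, unknown_qedges = set(), set()
    for result in message.get("results") or []:
        # Any binding key not declared in the query graph is suspicious.
        unknown_qnodes |= set(result.get("node_bindings") or {}) - qnode_keys
        unknown_qedges |= set(result.get("edge_bindings") or {}) - qedge_keys
    return sorted(unknown_qnodes), sorted(unknown_qedges)


# Hypothetical sample: the result binds an edge key "s2" that the query graph never declares.
sample_message = {
    "query_graph": {
        "nodes": {"n00": {}, "n01": {}},
        "edges": {"e00": {"subject": "n00", "object": "n01"}},
    },
    "results": [
        {
            "node_bindings": {"n00": [{"id": "MONDO:0005737"}], "n01": [{"id": "MONDO:0005737"}]},
            "edge_bindings": {"e00": [{"id": "edge-1"}], "s2": [{"id": "edge-2"}]},
        }
    ],
}

unknown_nodes, unknown_edges = find_unknown_binding_keys(sample_message)
print("Unknown qnode keys:", unknown_nodes)
print("Unknown qedge keys:", unknown_edges)
```

A check like this could run before the overlay step and log one readable warning per message instead of one traceback per result.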

isbluis commented 2 years ago

It is interesting to note that their first/top result is the exact disease that was queried against, and so all edges are self edges. I can see this kind of thing potentially throwing off both automated systems and human reviewers. It feels like it should not be a thing...?

edeutsch commented 2 years ago

I would have thought so too, but I recall hearing Chris Mungall, grand master of ontologies, state that all entities are always a subclass of themselves. Stated as an obvious fact that everyone knows. I hope I am not misremembering.

edeutsch commented 2 years ago

But I should ask. I'll ask on the testing channel.

isbluis commented 2 years ago

all entities are always a subclass of themselves

This is how infinities are born... ;-)

edeutsch commented 2 years ago

This example brings to light one possible complication that some might object to:

The (vague) ask was to "rerank results".

We are actually not so much reranking the results as completely discarding the results and recomputing the results and ranking those.

The original set contained 34 results. After our "reranking", there are only 19. This is because there appear to be what might be considered duplicate results in the original 34. If we were truly "reranking the results", we would output 34 results.

I don't know if this will be viewed favorably or unfavorably, but there it is. It will cause discussion.

The question will come up and I don't know the answer: do we have the ability to not change (discard and recompute) the results but rather just rerank them? (so that we would emit 34 results in this case)

For reference, the workflow we used for this is:

    "workflow": [
        {
            "id": "overlay_connect_knodes"
        },
        {
            "id": "complete_results"
        },
        {
            "id": "score"
        }
    ]

(as discussed)
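
To make the distinction above concrete, here is a minimal sketch of what a "pure" rerank would look like: every incoming result is kept, each gets a new score, and only the order changes. This is not how ARAX does it. `rerank_only`, the `toy_scores` lookup, and the sample results are all hypothetical, and real results carry bindings rather than a bare `id`.

```python
def rerank_only(results, score_fn):
    """Re-score and reorder results without adding, merging, or dropping any."""
    # Copy each result so the caller's list is left untouched.
    rescored = [dict(result, score=score_fn(result)) for result in results]
    # sorted() is stable, so results with equal scores keep their incoming order.
    return sorted(rescored, key=lambda result: result["score"], reverse=True)


# Hypothetical input: three results whose scores were removed, two of them "duplicates".
incoming = [
    {"id": "r1", "score": None},
    {"id": "r2", "score": None},
    {"id": "r2-duplicate", "score": None},
]
toy_scores = {"r1": 0.4, "r2": 0.9, "r2-duplicate": 0.9}  # made-up values

reranked = rerank_only(incoming, lambda result: toy_scores[result["id"]])
print([(result["id"], result["score"]) for result in reranked])
```

With this approach the output always has as many results as the input, so the 34 original results would stay 34, duplicates included.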

edeutsch commented 2 years ago

As a slightly separate issue, I tried running with just:

    "workflow": [
        {
            "id": "score"
        }
    ]

and got:

2022-02-04T06:09:00.152997 ERROR: An uncaught error occurred: 'd347fc18-3bfd-4f5c-8fc0-d6ce45a2d2f6': ['Traceback (most recent call last):\n', ' File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_query.py", line 753, in execute_processing_plan\n ranker.aggregate_scores_dmk(response)\n', ' File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_ranker.py", line 621, in aggregate_scores_dmk\n _score_networkx_graphs_by_frobenius_norm])))\n', ' File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_ranker.py", line 618, in <lambda>\n scorer_func),\n', ' File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_ranker.py", line 160, in _score_result_graphs_by_networkx_graph_scorer\n results)\n', ' File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_ranker.py", line 67, in _get_weighted_graphs_networkx_from_result_graphs\n result))\n', ' File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_ranker.py", line 49, in _get_weighted_graph_networkx_from_result_graph\n kg_edge = kg_edge_id_to_edge[edge_binding.id]\n', "KeyError: 'd347fc18-3bfd-4f5c-8fc0-d6ce45a2d2f6'\n"]

If someone has time, it would be nice for this at least not to produce an unsightly stack trace, maybe just some nice English error messages.

And maybe someone has ideas on how to make this better. In principle it would be possible to see and keep and use their weights rather than just throwing them out and starting over.
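
As a rough illustration of the "nice English error messages" idea, the sketch below checks that every edge binding points at an edge that exists in the knowledge graph before any scoring happens. It is an assumption-laden example, not the ARAX ranker code: it works on a TRAPI-style message parsed into plain dicts, and `find_missing_kg_edges` and the sample data are hypothetical.

```python
def find_missing_kg_edges(message):
    """Return readable messages for edge bindings whose id is absent from the KG."""
    kg_edges = (message.get("knowledge_graph") or {}).get("edges") or {}
    problems = []
    for index, result in enumerate(message.get("results") or []):
        for qedge_key, bindings in (result.get("edge_bindings") or {}).items():
            for binding in bindings:
                edge_id = binding.get("id")
                if edge_id not in kg_edges:
                    problems.append(
                        f"Result {index}: edge binding '{edge_id}' for query edge "
                        f"'{qedge_key}' does not match any edge in the knowledge graph."
                    )
    return problems


# Hypothetical sample: one binding refers to an edge id the knowledge graph does not contain.
sample_message = {
    "knowledge_graph": {"edges": {"edge-1": {}}},
    "results": [
        {"edge_bindings": {"e00": [{"id": "edge-1"}, {"id": "missing-edge"}]}},
    ],
}

problems = find_missing_kg_edges(sample_message)
for problem in problems:
    print("ERROR:", problem)
```

If the list is non-empty, the caller could return those messages with an error status rather than letting the `KeyError` propagate.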

finnagin commented 2 years ago

@edeutsch I just tried running this with just score and it worked for me: https://arax.ncats.io/beta/?r=36489

Though since we don't see any scores we recognize in the results, we rank everything as 1.

If we instead run the workflow:

    "workflow": [
        {
            "id": "overlay_connect_knodes"
        },
        {
            "id": "score"
        }
    ]

We can get ranks without nuking the results: https://arax.ncats.io/beta/?r=36488

finnagin commented 2 years ago

Oh, the above was run with the BTE JSON. I just realized that you ran the example that Chris posted. I got the same error when running that.

edeutsch commented 2 years ago

ah interesting idea running complete_results. But as you see, I'm getting stack traces with my test queries.

finnagin commented 2 years ago

@edeutsch So what is weird is that when I run the code locally, I don't get the error; it seems to work fine and re-ranks correctly.

finnagin commented 2 years ago

Oops, looks like the error was in my edit of the rerank test script. I changed endpoint_url but didn't notice it was used again later to submit the query. It's fixed now and looks like it works, at least with the clinical_DCP example: https://arax.ncats.io/beta/?r=36497