Closed amykglen closed 4 years ago
@chunyuma - I'm running into errors with a couple of FET tests when testing integration with KG2.3.4 in the kg2-arax-integration branch:
FAILED test_ARAX_workflows.py::test_FET_example_2 - AssertionError: assert 'ERROR' == 'OK'
FAILED test_ARAX_workflows.py::test_FET_example_4 - AssertionError: assert 'ERROR' == 'OK'
here's the error for test_FET_example_2:
- 2020-09-15 16:38:25.932687 INFO: After Expand, Message.KnowledgeGraph has 1044 nodes and 1758 edges (FET1: 1, FET2: 5, FET3: 132, e00: 2, e01: 5, e02: 133, e03: 1485, n00: 1, n01: 1, n02: 5, n03: 126, n04: 912)
- 2020-09-15 16:38:25.932818 INFO: Processing action 'overlay' with parameters {'action': 'fisher_exact_test', 'source_qnode_id': 'n03', 'target_qnode_id': 'n04', 'virtual_relation_label': 'FET4'}
- 2020-09-15 16:38:25.932837 DEBUG: Applying Overlay to Message with parameters {'action': 'fisher_exact_test', 'source_qnode_id': 'n03', 'target_qnode_id': 'n04', 'virtual_relation_label': 'FET4'}
- 2020-09-15 16:38:25.933574 INFO: Performing Fisher's Exact Test to add p-value to edge attribute of virtual edge
- 2020-09-15 16:38:25.999913 ERROR: Traceback (most recent call last):
File "/Users/amyglen/Projects/RTX/code/ARAX/test/../ARAXQuery/Overlay/fisher_exact_test.py", line 161, in fisher_exact_test
nodes_info[edge.source_id]['edge_index'].append(count)
KeyError: 'UniProtKB:Q96J02'
- 2020-09-15 16:38:25.999929 ERROR: Something went wrong with retrieving edges in message KG
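For context, a KeyError like the one above is what happens when an edge in the message KG names a source node that is absent from the lookup dict built from the nodes. Below is a minimal sketch of that failure mode and one defensive alternative. It assumes a simplified `nodes_info` dict and edge objects with `source_id`/`target_id` fields; `Edge`, `target_id`, and `index_edges` are hypothetical names for illustration and are not the actual code in fisher_exact_test.py.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for an edge in the message KG
Edge = namedtuple("Edge", ["source_id", "target_id"])


def index_edges(node_ids, edges):
    """Record, per node, the positions of the edges that touch it.

    Returns (nodes_info, dangling). Edge endpoints that are not among
    node_ids are collected in dangling as (edge position, node id)
    pairs instead of raising KeyError.
    """
    nodes_info = {node_id: {"edge_index": []} for node_id in node_ids}
    dangling = []
    for count, edge in enumerate(edges):
        for node_id in (edge.source_id, edge.target_id):
            if node_id in nodes_info:
                nodes_info[node_id]["edge_index"].append(count)
            else:
                # This is where a plain nodes_info[node_id] would raise KeyError
                dangling.append((count, node_id))
    return nodes_info, dangling
```

Collecting the dangling endpoints, rather than failing on the first one, makes it easier to see whether a single node or a whole category of nodes is missing from the KG.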
and the one for test_FET_example_4:
- 2020-09-15 16:57:23.685977 DEBUG: ARAX/KG2C was used to calculate total adjacent nodes in Fisher's Exact Test
- 2020-09-15 17:04:38.697293 WARNING: Although ARAX/KG2 was found to have the maximum number of edges connected to both n01 and n02, ARAX/KG1 and cypher query were used to find the total number of nodes with the same type of source node with qnode id n01 as KG2 might have many duplicates
- 2020-09-15 17:04:38.697321 DEBUG: Total 12089 nodes with node type phenotypic_feature was found in ARAX/KG1
- 2020-09-15 17:04:38.697325 DEBUG: Computing Fisher's Exact Test P-value
- 2020-09-15 17:04:39.489769 ERROR: Traceback (most recent call last):
File "/Users/amyglen/Projects/RTX/code/ARAX/test/../ARAXQuery/Overlay/fisher_exact_test.py", line 774, in _calculate_FET_pvalue_parallel
pvalue = stats.fisher_exact(contingency_table)[1]
File "/Users/amyglen/.pyenv/versions/3.7.8/envs/arax/lib/python3.7/site-packages/scipy/stats/stats.py", line 3630, in fisher_exact
raise ValueError("All values in `table` must be nonnegative.")
ValueError: All values in `table` must be nonnegative.
- 2020-09-15 17:04:39.489942 ERROR: Something went wrong for target node MONDO:0000001 to calculate FET p-value
- 2020-09-15 17:04:39.489951 ERROR: Traceback (most recent call last):
File "/Users/amyglen/Projects/RTX/code/ARAX/test/../ARAXQuery/Overlay/fisher_exact_test.py", line 774, in _calculate_FET_pvalue_parallel
pvalue = stats.fisher_exact(contingency_table)[1]
File "/Users/amyglen/.pyenv/versions/3.7.8/envs/arax/lib/python3.7/site-packages/scipy/stats/stats.py", line 3630, in fisher_exact
raise ValueError("All values in `table` must be nonnegative.")
ValueError: All values in `table` must be nonnegative.
- 2020-09-15 17:04:39.489957 ERROR: Something went wrong for target node UMLS:C1519221 to calculate FET p-value
- 2020-09-15 17:04:39.489962 ERROR: Traceback (most recent call last):
File "/Users/amyglen/Projects/RTX/code/ARAX/test/../ARAXQuery/Overlay/fisher_exact_test.py", line 774, in _calculate_FET_pvalue_parallel
pvalue = stats.fisher_exact(contingency_table)[1]
File "/Users/amyglen/.pyenv/versions/3.7.8/envs/arax/lib/python3.7/site-packages/scipy/stats/stats.py", line 3630, in fisher_exact
raise ValueError("All values in `table` must be nonnegative.")
ValueError: All values in `table` must be nonnegative.
- 2020-09-15 17:04:39.489967 ERROR: Something went wrong for target node MONDO:0004992 to calculate FET p-value
- 2020-09-15 17:04:39.489973 ERROR: Traceback (most recent call last):
File "/Users/amyglen/Projects/RTX/code/ARAX/test/../ARAXQuery/Overlay/fisher_exact_test.py", line 774, in _calculate_FET_pvalue_parallel
pvalue = stats.fisher_exact(contingency_table)[1]
File "/Users/amyglen/.pyenv/versions/3.7.8/envs/arax/lib/python3.7/site-packages/scipy/stats/stats.py", line 3630, in fisher_exact
raise ValueError("All values in `table` must be nonnegative.")
ValueError: All values in `table` must be nonnegative.
- 2020-09-15 17:04:39.489978 ERROR: Something went wrong for target node MONDO:0005070 to calculate FET p-value
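As the traceback shows, `stats.fisher_exact` rejects any table with a negative cell. One way a cell can go negative (not confirmed as the cause here) is when the counts come from different graphs, for example adjacency counts from KG2 combined with a population total from KG1, as the WARNING above describes. The sketch below builds a standard 2x2 table and checks it before any p-value is computed. The function name, the parameter names, and the table layout are assumptions for illustration and are not taken from fisher_exact_test.py.

```python
def build_contingency_table(overlap, source_total, target_adjacent, population):
    """Build a 2x2 table for Fisher's exact test, or return None if invalid.

    overlap          source-set nodes that are adjacent to the target node
    source_total     size of the source set
    target_adjacent  all nodes of that type adjacent to the target node
    population       all nodes of that type in the reference KG
    """
    table = [
        [overlap, target_adjacent - overlap],
        [source_total - overlap,
         population - target_adjacent - source_total + overlap],
    ]
    if any(cell < 0 for row in table for cell in row):
        # The counts are mutually inconsistent, so skip rather than raise
        return None
    return table
```

With a population of 12089 (the KG1 count in the log) and a hypothetical 20000 adjacent nodes counted in a larger graph, the last cell is negative and the function returns None.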
do you have any idea what these might be about? to reproduce them, you would first need to:
- use the kg2-arax-integration branch
- scp ubuntu@arax.rtx.ai:/data/orangeboard/databases/KG2.3.4/config.json RTX/code/
- scp ubuntu@arax.rtx.ai:/data/orangeboard/databases/KG2.3.4/node_synonymizer.sqlite RTX/code/ARAX/NodeSynonymizer/
OK, those two errors are fixed!
The COHD database was rebuilt based on KG2.3.4 and named COHDdatabase_v2.0.db. I put it under /data/orangeboard/databases/KG2.3.4 on the arax.rtx.ai server.
Here is a summary for the new COHD database.
Preferred Type | Number of Nodes | Number of Nodes with OMOP ids | Percent with OMOP ids (%) |
---|---|---|---|
chemical_substance | 2198938 | 63215 | 2.87 |
protein | 7002 | 940 | 13.42 |
organism_taxon | 1145 | 650 | 56.77 |
anatomical_entity | 141 | 119 | 84.4 |
phenotypic_feature | 67343 | 32738 | 48.61 |
named_thing | 2593 | 2343 | 90.36 |
disease | 127271 | 107207 | 84.24 |
drug | 110973 | 98867 | 89.09 |
molecular_entity | 10406 | 8719 | 83.79 |
metabolite | 7 | 4 | 57.14 |
biological_entity | 55 | 6 | 10.91 |
ontology_class | 33 | 21 | 63.64 |
genomic_entity | 38346 | 430 | 1.12 |
individual_organism | 385 | 360 | 93.51 |
gross_anatomical_structure | 3697 | 3245 | 87.77 |
cellular_component | 14 | 13 | 92.86 |
procedure | 335 | 326 | 97.31 |
information_content_entity | 1201 | 1011 | 84.18 |
attribute | 47 | 19 | 40.43 |
publication | 141 | 94 | 66.67 |
device | 655 | 652 | 99.54 |
disease_or_phenotypic_feature | 53171 | 7459 | 14.03 |
activity_and_behavior | 902 | 879 | 97.45 |
occurrent | 61546 | 61088 | 99.26 |
phenomenon | 14780 | 12264 | 82.98 |
physiological_process | 210 | 165 | 78.57 |
molecular_activity | 2 | 2 | 100.0 |
gene | 120 | 3 | 2.5 |
biological_process | 32 | 24 | 75.0 |
pathway | 2 | 1 | 50.0 |
abstract_entity | 5 | 5 | 100.0 |
gene_grouping | 1 | 1 | 100.0 |
clinical_intervention | 1 | 1 | 100.0 |
quantity_value | 3 | 3 | 100.0 |
material_sample | 2 | 1 | 50.0 |
provider | 1 | 1 | 100.0 |
gene_family | 1 | 1 | 100.0 |
relationship_type | 3 | 0 | 0.0 |
gene_product | 1 | 0 | 0.0 |
cell | 1 | 0 | 0.0 |
Note: Some nodes with preferred type chemical_substance, phenotypic_feature, drug, or disease in KG2.3.4 will map to other preferred types in KG2.3.4C.
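The percent column in the table above is consistent with the ratio of the two count columns, rounded to two decimals. A minimal sketch of that arithmetic, where the function name `omop_coverage_percent` is hypothetical:

```python
def omop_coverage_percent(nodes_with_omop, total_nodes):
    """Percent of nodes that have OMOP ids, rounded to two decimals."""
    if total_nodes == 0:
        return 0.0
    return round(100 * nodes_with_omop / total_nodes, 2)
```

For example, the anatomical_entity row (119 of 141 nodes) gives 84.4 and the chemical_substance row (63215 of 2198938 nodes) gives 2.87.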
Hi @amykglen, all databases were updated for KG2.3.4 and put in /data/orangeboard/databases/KG2.3.4 on the server. Please help test everything together in your dev environment. Thank you!
awesome! yes, I'll work on testing everything together. thanks!
alright, so I tested ARAX+KG2.3.4 in a local setup, using the KG2.3.4-specific config.json, node_synonymizer.sqlite, NGD database, COHD database, and DTD database (stored in /data/orangeboard/databases/KG2.3.4/ on the server), and all is running smoothly! the entire pytest suite passes, including slow tests. (this testing was done in the kg2-arax-integration branch of course, which contains the code changes necessitated by this new KG2 version.)
so I think we are ready to make KG2.3.4 our "production" KG2 at any point now, by:
- merging the kg2-arax-integration branch into master
# Database cache area
Outside container is /data/orangeboard/databases/KG2.3.4
Inside container is /mnt/data/orangeboard/databases/KG2.3.4
INST=devED
SRC=/mnt/data/orangeboard/databases/KG2.3.4
DST=/mnt/data/orangeboard/$INST/RTX

cd $DST/code
git pull

cd $DST/code
cp -p $SRC/config.json .

cd $DST/code/ARAX/NodeSynonymizer
cp -p $SRC/node_synonymizer.sqlite .

cd $DST/code/ARAX/ARAXQuery/Overlay/ngd
cp -p $SRC/curie_to_pmids.sqlite .

cd $DST/code/ARAX/KnowledgeSources/COHD_local/data
cp -p $SRC/COHDdatabase_v*.db .

cd $DST/code/ARAX/ARAXQuery/Overlay/predictor/retrain_data
cp -p $SRC/GRAPH.sqlite .
cp -p $SRC/LogModel.pkl .

cd $DST/code/ARAX/test
pytest -v --durations=10

INST=devED
service RTXOpenAPI$INST restart
sleep 1
tail -f /tmp/RTXOpenAPI$INST.elog
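Before running the copy steps above, it can help to confirm that every expected file is actually present in the database cache area. The sketch below builds a dry-run copy plan from the same file names and destination directories used in those commands. `DEPLOY_TARGETS` and `plan_copies` are hypothetical names, and nothing is copied or modified.

```python
from pathlib import Path

# Destination directory (relative to the RTX checkout) for each database
# file pattern, taken from the cp commands in this thread.
DEPLOY_TARGETS = {
    "config.json": "code",
    "node_synonymizer.sqlite": "code/ARAX/NodeSynonymizer",
    "curie_to_pmids.sqlite": "code/ARAX/ARAXQuery/Overlay/ngd",
    "COHDdatabase_v*.db": "code/ARAX/KnowledgeSources/COHD_local/data",
    "GRAPH.sqlite": "code/ARAX/ARAXQuery/Overlay/predictor/retrain_data",
    "LogModel.pkl": "code/ARAX/ARAXQuery/Overlay/predictor/retrain_data",
}


def plan_copies(src_dir, rtx_dir):
    """Return (copies, missing) without touching any files.

    copies  list of (source path, destination path) pairs
    missing list of file patterns that matched nothing in src_dir
    """
    src, dst = Path(src_dir), Path(rtx_dir)
    copies, missing = [], []
    for pattern, subdir in DEPLOY_TARGETS.items():
        matches = sorted(src.glob(pattern))
        if not matches:
            missing.append(pattern)
        for path in matches:
            copies.append((path, dst / subdir / path.name))
    return copies, missing
```

A non-empty `missing` list signals that a database still needs to be rebuilt or uploaded before the instance is restarted.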
documentation about what things will need updating is at: https://github.com/RTXteam/RTX/wiki/Deployment-info#things-that-need-updating-when-rolling-out-a-new-kg2-version
rebuild/edit these and put them in /data/orangeboard/databases/KG2.3.4 on the server:
- config.json
- node_synonymizer.sqlite
- curie_to_pmids.sqlite
- COHDdatabase_vX.X.db @chunyuma
- GRAPH.sqlite @chunyuma

other to do's:
- [...] expand) that might be needed based on schema changes or the like in KG2.3.4 (put these changes in the kg2-arax-integration branch)
- [...] kg2-arax-integration branch

finally: