Open saramsey opened 2 years ago
OK, it appears that the remedy is to add a bit of logic to the module
create_csv_of_kp_predicate_triples.py
I am testing create_csv_of_kp_predicate_triples.py
, to see how long it takes to run.
I am now testing the script create_csv_of_kp_predicate_triples.py
on kg2-7-6c.rtx.ai
Hi @amykglen do you know what the typical running time of this script is?
no, but I think @acevedol does - she's been taking care of running that for new KG2 versions.
also note that in line 116 of that script, we need to change v1.2
--> v1.3
. I think we forgot to do this after we made TRAPI 1.3 production.
https://github.com/RTXteam/RTX/blob/b2aea971a4834b43fe9fd6315bea4f23aecceb46/code/ARAX/KnowledgeSources/create_csv_of_kp_predicate_triples.py#L116
OK will do.
also note that in line 116 of that script, we need to change
v1.2
-->v1.3
. I think we forgot to do this after we made TRAPI 1.3 production.
OK, I've coded this change on a local repo on kg2-7-6c.rtx.ai
. Testing it now.
Hi @amykglen should this line of code also be converted to v1.3
?
that's probably a good idea! I think the main()
function doesn't currently run the function that is in, but we may as well fix it.
So, Richard B. has also advised (via Slack) that there is a second problem, namely, tuples in KG2c_test_triples.json that have either a subject category or object category that is too abstract, like biolink:NamedThing
, or a mixin (which is now allowed). He wrote:
Hmm, the test triples file still contains nodes whose categories are biolink:PhysicalEssenceOrOccurrent
which is evidently now a mixin. Need to filter out....
OK, in the module create_csv_of_kp_predicate_triples.py
, in the function get_kg2c_predicate_triple_examples
, there is a rather complex Cypher query,
match (n)-[r]->(m)
with distinct labels(n) as n1s, type(r) as rel, labels(m) as n2s
unwind n1s as n1 unwind n2s as n2
with distinct n1 as node1, rel as relationship, n2 as node2
where node1 <> "Base" and node2 <> "Base"
CALL apoc.cypher.run('match (n2:`' + node1 + '`)-[r2:`' + relationship + '`]->(m2:`' + node2 + '`)
return n2.id as subject, m2.id as object limit 1', {})
YIELD value
return node1, relationship, node2, value.subject as subject, value.object as object
which I think can (and should) be revised to this simpler query:
match (n)-[r]->(m)
where not (n.id starts with 'biolink:') and not (m.id starts with 'biolink:')
with distinct n.category as node1, r.predicate as relationship, m.category as node2,
min(n.id + '---' + m.id) as mk
with *, split(mk, '---') as s
return node1, relationship, node2, head(s) as subject, head(tail(s)) as object
which also contains a WHERE clause to omit triples for which the subject or object ID starts with the biolink:
prefix (since such triples would be rejected by the SRI test harness).
Information about how the URL (for the test triples JSON document ) is provided to the SRI testing harness, can be found in this issue: https://github.com/RTXteam/RTX/issues/1876
basically it is via the ARAX (or RTX-KG2 KP) SmartAPI registration.
Now, when we run
pytest -s -vvv test_onehops.py --kp_id=rtx-kg2 --one \
--x_maturity="development" --ara_id=None --teststyle=by_subject
we see this error:
TypeError: TRAPIResponseValidator.check_compliance_of_trapi_response() got an unexpected keyword argument 'message'
../../translator/trapi/__init__.py:275: TypeError
--------------------------------------------------------------- Captured log setup ----------------------------------------------------------------
DEBUG asyncio:selector_events.py:54 Using selector: KqueueSelector
DEBUG asyncio:selector_events.py:54 Using selector: KqueueSelector
---------------------------------------------------------------- Captured log call ----------------------------------------------------------------
DEBUG urllib3.connectionpool:connectionpool.py:1003 Starting new HTTPS connection (1): raw.githubusercontent.com:443
DEBUG urllib3.connectionpool:connectionpool.py:456 https://raw.githubusercontent.com:443 "GET /NCATSTranslator/ReasonerAPI/v1.3.0/TranslatorReasonerAPI.yaml HTTP/1.1" 200 10181
DEBUG urllib3.connectionpool:connectionpool.py:1003 Starting new HTTPS connection (1): arax.ncats.io:443
DEBUG urllib3.connectionpool:connectionpool.py:456 https://arax.ncats.io:443 "POST /beta/api/rtxkg2/v1.3/query HTTP/1.1" 200 23838
-------------------------------------------------------------- Captured log teardown --------------------------------------------------------------
DEBUG asyncio:selector_events.py:54 Using selector: KqueueSelector
============================================================= short test summary info =============================================================
FAILED test_onehops.py::test_trapi_kps[rtx-kg2#0-by_subject] - TypeError: TRAPIResponseValidator.check_compliance_of_trapi_response() got an unexpected keyword argument 'message'
As noted in the call, there was a time that some versions of the TRAPIResponseValidator required a message object as input (despite the name of the program/class). Richard has since fixed this in the very latest versions of the validator.
Passing a Response object (containing a message) when it was expecting a message object might lead to this behavior.
OK, this was apparently due to a bug in SRI_testing/translator/trapi/__init__.py
. I've sent a proposed bugfix to Richard and Lili.
@edeutsch thank you, you diagnosed the issue exactly
For the latest SRI_testing harness to run, we need to
_test_report_database: TestReportDatabase = get_test_report_database(True)
validator.check_compliance_of_trapi_response(response=trapi_response['response_json'])
In order to only test the kg2c test triples file in kg2integration, we use pytest -s -vvv tests/onehop/test_onehops.py --kp_id=rtx-kg2 --x_maturity="development" --ara_id=None
Failure on raise_subject_entity
:
======================================== FAILURES ========================================
_____________________ test_trapi_kps[rtx-kg2#0-raise_subject_entity] _____________________
kp_trapi_case = {'biolink_version': '2.2.11', 'idx': 0, 'kp_id': 'infores:rtx-kg2', 'kp_source': 'infores:rtx-kg2', ...}
trapi_creator = <function raise_subject_entity at 0x111bfe0d0>
results_bag = ResultsBag:
{'location': 'https://raw.githubusercontent.com/RTXteam/RTX/kg2integration/code/ARAX/KnowledgeSources/RTX_...es:rtx-kg2', 'kp_source_type': 'primary'}, 'unit_test_report': <translator.trapi.UnitTestReport object at 0x122c11550>}
@pytest.mark.asyncio
async def test_trapi_kps(kp_trapi_case, trapi_creator, results_bag):
"""Generic Test for TRAPI KPs. The kp_trapi_case fixture is created in conftest.py by looking at KP test triples
These get successively fed into test_trapi_kps. This function is further parameterized by trapi_creator, which
knows how to take an input edge and create some kind of TRAPI query from it. For instance, by_subject removes
the object, while raise_object_by_subject removes the object and replaces the object category with its
biolink parent. This approach will need modification if there turn out to be particular elements we want
to test for different creators.
"""
results_bag.location = kp_trapi_case['ks_test_data_location']
results_bag.case = kp_trapi_case
results_bag.unit_test_report = UnitTestReport(
test_case=kp_trapi_case,
test_name=trapi_creator.__name__,
trapi_version=kp_trapi_case['trapi_version'],
biolink_version=kp_trapi_case['biolink_version'],
sources={
"kp_source": kp_trapi_case['kp_source'],
"kp_source_type": kp_trapi_case['kp_source_type']
}
)
if in_excluded_tests(test=trapi_creator, test_case=kp_trapi_case):
_report_and_skip_edge(
"KP",
test=trapi_creator,
test_case=kp_trapi_case,
test_report=results_bag.unit_test_report,
excluded_test=True
)
elif UnitTestReport.has_validation_errors("pre-validation", kp_trapi_case):
_report_and_skip_edge(
"KP",
test=trapi_creator,
test_case=kp_trapi_case,
test_report=results_bag.unit_test_report
)
else:
> await execute_trapi_lookup(
case=kp_trapi_case,
creator=trapi_creator,
rbag=results_bag,
test_report=results_bag.unit_test_report
)
tests/onehop/test_onehops.py:99:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
translator/trapi/__init__.py:227: in execute_trapi_lookup
trapi_request, output_element, output_node_binding = creator(case)
tests/onehop/util.py:132: in wrapper
result = fn(*args, **kwargs)
tests/onehop/util.py:245: in raise_subject_entity
parent_subject = ontology_kp.get_parent(subject, subject_cat, biolink_version=request['biolink_version'])
translator/sri/testing/util/ontology_kp.py:135: in get_parent
query_entity = convert_to_preferred(curie, preferred_prefixes)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
curie = 'FMA:72647', allowed_list = ['NLMID']
def convert_to_preferred(curie, allowed_list):
"""
:param curie
:param allowed_list
"""
j = {'curies': [curie]}
result = post(NODE_NORMALIZER_SERVER, j)
if not result:
return None
> new_ids = [v['identifier'] for v in result[curie]['equivalent_identifiers']]
E TypeError: 'NoneType' object is not subscriptable
translator/sri/testing/util/ontology_kp.py:36: TypeError
----------------------------------- Captured log setup -----------------------------------
DEBUG asyncio:selector_events.py:54 Using selector: KqueueSelector
----------------------------------- Captured log call ------------------------------------
DEBUG urllib3.connectionpool:connectionpool.py:1003 Starting new HTTPS connection (1): nodenormalization-sri.renci.org:443
DEBUG urllib3.connectionpool:connectionpool.py:456 https://nodenormalization-sri.renci.org:443 "POST /get_normalized_nodes HTTP/1.1" 200 18
--------------------------------- Captured log teardown ----------------------------------
DEBUG asyncio:selector_events.py:54 Using selector: KqueueSelector
================================ short test summary info =================================
FAILED tests/onehop/test_onehops.py::test_trapi_kps[rtx-kg2#0-raise_subject_entity] - TypeError: 'NoneType' object is not subscriptable
When testing one edge, all the tests pass except raise_subject_entity
above. It has an issue with the curie FMA:72647
not being allowed and (I think) can't convert it to a preferred curie for some reason.
one thing to note is that many raise_subject_entity
tests might not pass currently because Plover still only does concept subclass reasoning for diseases and drugs, and not other node types. I'm going to be working on expanding that over the next week or two
Can we close out this issue?
The SRI team is requesting that the file https://raw.githubusercontent.com/RTXteam/RTX/production/code/ARAX/KnowledgeSources/RTX_KG2c_test_triples.json
not contain any biolink-to-biolink edges. I think we will probably want to include a filter when we generate this file, or post-filter it.