RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License
33 stars 21 forks source link

lots of RTX-KG2 test triples that are rejected by the SRI testing system #1911

Open saramsey opened 2 years ago

saramsey commented 2 years ago

The SRI team is requesting that the file https://raw.githubusercontent.com/RTXteam/RTX/production/code/ARAX/KnowledgeSources/RTX_KG2c_test_triples.json

not contain any biolink-to-biolink edges. I think we will probably want to include a filter when we generate this file, or post-filter it.

saramsey commented 2 years ago

OK, it appears that the remedy is to add a bit of logic to the module

create_csv_of_kp_predicate_triples.py
saramsey commented 2 years ago

I am testing create_csv_of_kp_predicate_triples.py, to see how long it takes to run.

saramsey commented 2 years ago

I am now testing the script create_csv_of_kp_predicate_triples.py on kg2-7-6c.rtx.ai

saramsey commented 2 years ago

Hi @amykglen do you know what the typical running time of this script is?

amykglen commented 2 years ago

no, but I think @acevedol does - she's been taking care of running that for new KG2 versions.

amykglen commented 2 years ago

also note that in line 116 of that script, we need to change v1.2 --> v1.3. I think we forgot to do this after we made TRAPI 1.3 production. https://github.com/RTXteam/RTX/blob/b2aea971a4834b43fe9fd6315bea4f23aecceb46/code/ARAX/KnowledgeSources/create_csv_of_kp_predicate_triples.py#L116

saramsey commented 2 years ago

OK will do.

saramsey commented 2 years ago

also note that in line 116 of that script, we need to change v1.2 --> v1.3. I think we forgot to do this after we made TRAPI 1.3 production.

https://github.com/RTXteam/RTX/blob/b2aea971a4834b43fe9fd6315bea4f23aecceb46/code/ARAX/KnowledgeSources/create_csv_of_kp_predicate_triples.py#L116

OK, I've coded this change on a local repo on kg2-7-6c.rtx.ai. Testing it now.

saramsey commented 2 years ago

Hi @amykglen should this line of code also be converted to v1.3?

https://github.com/RTXteam/RTX/blob/bec2a38bf24b4b35c27ab03ac0a05fcc604ab1ed/code/ARAX/KnowledgeSources/create_csv_of_kp_predicate_triples.py#L85

amykglen commented 2 years ago

that's probably a good idea! I think the main() function doesn't currently run the function that is in, but we may as well fix it.

saramsey commented 2 years ago

So, Richard B. has also advised (via Slack) that there is a second problem, namely, tuples in KG2c_test_triples.json that have either a subject category or object category that is too abstract, like biolink:NamedThing, or a mixin (which is now allowed). He wrote:

Screen Shot 2022-09-16 at 10 31 06 AM
saramsey commented 2 years ago

Hmm, the test triples file still contains nodes whose categories are biolink:PhysicalEssenceOrOccurrent which is evidently now a mixin. Need to filter out....

saramsey commented 2 years ago

OK, in the module create_csv_of_kp_predicate_triples.py, in the function get_kg2c_predicate_triple_examples, there is a rather complex Cypher query,

match (n)-[r]->(m) 
with distinct labels(n) as n1s, type(r) as rel, labels(m) as n2s 
unwind n1s as n1 unwind n2s as n2 
with distinct n1 as node1, rel as relationship, n2 as node2 
where node1 <> "Base" and node2 <> "Base" 
CALL apoc.cypher.run('match (n2:`' + node1 + '`)-[r2:`' + relationship + '`]->(m2:`' + node2 + '`) 
return n2.id as subject, m2.id as object limit 1', {}) 
YIELD value 
return node1, relationship, node2, value.subject as subject, value.object as object

which I think can (and should) be revised to this simpler query:

match (n)-[r]->(m) 
where not (n.id starts with 'biolink:') and not (m.id starts with 'biolink:') 
with distinct n.category as node1, r.predicate as relationship, m.category as node2, 
        min(n.id + '---' + m.id) as mk 
with *, split(mk, '---') as s 
return node1, relationship, node2, head(s) as subject, head(tail(s)) as object

which also contains a WHERE clause to omit triples for which the subject or object ID starts with the biolink: prefix (since such triples would be rejected by the SRI test harness).

saramsey commented 1 year ago

Information about how the URL (for the test triples JSON document ) is provided to the SRI testing harness, can be found in this issue: https://github.com/RTXteam/RTX/issues/1876

basically it is via the ARAX (or RTX-KG2 KP) SmartAPI registration.

saramsey commented 1 year ago

Now, when we run

pytest -s -vvv test_onehops.py --kp_id=rtx-kg2 --one \
           --x_maturity="development" --ara_id=None --teststyle=by_subject

we see this error:

TypeError: TRAPIResponseValidator.check_compliance_of_trapi_response() got an unexpected keyword argument 'message'

../../translator/trapi/__init__.py:275: TypeError
--------------------------------------------------------------- Captured log setup ----------------------------------------------------------------
DEBUG    asyncio:selector_events.py:54 Using selector: KqueueSelector
DEBUG    asyncio:selector_events.py:54 Using selector: KqueueSelector
---------------------------------------------------------------- Captured log call ----------------------------------------------------------------
DEBUG    urllib3.connectionpool:connectionpool.py:1003 Starting new HTTPS connection (1): raw.githubusercontent.com:443
DEBUG    urllib3.connectionpool:connectionpool.py:456 https://raw.githubusercontent.com:443 "GET /NCATSTranslator/ReasonerAPI/v1.3.0/TranslatorReasonerAPI.yaml HTTP/1.1" 200 10181
DEBUG    urllib3.connectionpool:connectionpool.py:1003 Starting new HTTPS connection (1): arax.ncats.io:443
DEBUG    urllib3.connectionpool:connectionpool.py:456 https://arax.ncats.io:443 "POST /beta/api/rtxkg2/v1.3/query HTTP/1.1" 200 23838
-------------------------------------------------------------- Captured log teardown --------------------------------------------------------------
DEBUG    asyncio:selector_events.py:54 Using selector: KqueueSelector
============================================================= short test summary info =============================================================
FAILED test_onehops.py::test_trapi_kps[rtx-kg2#0-by_subject] - TypeError: TRAPIResponseValidator.check_compliance_of_trapi_response() got an unexpected keyword argument 'message'
edeutsch commented 1 year ago

As noted in the call, there was a time that some versions of the TRAPIResponseValidator required a message object as input (despite the name of the program/class). Richard has since fixed this in the very latest versions of the validator.

Passing a Response object (containing a message) when it was expecting a message object might lead to this behavior.

saramsey commented 1 year ago

OK, this was apparently due to a bug in SRI_testing/translator/trapi/__init__.py. I've sent a proposed bugfix to Richard and Lili.

saramsey commented 1 year ago

@edeutsch thank you, you diagnosed the issue exactly

acevedol commented 1 year ago

For the latest SRI_testing harness to run, we need to

acevedol commented 1 year ago

In order to only test the kg2c test triples file in kg2integration, we use pytest -s -vvv tests/onehop/test_onehops.py --kp_id=rtx-kg2 --x_maturity="development" --ara_id=None

acevedol commented 1 year ago

Failure on raise_subject_entity:

======================================== FAILURES ========================================
_____________________ test_trapi_kps[rtx-kg2#0-raise_subject_entity] _____________________

kp_trapi_case = {'biolink_version': '2.2.11', 'idx': 0, 'kp_id': 'infores:rtx-kg2', 'kp_source': 'infores:rtx-kg2', ...}
trapi_creator = <function raise_subject_entity at 0x111bfe0d0>
results_bag = ResultsBag:
{'location': 'https://raw.githubusercontent.com/RTXteam/RTX/kg2integration/code/ARAX/KnowledgeSources/RTX_...es:rtx-kg2', 'kp_source_type': 'primary'}, 'unit_test_report': <translator.trapi.UnitTestReport object at 0x122c11550>}

    @pytest.mark.asyncio
    async def test_trapi_kps(kp_trapi_case, trapi_creator, results_bag):
        """Generic Test for TRAPI KPs. The kp_trapi_case fixture is created in conftest.py by looking at KP test triples
        These get successively fed into test_trapi_kps.  This function is further parameterized by trapi_creator, which
        knows how to take an input edge and create some kind of TRAPI query from it.  For instance, by_subject removes
        the object, while raise_object_by_subject removes the object and replaces the object category with its
        biolink parent. This approach will need modification if there turn out to be particular elements we want
        to test for different creators.
        """
        results_bag.location = kp_trapi_case['ks_test_data_location']
        results_bag.case = kp_trapi_case
        results_bag.unit_test_report = UnitTestReport(
            test_case=kp_trapi_case,
            test_name=trapi_creator.__name__,
            trapi_version=kp_trapi_case['trapi_version'],
            biolink_version=kp_trapi_case['biolink_version'],
            sources={
                    "kp_source": kp_trapi_case['kp_source'],
                    "kp_source_type": kp_trapi_case['kp_source_type']
            }
        )

        if in_excluded_tests(test=trapi_creator, test_case=kp_trapi_case):
            _report_and_skip_edge(
                "KP",
                test=trapi_creator,
                test_case=kp_trapi_case,
                test_report=results_bag.unit_test_report,
                excluded_test=True
            )
        elif UnitTestReport.has_validation_errors("pre-validation", kp_trapi_case):
            _report_and_skip_edge(
                "KP",
                test=trapi_creator,
                test_case=kp_trapi_case,
                test_report=results_bag.unit_test_report
            )
        else:
>           await execute_trapi_lookup(
                case=kp_trapi_case,
                creator=trapi_creator,
                rbag=results_bag,
                test_report=results_bag.unit_test_report
            )

tests/onehop/test_onehops.py:99:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
translator/trapi/__init__.py:227: in execute_trapi_lookup
    trapi_request, output_element, output_node_binding = creator(case)
tests/onehop/util.py:132: in wrapper
    result = fn(*args, **kwargs)
tests/onehop/util.py:245: in raise_subject_entity
    parent_subject = ontology_kp.get_parent(subject, subject_cat, biolink_version=request['biolink_version'])
translator/sri/testing/util/ontology_kp.py:135: in get_parent
    query_entity = convert_to_preferred(curie, preferred_prefixes)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

curie = 'FMA:72647', allowed_list = ['NLMID']

    def convert_to_preferred(curie, allowed_list):
        """
        :param curie
        :param allowed_list
        """
        j = {'curies': [curie]}
        result = post(NODE_NORMALIZER_SERVER, j)
        if not result:
            return None
>       new_ids = [v['identifier'] for v in result[curie]['equivalent_identifiers']]
E       TypeError: 'NoneType' object is not subscriptable

translator/sri/testing/util/ontology_kp.py:36: TypeError
----------------------------------- Captured log setup -----------------------------------
DEBUG    asyncio:selector_events.py:54 Using selector: KqueueSelector
----------------------------------- Captured log call ------------------------------------
DEBUG    urllib3.connectionpool:connectionpool.py:1003 Starting new HTTPS connection (1): nodenormalization-sri.renci.org:443
DEBUG    urllib3.connectionpool:connectionpool.py:456 https://nodenormalization-sri.renci.org:443 "POST /get_normalized_nodes HTTP/1.1" 200 18
--------------------------------- Captured log teardown ----------------------------------
DEBUG    asyncio:selector_events.py:54 Using selector: KqueueSelector
================================ short test summary info =================================
FAILED tests/onehop/test_onehops.py::test_trapi_kps[rtx-kg2#0-raise_subject_entity] - TypeError: 'NoneType' object is not subscriptable
acevedol commented 1 year ago

When testing one edge, all the tests pass except raise_subject_entity above. It has an issue with the curie FMA:72647 not being allowed and (I think) can't convert it to a preferred curie for some reason.

amykglen commented 1 year ago

one thing to note is that many raise_subject_entity tests might not pass currently because Plover still only does concept subclass reasoning for diseases and drugs, and not other node types. I'm going to be working on expanding that over the next week or two

saramsey commented 11 months ago

Can we close out this issue?