NCATSTranslator / reasoner-validator

Validation of Translator OpenAPI (TRAPI) messages both to TRAPI and Biolink Model standards. See https://ncatstranslator.github.io/reasoner-validator/

Issue with dependencies #76

Closed: vemonet closed this issue 1 year ago

vemonet commented 1 year ago

Describe the bug We have a really simple test that makes a request to our API in production every day and runs the reasoner-validator to check that the response is compliant.

Every other day the whole test fails even though we change nothing, and it is always because of the reasoner-validator dependency tree.

This issue has been raised a few times in the past, and every time it has been temporarily solved, but like whack-a-mole another dependency conflict surfaces a few days after each fix.

This time it seems the issue is related to a dependency of oaklib:

___________ ERROR collecting tests/production/test_production_api.py ___________
tests/production/test_production_api.py:3: in <module>
    from reasoner_validator import TRAPIResponseValidator
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/reasoner_validator/__init__.py:3: in <module>
    from reasoner_validator.biolink import (
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/reasoner_validator/biolink/__init__.py:11: in <module>
    from bmt import Toolkit
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bmt/__init__.py:1: in <module>
    from bmt.toolkit import Toolkit
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bmt/toolkit.py:6: in <module>
    from oaklib.implementations import UbergraphImplementation
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/__init__.py:9: in <module>
    from oaklib.selector import get_adapter, get_implementation_from_shorthand  # noqa:F401
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/selector.py:15: in <module>
    from oaklib.implementations.funowl.funowl_implementation import FunOwlImplementation
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/implementations/__init__.py:13: in <module>
    from oaklib.implementations.cx.cx_implementation import CXImplementation
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/implementations/cx/cx_implementation.py:9: in <module>
    from oaklib.implementations.obograph.obograph_implementation import (
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/implementations/obograph/obograph_implementation.py:40: in <module>
    from oaklib.interfaces.differ_interface import DifferInterface
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/interfaces/differ_interface.py:25: in <module>
    from oaklib.utilities.kgcl_utilities import generate_change_id
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/utilities/kgcl_utilities.py:11: in <module>
    import kgcl_schema.grammar.parser as kgcl_parser
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/kgcl_schema/grammar/parser.py:9: in <module>
    from bioregistry import parse_iri, get_preferred_prefix, curie_to_str
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/__init__.py:5: in <module>
    from .collection_api import get_collection, get_context  # noqa:F401
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/collection_api.py:7: in <module>
    from .resource_manager import manager
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/resource_manager.py:41: in <module>
    from .schema import (
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/schema/__init__.py:5: in <module>
    from .struct import (  # noqa:F401
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/schema/struct.py:2054: in <module>
    class RegistryGovernance(BaseModel):
pydantic/main.py:197: in pydantic.main.ModelMetaclass.__new__
    ???
pydantic/fields.py:506: in pydantic.fields.ModelField.infer
    ???
pydantic/fields.py:436: in pydantic.fields.ModelField.__init__
    ???
pydantic/fields.py:552: in pydantic.fields.ModelField.prepare
    ???
pydantic/fields.py:668: in pydantic.fields.ModelField._type_analysis
    ???
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/typing.py:852: in __subclasscheck__
    return issubclass(cls, self.__origin__)
E   TypeError: issubclass() arg 1 must be a class

The test is here: https://github.com/MaastrichtU-IDS/knowledge-collaboratory/blob/main/backend/tests/production/test_production_api.py

And in the test environment only the following dependencies are installed (cf. the dependencies and optional dependencies in https://github.com/MaastrichtU-IDS/knowledge-collaboratory/blob/main/backend/pyproject.toml#L27):

    "python-multipart >=0.0.5",
    "requests >=2.23.0",
    "httpx >=0.21.1",
    "pydantic[dotenv] >=1.9",
    "fastapi >=0.68.1",
    "uvicorn >=0.15.0",
    "gunicorn >=20.0.4",
    "Authlib >=0.15.4",
    "itsdangerous >=2.0.1",
    "reasoner-pydantic >=2.2.3",
    "rdflib >=6.1.1",
    "SPARQLWrapper >=1.8.5",
    "pytest >=7.1.3,<8.0.0",
    "pytest-cov >=2.12.0,<4.0.0",
    "ruff >=0.0.219",
    "reasoner-validator >=3.1.4",

To Reproduce Steps to reproduce the behavior:

  1. Try to use reasoner-validator in a real project

Expected behavior Dependencies should be better defined and better tested.

Tests have been defined in the tests/ folder, but they are currently not run by the GitHub Actions set up in this repo (there is one action to publish without testing and one to update the docs). Ideally, tests should be run on every push to the repository (like the GitHub action that updates the docs), and they should run against the different supported Python versions (currently 3.9, 3.10, and 3.11).
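For illustration, such a workflow could look something like the sketch below (this is a hypothetical example, not an existing workflow in this repo; the file name, trigger, and `[dev]` extra are assumptions):

```yaml
# .github/workflows/test.yml (hypothetical sketch)
name: Run tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      # Assumes a dev/test extra declaring pytest; adjust to the project's layout
      - run: pip install ".[dev]"
      - run: pytest
```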

Using the reasoner-validator is really important for us to make sure our output fits the ever-changing TRAPI/Biolink specifications. Unfortunately, the current state of the reasoner-validator forces us to spend hours per week fixing basic dependency issues that should be handled upstream by the reasoner-validator, bmt, oaklib, etc. Those tools and libraries should have automated tests on every commit, and those tests should catch this kind of dependency failure by running in different environments.

Unfortunately, at the moment our only long-term solution is to do exactly what the reasoner-validator is doing right now: leave the testing to someone else (the next people in line, the ARAs and ARX) and wait to be notified by those actors if our TRAPI payload is not compliant.

edeutsch commented 1 year ago

The reasoner-validator was removed from the ARAX system a week or two ago for this very reason: we were unable to deliver a stable product with it included. It's a pity, because our viewer is choking on a lot of bad TRAPI, but without validation no one is aware.

I don't know how to solve this problem.

Maybe one possibility is to have a reasoner-validator-lite that does not have all these dependencies.
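One lightweight way to approximate a "lite" variant without a separate package would be to defer the heavy imports until they are actually needed, so that merely importing the validator does not pull in the whole oaklib/bmt tree. A minimal sketch of the pattern (illustrative only; `LazyModule` is a hypothetical helper, demonstrated here with the stdlib `json` module as a stand-in for a heavy dependency):

```python
import importlib


class LazyModule:
    """Proxy that defers the real import until first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only reached when the attribute is not on the proxy itself,
        # i.e. on first real use of the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# Stand-in for a heavy dependency: nothing is imported at this point...
heavy = LazyModule("json")
# ...the import only happens on first use:
print(heavy.dumps({"ok": True}))  # prints {"ok": true}
```

With this pattern, projects that only need schema-level TRAPI validation would never pay the import cost (or hit the import-time failures) of the Biolink toolchain.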

vemonet commented 1 year ago

Today we are getting a different error: AttributeError: 'Toolkit' object has no attribute 'get_infores_details'

    def validate_infores(self, context: str, edge_id: str, identifier: str):
        code_prefix: str = f"error.knowledge_graph.edge.sources.retrieval_source.{context}.infores"
        if not is_curie(identifier):
            self.report(
                code=f"{code_prefix}.not_curie",
                identifier=identifier,
                edge_id=edge_id
            )
        elif not identifier.startswith("infores:"):
            # not sure how absolute the need is for this to be an Infores. We'll be lenient for now?
            self.report(
                code=f"{code_prefix}.invalid",
                identifier=identifier,
                edge_id=edge_id
            )
>       elif not self.bmt.get_infores_details(identifier):
E       AttributeError: 'Toolkit' object has no attribute 'get_infores_details'

/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/reasoner_validator/biolink/__init__.py:698: AttributeError
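Until the upstream bmt API settles, one defensive option for calling code is to probe for the method instead of assuming it exists, so a removed method degrades to a skipped check rather than a crash. A hedged sketch (the `FakeToolkit` stub and `infores_details_or_none` helper below are hypothetical, standing in for a bmt `Toolkit` release that dropped the method):

```python
class FakeToolkit:
    """Stub standing in for a Toolkit release without get_infores_details."""
    pass


def infores_details_or_none(toolkit, identifier):
    # getattr with a default avoids the AttributeError when the method
    # was removed in a newer release; None signals "details unavailable".
    getter = getattr(toolkit, "get_infores_details", None)
    if getter is None:
        return None
    return getter(identifier)


print(infores_details_or_none(FakeToolkit(), "infores:aragorn"))  # prints None
```

This does not fix the underlying version skew, but it would turn a hard failure into a lenient "could not verify" outcome.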

I ran pytest on the master branch of the reasoner-validator repo and got the following errors:

FAILED tests/test_biolink_compliance_validation.py::test_validate_qualifiers[query12] - AssertionError: Unexpected messages seen {'errors': {'error.knowledge_graph.edge.qualifiers.qualifier.value.unresolved': {'UBERON:0001981': [{'edge_id': 'validate_qualifiers unit test', 'qualifier_type_id': 'biolink:anatomical_contex...
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query0] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query5] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query6] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query7] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query8] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query9] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query10] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query11] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query12] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query13] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query14] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query15] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query16] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query17] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query18] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query19] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query20] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query21] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query22] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query23] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query24] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_check_biolink_model_compliance_of_knowledge_graph[query25] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_latest_trapi_validate_sources[sources0-] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_biolink_compliance_validation.py::test_latest_trapi_validate_sources[sources8-error.knowledge_graph.edge.sources.retrieval_source.resource_id.infores.unknown] - AttributeError: 'Toolkit' object has no attribute 'get_infores_details'
FAILED tests/test_validate.py::test_query_latest_trapi_workflow_properties[1] - jsonschema.exceptions.ValidationError: {'id': 'lookup', 'runner_parameters': {'allowlist': ['infores:aragorn']}} is not valid under any of the given schemas
FAILED tests/test_validate.py::test_query_latest_trapi_workflow_properties[1.4] - jsonschema.exceptions.ValidationError: {'id': 'lookup', 'runner_parameters': {'allowlist': ['infores:aragorn']}} is not valid under any of the given schemas
FAILED tests/test_validate.py::test_query_latest_trapi_workflow_properties[1.4.0] - jsonschema.exceptions.ValidationError: {'id': 'lookup', 'runner_parameters': {'allowlist': ['infores:aragorn']}} is not valid under any of the given schemas
FAILED tests/test_validate.py::test_query_latest_trapi_workflow_properties[1.4.0-beta4] - jsonschema.exceptions.ValidationError: {'id': 'lookup', 'runner_parameters': {'allowlist': ['infores:aragorn']}} is not valid under any of the given schemas
FAILED tests/test_workflows.py::test_query_latest_trapi_workflow_properties[1-query0] - jsonschema.exceptions.ValidationError: [{'id': 'sort_results_score', 'parameters': {'ascending_or_descending': 'ascending'}, 'runner_parameters': {'allowlist': ['infores:aragorn']}}, {'id': 'lookup', 'runner_parameters': {'allowlist'...
FAILED tests/test_workflows.py::test_query_latest_trapi_workflow_properties[1.4-query2] - jsonschema.exceptions.ValidationError: [{'id': 'sort_results_score', 'parameters': {'ascending_or_descending': 'ascending'}, 'runner_parameters': {'allowlist': ['infores:aragorn']}}, {'id': 'lookup', 'runner_parameters': {'allowlist'...
FAILED tests/test_workflows.py::test_query_latest_trapi_workflow_properties[1.4.0-query4] - jsonschema.exceptions.ValidationError: [{'id': 'sort_results_score', 'parameters': {'ascending_or_descending': 'ascending'}, 'runner_parameters': {'allowlist': ['infores:aragorn']}}, {'id': 'lookup', 'runner_parameters': {'allowlist'...
FAILED tests/test_workflows.py::test_query_latest_trapi_workflow_properties[1.4.0-beta4-query6] - jsonschema.exceptions.ValidationError: [{'id': 'sort_results_score', 'parameters': {'ascending_or_descending': 'ascending'}, 'runner_parameters': {'allowlist': ['infores:aragorn']}}, {'id': 'lookup', 'runner_parameters': {'allowlist'...
================================================================================================= 33 failed, 307 passed in 68.27s (0:01:08) ==================================================================================================

Many of these failures are the same AttributeError: 'Toolkit' object has no attribute 'get_infores_details' seen above.

sierra-moxon commented 1 year ago

@vemonet - yes, we are removing features from BMT to make it a less heavyweight dependency, per the architecture meeting yesterday. However, here we are importing KGX, which is another layer of dependency that we should try to reduce (from my walk through the code for the first time yesterday, I see that we're only really using KGX for prefix and CURIE resolution and its import of BMT). Please see my PR from yesterday evening.

RichardBruskiewich commented 1 year ago

Hi Vincent, your point about adding basic unit-test GitHub Actions to reasoner-validator is well taken. That said, as noted by others, we may have to start pinning more external dependencies to reliable specific versions in order to achieve stability, since most of the emergent errors in the project originate from newer releases of external dependencies that are inadvertently pulled in: several reasoner-validator dependencies are simply declared as a given release or newer.

vemonet commented 1 year ago

Hi @RichardBruskiewich, pinning might be a good idea for the dependencies we know are causing trouble.

But I would be cautious about pinning too many dependencies, as it will probably make it harder to import the reasoner-validator in another project: if the reasoner-validator requires very specific dependency versions, it will easily create conflicts with other dependencies that do not require the same versions. It will also prevent the reasoner-validator from getting the newest versions of its dependencies, which can bring security fixes and performance improvements. In the end, it will require the package maintainers to manually bump the pinned versions.

There is no magical solution: for a package in active development, which relies on dependencies also in active development, there will always be some work maintaining dependency versions and upgrading to newer ones, especially for a package that needs to be imported into other projects.

In my opinion, the easiest way to manage a package's dependency versions, from a maintainer's point of view, is to limit the number of core dependencies as much as possible, declare the widest version range in which the package works, and have a good automated test suite to make sure that new versions are not breaking anything. If the test suite fails because of a new dependency version, or a specific dependency keeps introducing breaking changes in minor versions, we can put an upper limit on that dependency, request a change to it, or make the change ourselves and send a PR.
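The effect of adding such an upper limit can be sketched with a toy version check in pure Python (simplified on purpose: real resolvers follow PEP 440 semantics, typically via the `packaging` library; this toy parser handles plain `X.Y.Z` versions only):

```python
def parse(version):
    """Toy parser for plain X.Y.Z version strings (no pre-releases)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower, upper=None):
    """True if lower <= version, and version < upper when a cap is given."""
    v = parse(version)
    if v < parse(lower):
        return False
    return upper is None or v < parse(upper)


# An open-ended ">=3.1.4" range admits any future breaking release...
print(satisfies("4.0.0", "3.1.4"))           # prints True
# ...while an added cap "<4.0.0" shields consumers from it:
print(satisfies("4.0.0", "3.1.4", "4.0.0"))  # prints False
```

The cap trades automatic upgrades for stability, which is why it is best applied only to the dependencies that have already proven troublesome.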

Currently, I don't think the reasoner-validator specifically needs more tests; there is already a good test suite (new tests for new features are always welcome, though!), but it needs to be better integrated into the development workflow. We should not be able to publish a new version if the tests are not passing, and we should regularly check that the tests still pass after the latest pushed changes.

If we apply this methodology to more of the packages in the Translator ecosystem (bmt, oaklib, kgx...), it should help catch issues before they go live and reduce the number of supply-chain issues, while still effortlessly picking up upgrades from newer versions.

sierra-moxon commented 1 year ago

@vemonet - I agree with everything you've said here. I do want to clarify, though, that bmt, KGX, oaklib, linkml-runtime, and even Biolink Model all have extensive test suites, running on every PR and every merge to the main branch. In fact, we restrict the main branch on those repos so that merging is not possible without passing tests. This repo is the only one in that chain that was not running tests on each PR, and we've fixed that this week. KGX also imports BMT, and we've not seen the same issues as are reported here (or we would not have released); same with oaklib, etc. I also think inconsistent dependency management by downstream tools contributes to these issues: using the >= operator in requirements files is not a bad idea in itself (it helps with dependency churn in some cases), but it allows daily upstream changes to propagate into the repo. A locked dependency file that you can somewhat restrict might be safer.