biopragmatics / curies

🐸 Idiomatic conversion between URIs and compact URIs (CURIEs) in Python
https://curies.readthedocs.io
MIT License
21 stars, 6 forks

Tests for remote federated services #52

Open cthoyt opened 1 year ago

cthoyt commented 1 year ago

I've excerpted this from #49. This code tests that public SPARQL endpoints are able to use the Bioregistry in federated queries. I'm not sure whether it belongs in this package, though.

```python
"""Tests for remote federated SPARQL."""

from textwrap import dedent

from curies.mapping_service import _handle_header
from tests.test_federated_sparql import FederationMixin

BIOREGISTRY_SPARQL_ENDPOINT = "http://bioregistry.io/sparql"


class TestPublicFederatedSPARQL(FederationMixin):
    """Test federated SPARQL queries from public triple stores to the Bioregistry."""

    def setUp(self) -> None:
        """Set up the public federated SPARQL test case."""
        # The SERVICE clause makes the remote triple store forward the inner
        # pattern to the Bioregistry's SPARQL endpoint
        self.sparql = dedent(
            f"""\
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        SELECT DISTINCT ?o WHERE {{
            SERVICE <{BIOREGISTRY_SPARQL_ENDPOINT}> {{
                <http://purl.obolibrary.org/obo/CHEBI_24867> owl:sameAs ?o
            }}
        }}
        """.rstrip()
        )

    def query_endpoint(self, endpoint: str) -> None:
        """Send the federated query to a public endpoint and check the results."""
        self.assert_service_works(endpoint)

        accept = "application/sparql-results+json"
        resp = self.get(endpoint, self.sparql, accept=accept)
        self.assertEqual(
            200,
            resp.status_code,
            msg=f"SPARQL query failed at {endpoint}:\n\n{self.sparql}\n\nResponse:\n{resp.text}",
        )
        # Normalize the header before comparing it to the requested content type
        response_content_type = _handle_header(resp.headers["content-type"])
        self.assertEqual(accept, response_content_type, msg="Server sent incorrect content type")

        try:
            res = resp.json()
        except Exception:
            self.fail(msg=f"\n\nError running the federated query to {endpoint}:\n{resp.text}")
        bindings = res["results"]["bindings"]
        self.assertGreater(
            len(bindings),
            0,
            msg=f"Federated query to {endpoint} gives no results",
        )
        # The Bioregistry should map the OBO PURL to its own IRI for CHEBI:24867
        self.assertIn(
            "https://bioregistry.io/chebi:24867",
            {binding["o"]["value"] for binding in bindings},
            msg=f"Federated query to {endpoint} is missing the Bioregistry IRI",
        )

    def test_public_federated_virtuoso(self):
        """Test sending a federated query to a public mapping service from Virtuoso."""
        self.query_endpoint("https://bio2rdf.org/sparql")

    def test_public_federated_blazegraph(self):
        """Test sending a federated query to a public mapping service from Blazegraph."""
        self.query_endpoint("http://kg-hub-rdf.berkeleybop.io/blazegraph/sparql")

    def test_public_federated_graphdb(self):
        """Test sending a federated query to a public mapping service from GraphDB."""
        self.query_endpoint("https://graphdb.dumontierlab.com/repositories/test")
```

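Two parts of the logic in query_endpoint can be checked without any network access: normalizing the Content-Type header, which servers often send with parameters such as a charset, and collecting the ?o values from a response in the SPARQL 1.1 Query Results JSON format. The sketch below uses hypothetical helper names (normalize_content_type, binding_values) and made-up sample data. It is not a reproduction of curies' private _handle_header, whose exact behavior is assumed here rather than confirmed.

```python
"""Offline checks for the response handling used by the federated SPARQL tests.

normalize_content_type and binding_values are hypothetical helpers for
illustration; they are not part of the curies API.
"""

from typing import Any, Dict, Set

SPARQL_JSON = "application/sparql-results+json"


def normalize_content_type(header: str) -> str:
    """Drop media-type parameters (e.g. "; charset=utf-8") and lowercase the type."""
    return header.split(";", 1)[0].strip().lower()


def binding_values(results: Dict[str, Any], variable: str) -> Set[str]:
    """Collect the values bound to ``variable`` in a SPARQL JSON results document."""
    return {
        binding[variable]["value"]
        for binding in results["results"]["bindings"]
        # A variable can be unbound in some rows (e.g. under OPTIONAL), so skip those
        if variable in binding
    }


# Made-up response shaped like the one the Bioregistry test expects
sample = {
    "head": {"vars": ["o"]},
    "results": {
        "bindings": [
            {"o": {"type": "uri", "value": "https://bioregistry.io/chebi:24867"}},
            {"o": {"type": "uri", "value": "https://example.org/unrelated"}},
        ]
    },
}

assert normalize_content_type("application/sparql-results+json; charset=utf-8") == SPARQL_JSON
assert "https://bioregistry.io/chebi:24867" in binding_values(sample, "o")
```

Keeping these checks local means that a failure in the public-endpoint tests more likely points at the remote service than at the parsing code.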
vemonet commented 1 year ago

@cthoyt IMO, the tests against the public endpoints should run in a completely separate workflow from all the other tests, which run fully locally.

I'm not really familiar with tox, but ideally you would have two main commands for running the tests, and you could simply put the two kinds of tests in separate folders (or use whatever other mechanism works best with tox).

e.g. tox -- tests/integration for fully local tests of the current state of the source code, and tox -- tests/production for tests against the production API (assuming the test command in tox.ini passes {posargs} through to the test runner).

And have two different GitHub Actions workflows to run the tests, so you can see at a glance whether the integration tests or the production tests are failing, without confusion. This way you can also easily define a different trigger for the production tests (e.g. a cron schedule that runs every week).
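One way to keep the production tests out of the default run is to skip them unless an environment variable is set, and have only the scheduled workflow set it. This is a minimal sketch using the standard library's unittest.skipUnless; the variable name CURIES_RUN_PRODUCTION_TESTS and the class TestProductionSmoke are hypothetical, and splitting the tests into separate folders works just as well.

```python
"""Gate tests that hit public SPARQL endpoints behind an opt-in flag.

CURIES_RUN_PRODUCTION_TESTS is a hypothetical variable name chosen for this sketch.
"""

import os
import unittest

FLAG = "CURIES_RUN_PRODUCTION_TESTS"


def production_enabled() -> bool:
    """Return True only when the flag is set to a truthy value."""
    return os.environ.get(FLAG, "").strip().lower() in {"1", "true", "yes"}


# The condition is evaluated once, when this module is imported
requires_production = unittest.skipUnless(
    production_enabled(), f"set {FLAG}=1 to run tests against production services"
)


@requires_production
class TestProductionSmoke(unittest.TestCase):
    """Placeholder for tests that query live public endpoints."""

    def test_flag_is_set(self):
        # Only runs when the flag is enabled; real tests would query an endpoint here
        self.assertTrue(production_enabled())
```

In the scheduled GitHub Actions workflow, the flag would be exported as an environment variable for the test step (e.g. before running python -m unittest), while the push/pull-request workflow leaves it unset, so these tests are reported as skipped there.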

If you only want to implement production tests for the curies service, then it could make sense to put them in this repository. But I think they would fit better in the bioregistry repo, since in the end you are testing the production deployment of the Bioregistry. That would also let you easily add production tests for features other than the mapping service, without having two different workflows to track for the Bioregistry production deployment.