DerwenAI / kglab

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
https://derwen.ai/docs/kgl/
MIT License

Add more sparql11 rdf-tests #259

Closed: Mec-iS closed this 2 years ago

Mec-iS commented 2 years ago

Improve coverage of the SPARQL 1.1 rdf-tests from the official W3C repository.

Mec-iS commented 2 years ago

@ceteri the tests pass without problems on my local machine. I don't know why they are failing in CI.

ceteri commented 2 years ago

I've seen odd conditions in how the GitHub CI sets up its environment. Since we test from container images, this should not be any problem – unless the environment for our Docker builds is somehow affected?

One thing that might help would be to print the observed values explicitly before the assert statements. For test_add_ns, perhaps something like this:

print(kg_test.get_ns_dict())
assert len(kg_test.get_ns_dict()) == 30

Mec-iS commented 2 years ago

On my local machine the print shows 30, but the CI prints 11...
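Since the same assert sees 30 in one environment and 11 in another, one low-cost check is to log the interpreter and package versions in both places and compare them. A minimal sketch using only the standard library; the helper name report_versions and the package list are illustrative assumptions, not part of kglab or its test suite:

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is absent from this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The package list is an assumption; adjust it to whatever the tests depend on.
    print("python", platform.python_version())
    for name, version in report_versions(["rdflib", "kglab", "pandas"]).items():
        print(name, version)
```

Running this once locally and once as a CI step would show whether the two environments resolve different versions of a dependency.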

ceteri commented 2 years ago

This gets interesting. Here's what I'm seeing when I run this branch locally (adding a print for the dataframe content):

tests/test_namespaces.py:43: AssertionError
----------------------------------------------------------- Captured stdout call ------------------------------------------------------------
{'dct': 'http://purl.org/dc/terms/', 'owl': 'http://www.w3.org/2002/07/owl#', 'prov': 'http://www.w3.org/ns/prov#', 'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'rdfs': 'http://www.w3.org/2000/01/rdf-schema#', 'schema': 'http://schema.org/', 'sh': 'http://www.w3.org/ns/shacl#', 'skos': 'http://www.w3.org/2004/02/skos/core#', 'xsd': 'http://www.w3.org/2001/XMLSchema#', 'test1': 'http://usefulinc.com/ns/doap#', 'xml': 'http://www.w3.org/XML/1998/namespace'}
11
_____________________________________________________________ test_describe_ns ______________________________________________________________

kg_test_data = <kglab.kglab.KnowledgeGraph object at 0x7fde98cf8d60>

    def test_describe_ns(kg_test_data):
        """
    Coverage:

    * KnowledgeGraph.describe_ns()
        """
        df = kg_test_data.describe_ns()
        print(df)

>       assert len(df) == 29
E       assert 11 == 29
E        +  where 11 = len(    prefix                                    namespace\n0      dct                    http://purl.org/dc/terms/\n1     ...Schema#\n9     doap                http://usefulinc.com/ns/doap#\n10     xml         http://www.w3.org/XML/1998/namespace)

tests/test_namespaces.py:84: AssertionError
----------------------------------------------------------- Captured stdout call ------------------------------------------------------------
    prefix                                    namespace
0      dct                    http://purl.org/dc/terms/
1      owl               http://www.w3.org/2002/07/owl#
2     prov                   http://www.w3.org/ns/prov#
3      rdf  http://www.w3.org/1999/02/22-rdf-syntax-ns#
4     rdfs        http://www.w3.org/2000/01/rdf-schema#
5   schema                           http://schema.org/
6       sh                  http://www.w3.org/ns/shacl#
7     skos         http://www.w3.org/2004/02/skos/core#
8      xsd            http://www.w3.org/2001/XMLSchema#
9     doap                http://usefulinc.com/ns/doap#
10     xml         http://www.w3.org/XML/1998/namespace

So I'm getting the 11 count too.

This is with a completely fresh venv, initialized from requirements.txt and requirements-dev.txt.

Could some other factor be causing more default namespaces to be added for RDFlib?
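To see exactly which prefixes account for a difference like this, the dictionaries printed in two environments could be diffed. A rough sketch, assuming both dictionaries were captured from get_ns_dict(); diff_prefixes is a hypothetical helper, the first three entries are copied from the output above, and extra1 is a made-up placeholder rather than a real binding:

```python
def diff_prefixes(local_ns, ci_ns):
    """Compare two prefix -> IRI dicts.

    Returns (only_local, only_ci, changed): prefixes bound only in the first
    dict, only in the second, and bound in both but to different IRIs.
    """
    local_keys, ci_keys = set(local_ns), set(ci_ns)
    only_local = sorted(local_keys - ci_keys)
    only_ci = sorted(ci_keys - local_keys)
    changed = sorted(p for p in local_keys & ci_keys if local_ns[p] != ci_ns[p])
    return only_local, only_ci, changed


# Three bindings taken from the captured output in this thread.
ci_ns = {
    "dct": "http://purl.org/dc/terms/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xml": "http://www.w3.org/XML/1998/namespace",
}

# Hypothetical second environment: the same bindings plus one placeholder prefix.
local_ns = dict(ci_ns, extra1="http://example.org/extra1#")

only_local, only_ci, changed = diff_prefixes(local_ns, ci_ns)
print(only_local)  # ['extra1']
print(only_ci)     # []
print(changed)     # []
```

The list of prefixes that appear in only one environment should point to whatever is binding the extra defaults.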

As an alternative, how about this approach?

from rdflib import Namespace  # assumed import; rdflib exports Namespace at the top level


def test_add_ns(kg_test):
    """
Coverage:

* KnowledgeGraph.get_ns_dict()
* KnowledgeGraph.add_ns()
* KnowledgeGraph.get_ns()
    """
    ns_dict = kg_test.get_ns_dict()
    obs_ns_keys = set(ns_dict)
    print(obs_ns_keys)

    # require only the prefixes we rely on, not an exact count of defaults
    exp_ns_keys = {"dct", "owl", "prov", "rdf", "rdfs", "schema", "sh", "skos", "xsd", "test1", "xml"}
    assert exp_ns_keys.issubset(obs_ns_keys)

    iri2 = "http://schema.org/"
    prefix2 = "test2"

    kg_test.add_ns(
        prefix = prefix2,
        iri = iri2
    )

    # the newly added prefix must now be bound
    assert prefix2 in kg_test.get_ns_dict()

    namespace = kg_test.get_ns(prefix2)
    # ic(namespace)

    assert isinstance(namespace, Namespace)
    assert namespace == "http://schema.org/"

Mec-iS commented 2 years ago

@ceteri this can be merged now.

ceteri commented 2 years ago

Great, many thanks @Mec-iS !