DerwenAI / kglab

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
https://derwen.ai/docs/kgl/
MIT License
574 stars, 65 forks

Improve SPARQL tests #248

Mec-iS closed 2 years ago

Mec-iS commented 2 years ago

As per conversation.

https://github.com/oxigraph/oxrdflib/issues/8#issuecomment-1076838034

Mec-iS commented 2 years ago

Oxigraph implements an automated test suite that pulls examples from the W3C and tests them against its implementation.

The files involved are (some examples to be found here):

We can create a test suite that uses these files to load the data, run the query, and compare the results.

Full details at RDF-tests.
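
The "compare the results" step could be sketched roughly as below. This is only an illustration, assuming the expected results are stored in the SPARQL 1.1 Query Results JSON format (the .srj files used by the W3C suite); the helper name bindings_from_srj and the sample data are hypothetical, and the comparison ignores blank-node renaming and duplicate rows, which a real harness would need to handle.

```python
import json


def bindings_from_srj(text):
    """Parse a SPARQL JSON results document into a set of hashable rows.

    Hypothetical helper: each row becomes a frozenset of
    (variable, type, value, datatype, language) tuples, so row order
    and variable order do not affect the comparison.
    """
    doc = json.loads(text)
    rows = set()
    for binding in doc["results"]["bindings"]:
        row = frozenset(
            (var, term["type"], term["value"],
             term.get("datatype"), term.get("xml:lang"))
            for var, term in binding.items()
        )
        rows.add(row)
    return rows


# Made-up expected results, standing in for a .srj file from the test suite.
EXPECTED = """
{"head": {"vars": ["s", "name"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://example.org/a"},
    "name": {"type": "literal", "value": "Alice"}},
   {"s": {"type": "uri", "value": "http://example.org/b"},
    "name": {"type": "literal", "value": "Bob", "xml:lang": "en"}}
 ]}}
"""

# Same solutions in a different order, standing in for the engine's output.
ACTUAL = """
{"head": {"vars": ["s", "name"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://example.org/b"},
    "name": {"type": "literal", "value": "Bob", "xml:lang": "en"}},
   {"s": {"type": "uri", "value": "http://example.org/a"},
    "name": {"type": "literal", "value": "Alice"}}
 ]}}
"""

expected_rows = bindings_from_srj(EXPECTED)
actual_rows = bindings_from_srj(ACTUAL)
results_match = expected_rows == actual_rows
```

Comparing sets rather than lists reflects that SPARQL solution order is undefined unless the query uses ORDER BY.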

ceteri commented 2 years ago

Excellent, that's great @Mec-iS !

Also, we can instrument for performance analysis when running these SPARQL test suite features:

For example, these get performed in https://github.com/DerwenAI/ray_tutorial/blob/main/pi.ipynb

This will become especially important when we're working with the NVIDIA-based GPU optimizations for kglab.

Tpt commented 2 years ago

Thank you! The SPARQL W3C test suite format does indeed seem to be the best choice. It is implementation-independent, and many systems implement it, such as rdflib, Jena, RDF4J, ruby-rdf, sparql-ex and Communica. So all the effort put in here could be reused to test other systems.

About instrumentation: if we instrument from Python, we would need tools that can instrument native code too. Oxigraph is implemented in Rust and only the rdflib wrapper is in Python. Oxigraph already uses LLVM AddressSanitizer to catch memory errors and leaks, and Criterion for speed benchmarking. I personally use the CLion profiler for profiling, but something that could be publicly shared would be much better.

Mec-iS commented 2 years ago

I was thinking of writing our own manifest parser, but I suppose it is better to reuse the rdflib one, even though it cannot be accessed as a library. It would be better if we could do from rdflib.test.manifest import RDFTest, read_manifest, but I suppose we will need to copy the harness into the kglab tests.

Also, I will probably copy the oxigraph-tests directory containing the Oxigraph manifest, to run those tests against kglab SPARQL querying.

Mec-iS commented 2 years ago

#253

Mec-iS commented 2 years ago

@Tpt I am running into this error when passing query results into the bindingsCompatible function:

TypeError: unsupported operand type(s) for -: 'RDFResult' and 'set'

It seems that the tests sometimes return an RDFResult that cannot be cast into a set, for example test bound/dawg-bound-query-001. Any hints? Thanks.

Tpt commented 2 years ago

@Mec-iS I believe it's because bindingsCompatible expects its arguments to be sets, and at the place where it is called the arguments are not cast into sets, unlike what rdflib does.
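
The failure mode can be reproduced in isolation with a small sketch. The ResultLike class and remove_row function below are hypothetical stand-ins, not the actual RDFResult or bindingsCompatible code; they only show why set difference fails on an iterable that is not a set, and how casting at the call site avoids it.

```python
class ResultLike:
    """Hypothetical stand-in for a query result: iterable, but not a set."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        return iter(self._rows)


def remove_row(rows, row):
    """Set difference, as a compatibility check might use it internally."""
    return rows - {row}


result = ResultLike([("a", 1), ("b", 2)])

# Passing the result object directly fails: it does not support "-".
try:
    remove_row(result, ("a", 1))
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Casting to a set at the call site makes the same operation work.
remaining = remove_row(set(result), ("a", 1))
```

The cast works here because the rows are hashable tuples; rows that are not hashable would need converting first.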

VladimirAlexiev commented 2 years ago

The folders basic and algebra in https://github.com/DerwenAI/kglab/tree/main/tests/rdf_tests/dat contain obsolete SPARQL 1.0 tests, see e.g. http://rawgit2.com/DerwenAI/kglab/main/tests/rdf_tests/dat/algebra/index.html. Please use the official W3C SPARQL tests.

Test runner: if you're willing to use PHP, there is https://github.com/BorderCloud/TFT that powers http://sparqlscore.com/

Mec-iS commented 2 years ago

@VladimirAlexiev thanks for your comment

Are all the tests in basic and algebra in the data-r2 directory deprecated? Should we use only the ones in the sparql11 directory instead?

Mec-iS commented 2 years ago

After a closer look, sparql11 tests were already running, but the directory structure is flattened between the sparql1 and sparql11 rdf-tests.

I have added more sparql11 tests in #259.

Mec-iS commented 2 years ago

Moved to #250 and #251.