Callidon / sparql-engine

🚂 A framework for building SPARQL query engines in Javascript/Typescript
https://callidon.github.io/sparql-engine
MIT License
99 stars 14 forks source link

Updates to use the latest versions of sparqljs and n3. #85

Open stuarthendren opened 8 months ago

stuarthendren commented 8 months ago

Also updates rdfjs libs to modules versions and rdf-string.

As a consequence, the following changes were made:

stuarthendren commented 8 months ago

This looks at #58 - there remains 2 tests failures in the SERVICE bind join tests. I currently can't get to the bottom of those but will continue to look at it.

In the mean time, I though I'd make the PR and see if you are open to these change in principal? It's not currently clear if the lib is still maintained? If there's no interest in this then I may release this separately as it could be a useful library for a number of project I'm working on.

Callidon commented 8 months ago

👋 Hi, thanks for the changes. I'm completely open to an upgrade to use the latest versions of these library. However, I don't have time to review such a massive PR. It started as a full supported project during my thesis, but I don't work on these topics anymore, so I can't allow work time on it 😞

So, feel free to advance and we will see. My main concerne will be how that it will a huge breaking change with how the lib previously works, so it will need a major release

JuniperChicago commented 8 months ago

@stuarthendren I appreciate your effort here as I attempted this massive update a few years ago and got 3/4 done before having to switch my attention elsewhere. Unlike you, I did not post my work-in-progress because I thought it too messy, but I appreciate that you did.

As for the benefit and future of this update, I think it would reinvigorate the use of sparql-engine and set the stage for more adaptation both in use of streaming libraries and adaptors for persistent storage. I really like the design thinking @Callidon used as a foundation of this project as it makes perfect sense even in future tense, but the triple definition has been an impediment to expanding its application.

@stuarthendren, I took a quick look at your tests and it seems the issue centers on the treatment of literals, and likely has nothing to do with the SERVICE portion of its application. I was rather surprised to find a dearth of tests dealing with literals and language tags, a hole that easily allows something to be missed. As it stands @stuarthendren, any use of object literals will fail in the current pull request... unless they are placed inside filters or binds. So...

PREFIX dblp-pers: <https://dblp.org/pers/m/>
PREFIX dblp-rdf: <https://dblp.uni-trier.de/rdf/schema-2017-04-18#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?article WHERE {
      ?s rdf:type dblp-rdf:Person .
      ?s dblp-rdf:primaryFullPersonName "Thomas Minier"@en .
      ?s dblp-rdf:authorOf ?article .
    }

will fail and...

    SELECT ?article WHERE {
      BIND("Thomas Minier"@en AS ?name)
      ?s rdf:type dblp-rdf:Person .
      ?s dblp-rdf:primaryFullPersonName ?name .
      ?s dblp-rdf:authorOf ?article .
    }

will succeed. Also...

      PREFIX dblp-pers: <https://dblp.org/pers/m/>
      PREFIX dblp-rdf: <https://dblp.uni-trier.de/rdf/schema-2017-04-18#>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      SELECT * WHERE {
        ?s rdf:type dblp-rdf:Person .
        SERVICE <${GRAPH_A_IRI.value}> {
          ?s dblp-rdf:primaryFullPersonName "Thomas Minier"@en .
        }
      }

will fail and...

      PREFIX dblp-pers: <https://dblp.org/pers/m/>
      PREFIX dblp-rdf: <https://dblp.uni-trier.de/rdf/schema-2017-04-18#>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      SELECT ?s ?article WHERE {
        ?s rdf:type dblp-rdf:Person .
        ?s dblp-rdf:authorOf ?article .
        SERVICE <${GRAPH_A_IRI.value}> {
          BIND("Thomas Minier"@en AS ?name)
          ?s dblp-rdf:primaryFullPersonName ?name .
        }
      }

will succeed.

Also, FILTER used with object literal queries will work both with plain sparql queries as well as with SERVICE.

Does this narrow down the issue?

stuarthendren commented 8 months ago

@JuniperChicago thanks for taking a look. I think that will help, it's probably around the toN3 and fromN3 usage. I'll take another look at it when I get the chance but might be a little while.

@Callidon I agree it would be a major release, but I'd also suggest make a beta release (or just branch) and make a few other changes first, like upgrading all the dependencies, and #62 to change from the custom Graph to the rdfjs Dataset, before making the major release.

Callidon commented 8 months ago

I agree it would be a major release, but I'd also suggest make a beta release (or just branch) and make a few other changes first, like upgrading all the dependencies, and https://github.com/Callidon/sparql-engine/issues/62 to change from the custom Graph to the rdfjs Dataset, before making the major release.

I agree with the beta, it could live under a beta tag until we are sure it can be released

stuarthendren commented 8 months ago

@JuniperChicago that did help me to find the issue and fix it. It was actually in the Graph implementation in the test code.

@Callidon You could create a beta branch and I can change the PR to that branch if you'd like to keep it there until it's ready for release.

I think the current code now functions correctly but needs some clean up - if you're happy for me to progress this (?), I'll clean up the code, maybe add a formatter, then look at some of the other things mentioned: update all libs, change Graph to Dataset (#62).

stuarthendren commented 8 months ago

Hi @Callidon,

I think this is ready to go on a beta branch or beta release. In addition to the main change of sparqljs upgrade and knock on changes to @rdfjs/types, I've taken the opportunity to add a formatter for consistency and changes the deprecated tslint for eslint.

I realize this might be too big a change to review, line by line but I've tried to fix the github actions so you can run the CI and hopefully see that the test and linting all pass. Then it could go to a beta branch and later beta release to allow for any issue to be found before major release.

I think it would make sense to change the Graph for Dataset before major release as this would allow simpler update and uptake. The examples also need updating, I've not done this now as they would need doing again after the Dataset change.

If you don't have the time or don't want to go down this road, please let me know, and I'll peruse a different route. Stuart

Callidon commented 8 months ago

Thanks or the changes. However, as said before, it's too much for me to review on my own. I'm not comfortable giving a go witohut a proper review, because we are taling about a major shift in the package, and while I have no doubt that you did a great job, there's a lot of points that need to be check (database optimization still works, performance is not impacted, API design is good, documentation is up to date, etc).

Do you have people motivated to help review it?

github-advanced-security[bot] commented 8 months ago

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

stuarthendren commented 8 months ago

I'm sorry, I don't have anyone who can review this. I'll leave this open for now but assume it's not going any further. I may release this separately when I have time.