Support for SPARQL CDTs (lists and maps as literals)

hartig commented 5 months ago

Version

5.1.0

Feature

The lack of built-in support for generic types of composite values such as lists and maps has been a long-standing issue for RDF and SPARQL. Together with a few colleagues on the Amazon Neptune team, we have developed an approach to represent lists and maps as literals in RDF data, and to extend SPARQL with features related to such literals. These extensions of SPARQL include:

an aggregation function to produce these composite values (FOLD),
functions to operate on these composite values in expressions, and
a new operator (UNFOLD) to unfold such composite values into their individual components.
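To make the aggregation and unfolding ideas above more concrete, here is a toy, in-memory Java model of what FOLD and UNFOLD do to a set of solutions. It is a hypothetical illustration only, not Jena's implementation or API: the class and method names (CdtSketch, foldList, foldMap, unfoldList, unfoldMap, render) are invented here, values are plain strings rather than RDF terms, the expression functions are not modeled, and the 1-based positions and the "[1, 2, 3]" rendering are assumptions. The SPARQL CDTs specification defines the actual semantics and lexical forms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Toy model of the FOLD/UNFOLD idea (hypothetical; not Jena's API, not the spec's grammar).
class CdtSketch {

    /** FOLD over one variable: collect the values from a group into a single list value. */
    static List<String> foldList(List<String> groupValues) {
        return new ArrayList<>(groupValues);
    }

    /** FOLD over a key and a value variable: collect key/value pairs into a single map value. */
    static Map<String, String> foldMap(List<String> keys, List<String> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values must align");
        }
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            // In this toy, a later duplicate key overwrites an earlier one.
            map.put(keys.get(i), values.get(i));
        }
        return map;
    }

    /** UNFOLD over a list: one solution per element, plus a position (1-based here, an assumption). */
    static List<Map<String, String>> unfoldList(List<String> list, String elemVar, String posVar) {
        List<Map<String, String>> solutions = new ArrayList<>();
        for (int i = 0; i < list.size(); i++) {
            Map<String, String> s = new LinkedHashMap<>();
            s.put(elemVar, list.get(i));
            s.put(posVar, Integer.toString(i + 1));
            solutions.add(s);
        }
        return solutions;
    }

    /** UNFOLD over a map: one solution per entry, binding the key and the value. */
    static List<Map<String, String>> unfoldMap(Map<String, String> map, String keyVar, String valVar) {
        List<Map<String, String>> solutions = new ArrayList<>();
        for (Map.Entry<String, String> e : map.entrySet()) {
            Map<String, String> s = new LinkedHashMap<>();
            s.put(keyVar, e.getKey());
            s.put(valVar, e.getValue());
            solutions.add(s);
        }
        return solutions;
    }

    /** Render a list in a bracketed form that loosely resembles a list literal's lexical form. */
    static String render(List<String> list) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        list.forEach(joiner::add);
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> folded = foldList(List.of("1", "2", "3"));
        System.out.println(render(folded));                       // [1, 2, 3]
        System.out.println(unfoldList(folded, "elem", "pos"));     // [{elem=1, pos=1}, {elem=2, pos=2}, {elem=3, pos=3}]
        Map<String, String> m = foldMap(List.of("a", "b"), List.of("x", "y"));
        System.out.println(unfoldMap(m, "k", "v"));                // [{k=a, v=x}, {k=b, v=y}]
    }
}
```

The round trip is the point of the sketch: FOLD turns many solutions into one composite value, and UNFOLD turns that value back into many solutions, which is what lets composite values be built, stored as literals, and taken apart again within queries.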

We have created a complete formal specification of the approach and a comprehensive test suite for implementers, which can be found in our GitHub repository: https://github.com/awslabs/SPARQL-CDTs

Before diving into the specification, you may want to read our short paper, which gives a somewhat more extensive motivation for this work and a (very!) brief summary of the approach. After that, Section 2 of the specification provides a more detailed informal description of the different parts of the approach.

I am happy to answer any questions you may have, both about the approach in general and about the idea of integrating it into Jena. Also, if you have issues with some parts of the specification, feel free to create an issue in the aforementioned GitHub repository. (And in case you are wondering: yes, we are planning to file the approach as a SPARQL Enhancement Proposal (SEP) with the SPARQL-dev Community Group.)

Are you interested in contributing a solution yourself?

Yes

rvesse commented 5 months ago

See PR #2501

afs commented 2 months ago

Please continue the discussion of this contribution on this issue.

The PR has been merged to branch gh2518-cdt.

afs commented 2 months ago

@Olaf - in cleaning the ingested branch code, I found that main.jj does not build with javacc (I have version 7.0.12).

There are two uses of Unfold and two declarations of Unfold.

This is not due to the ingestion/squash process onto the Jena branch; those links point to PR #2501.

I don't know how that could have happened, but the generated Java code did not correspond to the JavaCC input. The problem is now fixed.

I've also added a Builder for OpUnfold so that algebra using that operator can be written out and read in again. The command-line tool qparse now works (it parses queries and checks both that they print out in a form equal to the input and that they have the same algebra).

I've finished cleaning up the code (compiler warnings, some whitespace issues I noticed) for now.

Please test when you have some time.

I'll keep the branch in-step with the main branch. There are some parser changes for RDF 1.2 in the pipeline.

hartig commented 2 months ago

@afs

in cleaning the ingested branch code, I found that main.jj does not build with javacc (I have version 7.0.12).

There are two uses of Unfold and two declarations of Unfold.

I didn't notice it myself. I guess this was an artifact of my earlier attempt to rebase my branch, which was a multi-hour project after more than a year of divergence between the branches.

The problem is now fixed.

Thanks!

Thanks also for the other changes (Builder for OpUnfold, cleaning of warnings, etc). I looked through all your commits and they are fine.

Please test when you have some time.

Done. Everything works as expected! That is, i) mvn clean package completes without problems on that branch, ii) arq.rdftests runs our SPARQL CDTs test suite without issues, and iii) arq.arq can be used to run SPARQL queries that use our CDT-related features.

So, from my side, this can be merged now.

afs commented 2 months ago

Thanks.

I've finished aligning the code. I'll squash/tidy the additional commits and do a few timing tests to make sure nothing major has happened.

I'm busy most of this coming week and this isn't a trivial set of changes, so there will be a short delay.

afs commented 2 months ago

@hartig -- Thank you for the contribution!

It's now merged into the main branch and will be in the next Jena release. It is also in development snapshots from now on.

hartig commented 2 months ago

Great news! Thanks a lot @afs for your help on getting this contribution merged!!
