Switch to RDF/JS representation

iddan commented 5 years ago

Currently RDFTerms (Literals, IRIs and prefixed IRIs) are represented as strings. I suggest to use the RDF/JSON representation for RDF values. Also, prefixed IRIs should be distinguished from IRIs.

import { Parser } from 'sparqljs';
const parser = new Parser();
parser.parse(`
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
SELECT *
WHERE {
    ?person foaf:name ?name .
    ?person foaf:mbox ?email .
}
`)

Should return

{
  "queryType": "SELECT",
  "variables": [
    "*"
  ],
  "where": [
    {
      "type": "bgp",
      "triples": [
        {
          "subject": {
              "type": "variable",
              "identifier": "person"
          },
          "predicate": {
              "type": "prefixed-iri",
              "prefix": "foaf",
              "value": "name"
          },
          "object": {
              "type": "variable",
              "identifier": "name"
          }
        },
        {
          "subject": {
              "type": "variable",
              "identifier": "person"
          },
          "predicate": {
              "type": "iri",
              "value": "http://xmlns.com/foaf/0.1/mbox"
          },
          "object": {
              "type": "variable",
              "identifier": "email"
          }
        }
      ]
    }
  ],
  "type": "query",
  "prefixes": {
    "foaf": "http://xmlns.com/foaf/0.1/"
  }
}

We'd be happy to offer a PR

RubenVerborgh commented 5 years ago

Thanks for volunteering.

I would propose the RDF/JS interface: http://rdf.js.org

What you could do is pass in a factory to the parser constructor (RDF/JS has this). That way, you can have your favorite JSON output, and others can have their own.

Essentially, the needed changes should just be passing in the factory and calling its methods to create terms and triples.

Good plan? If so, I can give you some suggestions on how to get started.

Ruben

iddan commented 5 years ago

Amazing one! Please do

iddan commented 5 years ago

Something I miss in it is an explicit representation for a Prefixed URI. Do you know if it includes it?

RubenVerborgh commented 5 years ago

It does not, becaude that is only syntactical, not in RDF terms. But I have a solution if you need it.

iddan commented 5 years ago

I think we should use the RDF/JSON as you suggested and additionally define a prefixed URI and variable types. What do you think?

akuckartz commented 5 years ago

JSON-LD ?

RubenVerborgh commented 5 years ago

I think we should use the RDF/JSON as you suggested and additionally define a prefixed URI and variable types.

RDF/JS does have variables in there already.

It purposely does not have a prefixed URI, because that is not a different term type. That said, here's my solution for this:

The DataFactory interface has a method NamedNode namedNode(string value)
Call this method with factory.namedNode('http://xmlns.com/foaf/0.1/knows', { prefix: 'foaf', localName: 'knows' })
RDF/JS-compliant implementations ignore the second argument
Your implementation can use the second argument as it pleases

RubenVerborgh commented 5 years ago

JSON-LD ?

The main function of the parse tree is to use it within JavaScript, not exchanging it across implementations. For exchange, I'd recommend the SPARQL query itself.

RubenVerborgh commented 5 years ago

Proposed plan of implementation (AKA how I would do it, but your opinion might differ):

[ ] accept a { factory: … } options object to the SparqlParser constructor
[ ] make factory default to the N3.js data factory
[ ] in /test/, create an implementation of the factory that returns the current object
- this avoids that you have to change tests themselves, only their setup
- can easily be implemented by adding a wrapper around the N3.js factory: namedNode = value => n3.DataFactory.namedNode(value).id
[ ] wire up the tests to use the test factory
- tests should keep on running
[ ] replace all code to create terms and triples by calls to N3.js
- tests should keep on running

RubenVerborgh commented 5 years ago

FYI Linked to this in the README: 836017e14f75f155aa668aaf41c030128d4b9450

namedgraph commented 5 years ago

How about using SPIN SPARQL syntax as RDF representation? https://www.w3.org/Submission/spin-sparql/

iddan commented 5 years ago

Because this is a JavaScript library it makes the most sense to use a spec made for JavaScript representation of RDF. Also, the main goal is to be able to easily distinguish between term types without further string parsing.

iddan commented 5 years ago

@RubenVerborgh does the change require changing the JISON syntax file as well?

namedgraph commented 5 years ago

You'll still need a vocabulary for the JSON representation. Why invent the bike? Stand on the shoulders of giants, and all that... You could use JSON-LD to write SPIN syntax -- would that achieve your objective?

RubenVerborgh commented 5 years ago

How about using SPIN SPARQL syntax as RDF representation?

We’re not looking for an RDF representation, but rather for a JavaScript in-memory representation.

RubenVerborgh commented 5 years ago

does the change require changing the JISON syntax file as well?

Yes, but only to its JavaScript parts that generate terms.

iddan commented 5 years ago

I'd need help with that. Unfortunately, I have never written something like BISON or JISON

RubenVerborgh commented 5 years ago

It's pretty easy, as all the Jison stuff has been written already and you don't really have to touch it for this. All you need to alter is the JavaScript embedded within the Jison. For example:

      $$ = resolveIRI(expansion + $1.substr(namePos + 1));

will become something like

      var uriString = resolveIRI(expansion + $1.substr(namePos + 1));
      $$ = factory.namedNode(uriString);

namedgraph commented 5 years ago

How big of a task would this be? Is it only the s/p/o that need to be changed or more than that? If someone could summarize that, I could take a crack at it (not promising anything :))

RubenVerborgh commented 5 years ago

Just the s/p/o; summary right above 🙂 Thanks for taking a stab at this.

simonvbrae commented 5 years ago

@RubenVerborgh The way it is right now, the array of variables in a JSON representation sometimes contains a string ("*") and sometimes an array of TermTypes. @rubensworks would like to see this changed and has suggested to make it so "variables" is always a list of strings. That way users don't have to check what exactly it contains. Personally I think that that would be too much work in our codebase for the reward, and it might complicate code in other places like SPARQLAlgebra where elements of the variables array sometimes have to contain other attributes like an expression.

What do you think? Is this worth it? and what solution do you recommend?

RubenVerborgh commented 5 years ago

Hmm, interesting case. @rubensworks is there something to be said for a superclass of Term (SparqlTerm?) of which WildcardTerm is an instance? I’d like to see consistency in our use of Term; not sometimes strings.

rubensworks commented 5 years ago

I think if we go down the route of creating a new Term type, we may run into issues with packages downstream, as these will not know how to handle this special Term type. But it should definitely be possible to do though, but this should be documented in detail then.

Simply making all values of the variables array to be strings would not require special handling and documentation, as it will be immediately clear to users that they are not working with Terms anymore.

In any case, there is no one clean solution I'm afraid, other than changing the RDFJS spec, which would also not be very clean.

RubenVerborgh commented 5 years ago

Slight difference perhaps: I wrote superclass of Term. It could still implement all Term methods and properties (mainly equals to false), but would (strictly) not be a Term.

rubensworks commented 5 years ago

Sure, we could try that. Perhaps @joachimvh could share his insights on the usability of the superclass approach within SPARQLAlgebra.js.

namedgraph commented 5 years ago

Keep in mind that changes will also propagate to TypeScript bindings: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/sparqljs/index.d.ts

Type structure changes will break code.

RubenVerborgh commented 5 years ago

Hopefully TypeScript defs are versioned? Will be a semver major in any case.

namedgraph commented 5 years ago

I am not sure actually. They're simply on a Git repo. From this it looks like versioning is not supported: https://github.com/DefinitelyTyped/DefinitelyTyped/issues/7719

Maybe @AlexeyMz could help, he's the author of the typings. Currently they say:

// Type definitions for sparqljs 2.1

AlexeyMz commented 5 years ago

@namedgraph DefinitelyTyped covers this situation in the README:

If you intend to continue updating the older version of a library's type declarations, you may create a new subfolder (e.g. /v2/) named for the current (soon to be "old") version, and copy existing files from the current version to it.

https://github.com/DefinitelyTyped/DefinitelyTyped#if-a-library-is-updated-to-a-new-major-version-with-breaking-changes-how-should-i-update-its-type-declaration-package

So we should be fine with moving current typings into types/sparqljs/v2/ and put updated typings for sparqljs@3.0.

AlexeyMz commented 5 years ago

The way it is right now, the array of variables in a JSON representation sometimes contains a string ("*") and sometimes an array of TermTypes. @rubensworks would like to see this changed and has suggested to make it so "variables" is always a list of strings. That way users don't have to check what exactly it contains.

We extensively use Sparql.js in our projects and usually you don't want to treat *-projection the same way as an array of variables (where each one is a term or an expression), e.g. you can't append another term to variables if it's a *. So I think it make sense to treat it separately like in the grammar:

https://www.w3.org/TR/sparql11-query/#rSelectClause

iddan commented 5 years ago

This is very true. The tree should reflect the query

joachimvh commented 5 years ago

Perhaps @joachimvh could share his insights on the usability of the superclass approach within SPARQLAlgebra.js.

SPARQLAlgebra will need rewriting after this update anyway. But in the conversion I immediately convert '*' to the correct variables so I won't be using much of this new term anyway.

RubenVerborgh commented 5 years ago

Released as 3.0.0-beta.0 via #80.

jacoscaz commented 4 years ago

Apologies for resurrecting an old issue but it seems to be the most appropriate place for me to point out what follows.

It looks like the typings at https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/sparqljs/index.d.ts have not been updated after the transition to RDF/JS. I double-checked with @rubensworks on gitter.im to make sure I wasn't missing anything and this does seem to be the case.

Has any work ever been put on this?

RubenVerborgh commented 4 years ago

@jacoscaz Not yet, tracking in https://github.com/RubenVerborgh/SPARQL.js/issues/98

RubenVerborgh / SPARQL.js

Switch to RDF/JS representation #73