RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License
2.12k stars, 546 forks

SPARQL EBNF handling in Python #2008

Open nicholascar opened 2 years ago

nicholascar commented 2 years ago

SPARQL 1.1's grammar [1] is written in a W3C variant of EBNF [2], while most Parsing Expression Grammar (PEG) tools out there for Python either understand standard ISO EBNF (or IETF ABNF), e.g. TatSu [3], or implement their own EBNF-like grammar format, e.g. Lark [4].

Has anyone converted SPARQL W3C EBNF to a form that can be used with any known Python packages?

I'm keen to have a standard grammar file for SPARQL (and ultimately Turtle etc.) that RDFLib can use.

[1] https://www.w3.org/TR/rdf-sparql-query/#sparqlGrammar
[2] https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
[3] https://pypi.org/project/TatSu/
[4] https://lark-parser.readthedocs.io/en/latest/grammar.html

niklasl commented 2 years ago

Quite some time ago I did a "one-off" Vim-script conversion from W3C EBNF to EBNF suitable for PEG.js. As I just threw regexps at the problem until I had something to continue manually from, I do not know how incomplete it is. At least it did the job for the TriG grammar.

It is straightforward to make a Python script with the same brute-force approach. I do not know if the resulting form is more "standard" (among the chaos of EBNF dialects). The differences may be simple enough to overcome by this crude method though (to avoid having to define an EBNF for the various EBNF dialects, in each dialect, and do a "scientifically correct" EBNF translator...).
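
For illustration, a minimal sketch of what such a regexp-driven Python script might look like. This is not the Vim script mentioned above. The target syntax (Lark-style `name: body` rules with lower-case rule names) and the helper names `camel_to_snake` and `convert_rule` are assumptions made for the example. It only handles simple non-terminal rules; terminal definitions with character classes and `#xNN` escapes would still need manual work.

```python
import re

# A W3C EBNF rule line looks like:  [5]  BaseDecl  ::=  'BASE' IRIREF
RULE = re.compile(r"^\s*\[\d+\w*\]\s*(\w+)\s*::=\s*(.*)$")
# Quoted literals or bare symbol names inside a rule body.
TOKEN = re.compile(r"""'[^']*'|"[^"]*"|[A-Za-z_]\w*""")


def camel_to_snake(name: str) -> str:
    """BaseDecl -> base_decl; all-caps terminal names (IRIREF) are kept."""
    if name.isupper():
        return name
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def _convert_token(match: re.Match) -> str:
    tok = match.group()
    if tok[0] in "'\"":
        # Normalise literals to double quotes, escaping embedded ones.
        return '"' + tok[1:-1].replace('"', '\\"') + '"'
    return camel_to_snake(tok)


def convert_rule(line: str) -> str:
    """Convert one W3C EBNF rule line; return other lines unchanged."""
    m = RULE.match(line)
    if m is None:
        return line  # left for manual editing
    name, body = m.groups()
    return f"{camel_to_snake(name)}: {TOKEN.sub(_convert_token, body.strip())}"


if __name__ == "__main__":
    print(convert_rule("[4]  Prologue  ::=  ( BaseDecl | PrefixDecl )*"))
    # prologue: ( base_decl | prefix_decl )*
    print(convert_rule("[5]  BaseDecl  ::=  'BASE' IRIREF"))
    # base_decl: "BASE" IRIREF
```

Anything the rule pattern does not recognise is passed through untouched, so the output can be finished by hand, in the same spirit as the one-off conversion described above.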

ghost commented 2 years ago

Dunno if it helps but I'm in the process of experimenting with using Lark-driven parsers for RDFLib's handling of the W3C-published EBNF expressions. I've found the Lark IDE quite useful.

In general, I've found the terminals to be the most tedious to get right but, once nailed, one common set of terminals covers all the RDF format EBNF expressions.

Lark's treatment of case-insensitive keywords is useful at times, avoiding awkward terminals like:

BASE: ("B"|"b") ("A"|"a") ("S"|"s") ("E"|"e")
PREFIX: ("P"|"p") ("R"|"r") ("E"|"e") ("F"|"f") ("I"|"i") ("X"|"x")

by being able to use /BASE/i in the grammar rules.
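
As a rough sketch of how that rewrite could be automated when converting a grammar, the snippet below collapses a spelled-out terminal into the case-insensitive regexp form. The function name `collapse_keyword` is hypothetical, and it assumes the terminal is written exactly as a sequence of `("X"|"x")` groups, as in the two examples above.

```python
import re

# One ("X"|"x") group: the same letter given in both cases.
PAIR = re.compile(r'\(\s*"([A-Za-z])"\s*\|\s*"([A-Za-z])"\s*\)')


def collapse_keyword(definition: str):
    """Return e.g. 'BASE: /BASE/i', or None if the terminal does not fit."""
    name, sep, body = definition.partition(":")
    if not sep:
        return None
    body = body.strip()
    letters = []
    pos = 0
    for m in PAIR.finditer(body):
        # Only whitespace may appear between consecutive groups.
        if body[pos:m.start()].strip():
            return None
        first, second = m.groups()
        # Both alternatives must be the same letter in different cases.
        if first == second or first.lower() != second.lower():
            return None
        letters.append(first.upper())
        pos = m.end()
    if not letters or body[pos:].strip():
        return None
    return f"{name.strip()}: /{''.join(letters)}/i"


if __name__ == "__main__":
    print(collapse_keyword('BASE: ("B"|"b") ("A"|"a") ("S"|"s") ("E"|"e")'))
    # BASE: /BASE/i
```

Terminals that do not match the pattern return `None`, so they can be left as they are.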

The attached grammar and terminals (wip) (sparql-lark.txt) parse successfully when pasted into the IDE with this (arbitrarily-selected) query:

PREFIX  dc: <http://purl.org/dc/elements/1.1/>
PREFIX  : <http://example.org/book/>
SELECT  $title
WHERE   { :book1  dc:title  $title }

nicholascar commented 2 years ago

Thanks @niklasl & @gjhiggins for the responses! I will try and spend a bit of time on a few existing RDFLib issues in the next fortnight and, if I can clear out my personal backlog, I'll then try for a SPARQL improvement push aiming to: