aatauil / codemirror-lang-turtle

TURTLE grammar for Lezer parser
MIT License

Named-graph support for `TriG` mediaType #1

Open eddie-thomas opened 1 year ago

eddie-thomas commented 1 year ago

Hello, love the work. I found myself needing a CodeMirror Turtle language package so I can have a decent editor experience. I'd love it if you could introduce a Graph node in your AST. I thought I'd fork it, but my local implementation doesn't seem to do what I expect. I would also love to contribute if you're accepting outside help.

The main pieces I thought were needed:

Statement {
  Directive | Triples '.' | Quads
}

Quads   {
  Graph '{' Triples '}' 
}

Graph {
  Iri | BlankNode
}

";" "^^" "." "_:" "," "[" "]" "@prefix" "@base" "BASE" "PREFIX" "{" "}"

^ Curly brackets added at the end of this last line (the literal-token list in the @tokens block).

Here's the full grammar file.

@detectDelim

@top TurtleDoc {
  Statement*
}

Statement {
  Directive | Triples '.' | Quads
}

Directive {
  PrefixID | Base | SparqlPrefix | SparqlBase
}

PrefixID {
  '@prefix' Pname_ns Iriref '.'
}

Base {
  '@base' Iriref '.'
}

SparqlBase {
  "BASE" Iriref
}

SparqlPrefix {
  "PREFIX" Pname_ns Iriref
}

Triples {
  Subject ( PredicateObjectList | BlankNodePropertyList ) PredicateObjectList?  
}

Quads   {
  Graph '{' Triples '}' 
}

PredicateObjectList {
  Verb ObjectList (';' Verb ObjectList)*
}

ObjectList {
  Object (',' Object)*
}

Verb {
  Predicate | 'a'
}

Graph {
  Iri | BlankNode
}

Subject {
  Iri | BlankNode | Collection
}

Predicate {
  Iri
}

Object {
  Iri | BlankNode | Collection | BlankNodePropertyList | Literal
}

Literal {
  RdfLiteral | NumericLiteral | BooleanLiteral
}

BlankNodePropertyList {
  '[' PredicateObjectList ']'
}

Collection {
  '(' Object* ')'
}

NumericLiteral {
  Integer | Decimal | Double
}

RdfLiteral {
  String (Langtag | ('^^' Iri))?
}

BooleanLiteral {
  'true' | 'false'
}

String {
  String_literal_quote | String_literal_single_quote | String_literal_long_single_quote | String_literal_long_quote
}

Iri {
  Iriref | PrefixedName
}

PrefixedName {
  Pname_ln | Pname_ns
}

BlankNode {
  Blank_node_label | Anon
}

@skip {
  space  |  Comment
}

@tokens {
  space { @whitespace+ }

  Comment {
    '#'+  ![\n]* "\n"? ('#'+ ![\n]* "\n")*
  }

  Iriref {
    '<' (![<>"{}|^`\\\u{0000}-\u{0020}] | $[])* '>'
  }

  Pname_ns {
    Pn_prefix? ':'
  }

  Pname_ln {
    Pname_ns Pn_local
  }

  @precedence {
    Pname_ln, Pname_ns, "BASE", "PREFIX", "false", "true", String_literal_long_quote, String_literal_quote, "a", Double, Decimal, Integer, space
  }

  Blank_node_label {
    '_:' (Pn_chars_u | $[0-9]) ((Pn_chars | '.')* Pn_chars)?
  }

  Langtag {
    '@' $[a-zA-Z]+ ('-' $[a-zA-Z0-9]+)*
  }

  Integer {
    $[+-]? $[0-9]+
  }

  Decimal {
    $[+-]? $[0-9]* '.' $[0-9]+
  }

  Double {
    $[+-]? ($[0-9]+ '.' $[0-9]* Exponent | '.' $[0-9]+ Exponent | $[0-9]+ Exponent)
  }

  Exponent {
    $[eE] $[+-]? $[0-9]+
  }

  String_literal_quote {
    '"' ( ![\r\n\"] | Echar | Uchar )* '"'
  }

  String_literal_single_quote {
    '\'' ( ![\r\n\'] | Echar | Uchar )* '\''
  }

  String_literal_long_single_quote {
    "'''" (("'" | "''")? (!['] | Echar | Uchar))* "'''"
  }

  String_literal_long_quote {
    '"""' (('"' | '""')? (!["] | Echar | Uchar))* '"""'
  }

  Uchar {
    '\u' Hex Hex Hex Hex | '\U' Hex Hex Hex Hex Hex Hex Hex Hex
  }

  Echar {
    '\\' $[tbnrf\"\'\\]
  }

  Ws {
    $[\u{0020}] | $[\u{0009}] | $[\u{000D}] | $[\u{000A}]
  }

  Anon {
    '[' Ws* ']'
  }

  Pn_chars_base {
    $[A-Z] | $[a-z] | $[\u{00C0}-\u{00D6}] | $[\u{00D8}-\u{00F6}] | $[\u{00F8}-\u{02FF}] | $[\u{0370}-\u{037D}] | $[\u{037F}-\u{1FFF}] | $[\u{200C}-\u{200D}] | $[\u{2070}-\u{218F}] | $[\u{2C00}-\u{2FEF}] | $[\u{3001}-\u{D7FF}] | $[\u{F900}-\u{FDCF}] | $[\u{FDF0}-\u{FFFD}] | $[\u{10000}-\u{EFFFF}]
  }

  Pn_chars_u {
    Pn_chars_base | '_'
  }

  Pn_chars {
    Pn_chars_u | '-' | $[0-9] | $[\u{00B7}] | $[\u{0300}-\u{036F}] | $[\u{203F}-\u{2040}]
  }

  Pn_prefix {
    Pn_chars_base ((Pn_chars | '.')* Pn_chars)?
  }

  Pn_local {
    (Pn_chars_u | ':' | $[0-9] | Plx) ((Pn_chars | '.' | ':' | Plx)* (Pn_chars | ':' | Plx))?
  }

  Plx {
    Percent | Pn_local_esc
  }

  Percent {
    '%' Hex Hex
  }

  Hex {
    $[0-9] | $[A-F] | $[a-f]
  }

  Pn_local_esc {
    '\\' ('_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%')
  }

  ";" "^^" "." "_:" "," "[" "]" "@prefix" "@base" "BASE" "PREFIX" "{" "}"
}

Thanks, Eddie!

aatauil commented 1 year ago

@eddie-thomas Hi Eddie, help is always welcome :) Some things to note:

Firstly, the best approach for implementation is probably to use a Lezer @dialect. Otherwise, people who only need Turtle might get syntax highlighting, and possibly AST nodes, that belong to TriG, which they didn't sign up for.
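To illustrate the dialect idea, here is a minimal, untested sketch of a toy Lezer grammar (not this repository's grammar). The names `trig`, `GraphOpen`, `GraphClose`, `Name`, and `Triple` are placeholders invented for the example. It assumes that only tokens, not rules, can be gated with the `@dialect` pseudo-prop, so the braces are declared as named tokens:

```
// Hypothetical toy grammar: graph blocks only tokenize under the "trig" dialect.
@dialects { trig }

@top Doc { Statement* }

Statement { Triple "." | Graph }

// Graph can only match when GraphOpen/GraphClose are recognized,
// i.e. when the parser is configured with the "trig" dialect.
Graph { Name GraphOpen (Triple ".")* GraphClose }

Triple { Name Name Name }

@skip { space }

@tokens {
  space { @whitespace+ }
  Name { @asciiLetter+ }
  GraphOpen[@dialect=trig] { "{" }
  GraphClose[@dialect=trig] { "}" }
  "."
}
```

With a parser generated from a grammar like this, a consumer would opt in through `parser.configure({dialect: "trig"})`. Without that option the brace tokens are not recognized, so Graph nodes cannot appear in a Turtle-only tree.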

Secondly, the TriG grammar is more extensive than the lines you shared. You can find the complete grammar here: TriG Grammar. Glancing over it quickly, it includes the same grammar and terminals as Turtle (duh, it's an extension of Turtle :p), but there are some differences: for example, the @top declaration should be trigDoc, and there are additional rules such as triples2 and triplesOrGraph.
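For reference, the TriG-specific productions from the W3C TriG 1.1 grammar could be transcribed into Lezer notation roughly as follows. This is an untested fragment, not a complete grammar: it assumes the `Directive`, `Triples`, `PredicateObjectList`, `BlankNodePropertyList`, `Collection`, `Iri`, and `BlankNode` rules from the full grammar posted above, and the capitalized rule names are my own renderings of the spec's names.

```
// trigDoc ::= (directive | block)*
@top TrigDoc { (Directive | Block)* }

// "GRAPH" is a new keyword and would likely need an entry in the
// token @precedence list, next to "BASE" and "PREFIX".
Block {
  TriplesOrGraph |
  WrappedGraph |
  Triples2 |
  "GRAPH" LabelOrSubject WrappedGraph
}

TriplesOrGraph {
  LabelOrSubject (WrappedGraph | PredicateObjectList ".")
}

Triples2 {
  BlankNodePropertyList PredicateObjectList? "." |
  Collection PredicateObjectList "."
}

// Unlike a single Triples between braces, this allows zero or more
// triples separated by ".", with the final "." optional.
WrappedGraph { "{" TriplesBlock? "}" }

TriplesBlock { Triples ("." TriplesBlock?)? }

LabelOrSubject { Iri | BlankNode }
```

Compared with the `Quads` rule in the original post, the main differences are that a graph block may hold any number of triples and that the graph label is optional.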

In reality, it's a more significant task than it initially seems. If you need a quick or temporary solution, I would recommend forking the repository and adding the missing grammar lines. However, if you're up for a challenge, you can attempt to implement TriG as a Lezer @dialect and submit a PR.

Here are the high-level steps I suggest for making it work:

eddie-thomas commented 1 year ago

@aatauil Nice, thanks for the comprehensive help! I'll probably stick to the shorter route: I'll fork and maybe get serious about it later.