Named-graph support for `TriG` mediaType

Hello, love the work. I recently found myself needing a CodeMirror Turtle language processor so I could have a decent editor experience. I'd love it if you could introduce the concept of a graph in your AST. I considered forking the repo, but the version I implemented locally doesn't seem to do what I expect. I'd also love to contribute if you're accepting outside help.

The main pieces I thought were needed:

```
Statement {
  Directive | Triples '.' | Quads
}

Quads {
  Graph '{' Triples '}'
}

Graph {
  Iri | BlankNode
}

";" "^^" "." "_:" "," "[" "]" "@prefix" "@base" "BASE" "PREFIX" "{" "}"
```

^ I added the curly brackets to the end of that last line (the literal-token list in `@tokens`).

Here's the full grammar file.

```
@detectDelim

@top TurtleDoc {
  Statement*
}

Statement {
  Directive | Triples '.' | Quads
}

Directive {
  PrefixID | Base | SparqlPrefix | SparqlBase
}

PrefixID {
  '@prefix' Pname_ns Iriref '.'
}

Base {
  '@base' Iriref '.'
}

SparqlBase {
  "BASE" Iriref
}

SparqlPrefix {
  "PREFIX" Pname_ns Iriref
}

Triples {
  Subject ( PredicateObjectList | BlankNodePropertyList ) PredicateObjectList?
}

Quads {
  Graph '{' Triples '}'
}

PredicateObjectList {
  Verb ObjectList (';' Verb ObjectList)*
}

ObjectList {
  Object (',' Object)*
}

Verb {
  Predicate | 'a'
}

Graph {
  Iri | BlankNode
}

Subject {
  Iri | BlankNode | Collection
}

Predicate {
  Iri
}

Object {
  Iri | BlankNode | Collection | BlankNodePropertyList | Literal
}

Literal {
  RdfLiteral | NumericLiteral | BooleanLiteral
}

BlankNodePropertyList {
  '[' PredicateObjectList ']'
}

Collection {
  '(' Object* ')'
}

NumericLiteral {
  Integer | Decimal | Double
}

RdfLiteral {
  String (Langtag | ('^^' Iri))?
}

BooleanLiteral {
  'true' | 'false'
}

String {
  String_literal_quote | String_literal_single_quote | String_literal_long_single_quote | String_literal_long_quote
}

Iri {
  Iriref | PrefixedName
}

PrefixedName {
  Pname_ln | Pname_ns
}

BlankNode {
  Blank_node_label | Anon
}

@skip {
  space | Comment
}

@tokens {
  space { @whitespace+ }

  Comment {
    '#'+ ![\n]* "\n"? ('#'+ ![\n]* "\n")*
  }

  Iriref {
    '<' (![<>"{}|^`\\\u{0000}-\u{0020}] | $[])* '>'
  }

  Pname_ns {
    Pn_prefix? ':'
  }

  Pname_ln {
    Pname_ns Pn_local
  }

  @precedence {
    Pname_ln, Pname_ns, "BASE", "PREFIX", "false", "true", String_literal_long_quote, String_literal_quote, "a", Double, Decimal, Integer, space
  }

  Blank_node_label {
    '_:' (Pn_chars_u | $[0-9]) ((Pn_chars | '.')* Pn_chars)?
  }

  Langtag {
    '@' $[a-zA-Z]+ ('-' $[a-zA-Z0-9]+)*
  }

  Integer {
    $[+-]? $[0-9]+
  }

  Decimal {
    $[+-]? $[0-9]* '.' $[0-9]+
  }

  Double {
    $[+-]? ($[0-9]+ '.' $[0-9]* Exponent | '.' $[0-9]+ Exponent | $[0-9]+ Exponent)
  }

  Exponent {
    $[eE] $[+-]? $[0-9]+
  }

  String_literal_quote {
    '"' ( ![\r\n\"] | Echar | Uchar )* '"'
  }

  String_literal_single_quote {
    '\'' ( ![\r\n\'] | Echar | Uchar )* '\''
  }

  String_literal_long_single_quote {
    "'''" (("'" | "''")? (!['] | Echar | Uchar))* "'''"
  }

  String_literal_long_quote {
    '"""' (('"' | '""')? (!["] | Echar | Uchar))* '"""'
  }

  // UCHAR starts with a literal backslash, so it must be escaped in the
  // Lezer string ('\u' on its own would match just the letter u).
  Uchar {
    '\\u' Hex Hex Hex Hex | '\\U' Hex Hex Hex Hex Hex Hex Hex Hex
  }

  Echar {
    '\\' $[tbnrf\"\'\\]
  }

  Ws {
    $[\u{0020}] | $[\u{0009}] | $[\u{000D}] | $[\u{000A}]
  }

  Anon {
    '[' Ws* ']'
  }

  Pn_chars_base {
    $[A-Z] | $[a-z] | $[\u{00C0}-\u{00D6}] | $[\u{00D8}-\u{00F6}] | $[\u{00F8}-\u{02FF}] | $[\u{0370}-\u{037D}] | $[\u{037F}-\u{1FFF}] | $[\u{200C}-\u{200D}] | $[\u{2070}-\u{218F}] | $[\u{2C00}-\u{2FEF}] | $[\u{3001}-\u{D7FF}] | $[\u{F900}-\u{FDCF}] | $[\u{FDF0}-\u{FFFD}] | $[\u{10000}-\u{EFFFF}]
  }

  Pn_chars_u {
    Pn_chars_base | '_'
  }

  Pn_chars {
    Pn_chars_u | '-' | $[0-9] | $[\u{00B7}] | $[\u{0300}-\u{036F}] | $[\u{203F}-\u{2040}]
  }

  Pn_prefix {
    Pn_chars_base ((Pn_chars | '.')* Pn_chars)?
  }

  Pn_local {
    (Pn_chars_u | ':' | $[0-9] | Plx) ((Pn_chars | '.' | ':' | Plx)* (Pn_chars | ':' | Plx))?
  }

  Plx {
    Percent | Pn_local_esc
  }

  Percent {
    '%' Hex Hex
  }

  Hex {
    $[0-9] | $[A-F] | $[a-f]
  }

  Pn_local_esc {
    '\\' ('_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%')
  }

  ";" "^^" "." "_:" "," "[" "]" "@prefix" "@base" "BASE" "PREFIX" "{" "}"
}
```

Thanks, Eddie!

@eddie-thomas Hi Eddie, help is always welcome :) Some things to note:

Firstly, the best approach is probably to implement TriG as a Lezer @dialect. Otherwise, people who only need Turtle might get TriG syntax highlighting, and possibly TriG nodes in their AST, which they didn't sign up for.
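A minimal sketch of the dialect idea, assuming Lezer's `@dialects` declaration and the `@dialect=` pseudo-prop on tokens behave as described in the Lezer system guide. The dialect name `trig` and the token names `GraphOpen`/`GraphClose` are hypothetical, and this is a fragment to merge into the grammar above (it would replace the literal `"{"` and `"}"` tokens), not a complete file:

```
// Declare an optional dialect; with no dialect enabled the parser stays plain Turtle.
@dialects { trig }

Statement {
  Directive | Triples "." | Quads
}

// Quads can only start a graph body with GraphOpen, which the tokenizer
// emits only when the "trig" dialect is enabled. The body follows the TriG
// shape: "."-separated triples with an optional trailing ".".
Quads {
  Graph? GraphOpen (Triples ("." Triples)* "."?)? GraphClose
}

@tokens {
  // Hypothetical named tokens tagged with the dialect.
  GraphOpen[@dialect=trig] { "{" }
  GraphClose[@dialect=trig] { "}" }
}
```

The dialect would then be selected when configuring the generated parser, e.g. `parser.configure({dialect: "trig"})` from `@lezer/lr`. In the default configuration a `{` should produce no token, so TriG input would be reported as a syntax error in Turtle mode instead of yielding TriG nodes; that is worth confirming in the Lezer playground.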

Secondly, the TriG grammar is more extensive than the lines you shared. You can find the complete grammar here: TriG Grammar. Glancing over it quickly, it includes the same grammar and terminals as Turtle (duh, it's an extension of Turtle :p), but there are some differences (e.g., the @top rule should be trigDoc, and there are new productions such as block, triplesOrGraph, triples2, and wrappedGraph).
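For comparison, the TriG-specific productions of the W3C TriG grammar could be transcribed into Lezer rules roughly as below. This is an unverified sketch: the rule names are capitalized versions of the spec names, and `Directive`, `Triples`, `PredicateObjectList`, `BlankNodePropertyList`, `Collection`, `Iri`, and `BlankNode` are assumed to be the existing Turtle rules. Note that `WrappedGraph` accepts several `.`-separated triples with an optional trailing dot, which the `Quads` rule in the original post does not, so input such as `<g> { <s> <p> <o> . }` would fail there.

```
@top TrigDoc { (Directive | Block)* }

Block {
  TriplesOrGraph |
  WrappedGraph |
  Triples2 |
  "GRAPH" LabelOrSubject WrappedGraph
}

// A label followed by a graph body is a named graph; followed by a
// predicate-object list it is an ordinary triple statement.
TriplesOrGraph {
  LabelOrSubject (WrappedGraph | PredicateObjectList ".")
}

// Statements whose subject is a blank-node property list or a collection.
Triples2 {
  BlankNodePropertyList PredicateObjectList? "." |
  Collection PredicateObjectList "."
}

WrappedGraph { "{" TriplesBlock? "}" }

// Spec form: triplesBlock ::= triples ('.' triplesBlock?)?
// which accepts the same inputs as this non-recursive version.
TriplesBlock { Triples ("." Triples)* "."? }

LabelOrSubject { Iri | BlankNode }
```

Since `TriplesOrGraph` and `Triples2` together cover Turtle's `Triples '.'` statement, this would replace `Statement` in TriG mode rather than extend it. If the generator reports a token overlap, `"GRAPH"` may also need to join `"BASE"` and `"PREFIX"` in the `@precedence` block.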

In reality, it's a bigger task than it initially seems. If you need a quick/temporary solution, I would recommend forking the repository and adding the missing grammar rules. However, if you're up for a challenge, you can try implementing TriG as a Lezer @dialect and submit a PR.

Here are the high-level steps I suggest for making it work:

1. Create a list of the differences between the Turtle and TriG grammars (don't forget to include the Notes section, as it contains important information about the grammar).
2. Determine how to implement TriG as a Lezer @dialect.
3. Optional: add highlighting rules and code folding.
4. Ensure that TriG syntax is not accepted in Turtle mode.

Some tips:
- Use the Lezer playground.
- Keep in mind that the Lezer documentation is limited; you may find helpful information on this forum.
- codemirror-lang-sparql might be useful, but be aware that its grammar contains mistakes I still need to fix.

aatauil / codemirror-lang-turtle, issue #1