SeniorMars / tree-sitter-typst

A TreeSitter parser for the Typst File Format
MIT License

A tree-sitter parser for the Typst file format

This language is soooo hard to parse… whitespace, parentheses for everything, and Unicode :(

DONE:


The (now outdated) specification comes from: https://www.user.tu-berlin.de/laurmaedje/programmable-markup-language-for-typesetting.pdf

I'll be using the textmate grammar as inspiration: https://github.com/typst/typst/blob/main/tools/support/typst.tmLanguage.json

For myself, I'll paste it here:


Typst Grammar

Below is an approximate EBNF grammar for the Typst language that is based on our handwritten recursive descent parser. We follow these conventions:

– Production names are all lowercase.
– Text enclosed in single (') or double quotes (") defines a terminal.
– * for an arbitrary number of repetitions.
– + for at least one repetition.
– ? for zero or one repetitions.
– ! to negate a simple (character-class-like) production.
– . to match an arbitrary character.
– a - b to match anything that matches a but not b.
– unicode(Property) to match any character that has the given unicode property.

Note that comments and spaces are allowed almost everywhere within code constructs. For readability, this is omitted in the grammar. Moreover, the grammar omits the indentation rules for lists, as EBNF cannot handle context-sensitive constructs.
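
As a rough illustration of the conventions above, most of the operators map closely onto regular-expression syntax. The sketch below uses Python's `re` and `unicodedata` modules. The names `minus`, `LINK`, `NOT_STAR`, and `is_letter` are made up for this example and are not part of Typst or tree-sitter, and treating `unicode(Property)` as a general-category check is an assumption.

```python
import re
import unicodedata

# 'http' 's'? '://' (!space)*
#   's'?      -> optional terminal
#   (!space)* -> any number of non-space characters
LINK = re.compile(r"https?://\S*")


def minus(a: str, b: str) -> str:
    # `a - b` for single-character productions:
    # match `a` here unless `b` also matches at this position.
    return rf"(?!{b})(?:{a})"


# `. - '*'`: any character except a star.
NOT_STAR = re.compile(minus(r".", r"\*"))


def is_letter(ch: str) -> bool:
    # Approximates unicode(Letter) via the general category (assumption).
    return unicodedata.category(ch).startswith("L")
```

For example, `LINK.match("https://typst.app and more").group()` yields `'https://typst.app'`, since `\S*` stops at the first space.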

// Markup.
markup ::= markup-node*
markup-node ::=
  space | nbsp | shy | endash | emdash | ellipsis | quote |
  strong | emph | raw | link | math | heading | list | enum | desc

// Markup nodes.
nbsp ::= '~'
shy ::= '-?'
endash ::= '--'
emdash ::= '---'
ellipsis ::= '...'
quote ::= "'" | '"'
strong ::= '*' markup '*'
emph ::= '_' markup '_'
raw ::= '`' (raw | .*) '`'
link ::= 'http' 's'? '://' (!space)*
math ::= ('$' .* '$') | ('$[' .* ']$')
heading ::= '='+ space markup
list ::= '-' space markup
enum ::= digit* '.' space markup
desc ::= '/' space markup ':' space markup
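
One practical consequence of the shorthand productions above is that several of them share a prefix (`-?`, `--`, `---`, and the `-` that opens a list item), so a scanner has to try the longest alternative first. Below is a minimal Python sketch of that idea, covering only the fixed-string terminals. The `SHORTHANDS` table and the `scan_shorthand` function are hypothetical names for illustration, not code from this repository or from Typst.

```python
# Fixed-string markup terminals from the grammar, listed longest first
# so that '---' is tried before '--'.
SHORTHANDS = [
    ("---", "emdash"),
    ("...", "ellipsis"),
    ("--", "endash"),
    ("-?", "shy"),
    ("~", "nbsp"),
]


def scan_shorthand(text: str, pos: int = 0):
    """Return (node_name, length) for the shorthand starting at `pos`, or None."""
    for literal, name in SHORTHANDS:
        if text.startswith(literal, pos):
            return name, len(literal)
    return None
```

With this ordering, `scan_shorthand("a---b", 1)` returns `("emdash", 3)` rather than stopping at the shorter `--`. A lone `-` matches nothing here and would be left for the `list` rule.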