ariatemplates / editor-backend

Server side processing for AT editors

Create a grammar generator (a step ahead of parser generation from grammar) #7

Open ymeine opened 11 years ago

ymeine commented 11 years ago

We can see that the grammars for AT and HTML are really close, since they are both markup-based.

Moreover, whatever language the grammar defines (I think), the rules share the same pattern:

Some common patterns also recur, for instance to define a list of elements.

It would be great to be able to define the rules at a more abstract level, with more properties, more semantics, and more flexibility, in JSON or any other existing standard format.

So instead of defining a block element in HTML like this:

block = open:opening ws0:__ elements:(elementList __)? close:closing? {
    var node = Node('block', line(), column(), offset(), text());

    node.add('openTag', open);

    node.addList('spaces.0', ws0);

    if (elements !== "") {
        node.addList('elements', elements[0]);
        node.addList('spaces.1', elements[1]);
    }

    if (close !== "") {
        node.add('closeTag', close);
    } else {
        node.flag('error');
        node.set('error', 'Block is not closed');
    }

    return node;
}

we would pass this information:

// A rule definition
{
    name: "block",
    nodeType: "block", // could default to the name of the rule
    type: 'block', // This is a block element (talking generically)
    // From this type we can set flags, and infer other stuff (see below the rule definition)
    definition: [ // an array is a sequence of elements
        {
            rule: "opening",
            key: "openTag", // could default to the name of the rule otherwise
            required: true // should be a default
        },
        {
            rule: "element",
            key: "elements",
            multiple: true, // tells the generator to use a dedicated rule that parses a list of "element"
            separator: "__", // only if "multiple" is true: the name of the rule used to separate the items of the list. "__" could be a default, special rule used for that (to be defined anyway, or generated with some defaults)
            optional: true // equivalent to "required: false"
        },
        {
            rule: "closing",
            key: "closeTag",
            required: true, // should be a default
            // This is a special case, since we can see in the grammar that it has been made optional.
            // However, the grammar generator should be able to generate a permissive grammar: it makes everything optional where it can, but remembers what was required and automatically sets errors
            errorMsg: "Block is not closed" // see the comment right above. Could be built from the rule name and the fact that this is the last rule
        }
    ]
}
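
A minimal sketch of how such a generator could turn the definition above into PEG.js rule text. Everything here is an assumption made for illustration: the function name `generateRule`, the `eN` labels, the `<rule>List` naming convention for list rules (modeled on `elementList` in the hand-written rule, and assumed to be generated elsewhere from `rule` and `separator`), and the choice to treat a required element that carries an `errorMsg` as permissive. The `!== ""` check mirrors the hand-written rule, and white space numbering is left out to keep the sketch short.

```javascript
// Sketch of a generator from a declarative rule definition to PEG.js rule text.
// generateRule and the "eN" label scheme are hypothetical, not an existing API.
function generateRule(def) {
    var nodeType = def.nodeType || def.name; // nodeType defaults to the rule name
    var sequence = [];
    var body = [];

    def.definition.forEach(function (item, index) {
        var label = 'e' + index;
        var key = item.key || item.rule; // key defaults to the rule name
        var required = item.optional !== true && item.required !== false;

        // Assumption: lists are parsed by a separately generated "<rule>List" rule.
        var expression = item.multiple ? item.rule + 'List' : item.rule;

        // Assumption: a required element with an errorMsg is parsed permissively,
        // that is, optional in the grammar with the error flagged on the node.
        var permissive = required && typeof item.errorMsg === 'string';
        var add = 'node.' + (item.multiple ? 'addList' : 'add') +
            "('" + key + "', " + label + ');';

        if (!required || permissive) {
            expression += '?';
            body.push('if (' + label + ' !== "") {', '    ' + add);
            if (permissive) {
                body.push(
                    '} else {',
                    "    node.flag('error');",
                    "    node.set('error', " + JSON.stringify(item.errorMsg) + ');'
                );
            }
            body.push('}');
        } else {
            body.push(add);
        }

        sequence.push(label + ':' + expression);
    });

    var lines = ["var node = Node('" + nodeType + "', line(), column(), offset(), text());"]
        .concat(body, ['return node;'])
        .map(function (line) { return '    ' + line; });

    return def.name + ' = ' + sequence.join(' ') + ' {\n' + lines.join('\n') + '\n}';
}

// Usage with a trimmed-down version of the "block" definition.
var blockRule = {
    name: 'block',
    definition: [
        { rule: 'opening', key: 'openTag' },
        { rule: 'element', key: 'elements', multiple: true, optional: true },
        { rule: 'closing', key: 'closeTag', errorMsg: 'Block is not closed' }
    ]
};
var generated = generateRule(blockRule);
console.log(generated);
```

The generated text starts with `block = e0:opening e1:elementList? e2:closing? {`, and the conditional assignments and the error flag come from the `optional` and `errorMsg` properties instead of being written by hand.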

We could even imagine rule factories for common patterns. With one, the block rule definition would be written:

{
    pattern: 'block', // sets the flag "block" and expects an opening, a closing, and content in between
    opening: {
        rule: "opening"
    },
    content: ...
    closing: {
        rule: "closing"
    }
}

// A special ("standard") rule definition
{
    type: "__" // requires rules "ws" (white spaces) and "comment" to be defined
}

// Other standards: 
[
    "ws",
    "eol",
    ...
]

This would nicely handle the conditional assignment when an element is optional, as well as the numbering of white spaces, errors, permissive / partial parsers, ...
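
The rule factory idea could be sketched as a small registry that expands a `pattern` entry into a full rule definition. The names `patterns` and `expandPattern` are hypothetical, and so is the `content` item used in the usage example: the original leaves `content` elided, so the `element` item from the longer definition is reused purely for illustration.

```javascript
// Hypothetical registry of rule factories, keyed by pattern name.
var patterns = {
    // Expands the short "block" form into a full rule definition.
    block: function (spec) {
        return {
            name: spec.name || spec.pattern,
            type: 'block',
            definition: [
                { rule: spec.opening.rule, key: 'openTag', required: true },
                spec.content,
                {
                    rule: spec.closing.rule,
                    key: 'closeTag',
                    required: true,
                    errorMsg: 'Block is not closed'
                }
            ]
        };
    }
};

// Looks up the factory for spec.pattern and applies it to the spec.
function expandPattern(spec) {
    if (!Object.prototype.hasOwnProperty.call(patterns, spec.pattern)) {
        throw new Error('Unknown pattern: ' + spec.pattern);
    }
    return patterns[spec.pattern](spec);
}

// Usage: the content item below is an assumption made for this example.
var expanded = expandPattern({
    pattern: 'block',
    opening: { rule: 'opening' },
    content: { rule: 'element', key: 'elements', multiple: true, separator: '__', optional: true },
    closing: { rule: 'closing' }
});
console.log(JSON.stringify(expanded, null, 4));
```

Keeping factories in a plain object keyed by pattern name means new patterns (lists, inline elements, and so on) could be added without touching the expansion logic.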

ymeine commented 11 years ago

Sublime Text's syntax definition implementation is worth having a look at! :wink:

Here is the draft of the reference.

And a real-life example in the case of Aria Templates.

ymeine commented 11 years ago

Ace's implementation too.

ymeine commented 11 years ago

Sublime Text's implementation is very similar to TextMate's, and Ace was inspired by the latter.