Dada reference grammar - Githubissues

dada-lang / dada

I speak only of myself since I do not wish to convince, I have no right to drag others into my river, I oblige no one to follow me and everybody practices his art in his own way.

https://dada-lang.org

Apache License 2.0

444 stars 28 forks source link

Dada reference grammar #17

Open nikomatsakis opened 2 years ago

nikomatsakis commented 2 years ago

Our parser is hand-written and I'm ok with that, but it'd be nice to have a reference grammar, particularly one that enforces non-ambiguity.

I was thinking we could do it using some kind of parser generator tool that supports LL(k) or perhaps (better) GLL/GLR. Looking around at what I see on crates.io, parol looks like the best fit, and most actively maintained.

I definitely want something on crates.io so that we can make a dada-reference-grammar crate, run cargo test on it, and so forth, without having to install any external tools.

brson commented 2 years ago

I've started on this.

nikomatsakis commented 2 years ago

I was thinking we could do it using some kind of parser generator tool that supports LL(k) or perhaps (better) GLL/GLR. Looking around at what I see on crates.io, parol looks like the best fit, and most actively maintained.

Of these two, LL(k) would be better, as it enforces non-ambiguity.

brson commented 2 years ago

https://github.com/brson/dada/blob/grammar/components/dada-reference-grammar/src/grammar.par

It passes all but 28 of 107 tests. I'm struggling with it though since I don't understand how to write grammars well. So far parol says what I've written is LL2.

brson commented 2 years ago

parol has a neat feature that it can emit random valid syntax, which could be useful for testing.

brson commented 2 years ago

Now it's only 9 tests failing, all related to format strings (which I haven't tried to parse yet); and return without value and unit/tuples. The latter two both seem ambiguous to me.

Return without expression because if the expression is optional then it is not possible to have any expression-statements after a return statement

e.g.

fn foo() {
    return
    dead_code_here // can't decide what this is?
}

Unit/tuples, at least in statement position, are ambiguous with function calls per https://github.com/dada-lang/dada/issues/117

brson commented 2 years ago

I am though very happy for you to correct me and fix my grammar definition. I don't understand this subject well.

nikomatsakis commented 2 years ago

Nice! I'll make some time to check this out.

brson commented 2 years ago

The parser has to be generated manually right now and the command for that is here:

https://github.com/brson/dada/blob/grammar/components/dada-reference-grammar/src/lib.rs#L3

brson commented 2 years ago

I added two xtasks for grammar so you don't have to type the full commands:

cargo xtask grammar build
cargo xtask grammar decidable which shows the maximum lookahead

brson commented 2 years ago

33 tests failing now. I'll give it another look soon. (edit: looks to be mostly unary ops, that i haven't implemented yet).

brson commented 2 years ago

Well I fixed the problem with return-without-expr by making it only possible to appear as the final expression in a block.

nikomatsakis commented 2 years ago

@brson whoops, forgot about the return question. To be honest, I hadn't thought about it, but that rule (must be last in block) makes sense. Presumably that would apply to break and continue? I would probably have expected to use newline sensitivity -- i.e., there must be some token on the same line as return (could be an opening ().

brson commented 2 years ago

If we're going to be newline-sensitive for disambiguation elsewhere then we should probably do it for return/break/continue as well.

brson commented 2 years ago

Unasigning myself. Busy with other things lately.

vemoo commented 2 years ago

Regarding

fn foo() {
    return
    bar
}

the current parser parses it as return with value bar, which I think it makes sense, because you can do:

fn foo() {
    return,
    bar
}

if you want a return without value.

vemoo commented 2 years ago

I've been experimenting with tree-sitter here: https://github.com/vemoo/dada/blob/tree-sitter/tree-sitter-dada/grammar.js

I'm not sure if tree-sitter is a good idea for defining a reference grammar, but it wasn't to difficult to create the grammar and it's parsing most of the dada_tests files but with some differences.

I'm now taking a look at the work done by @brson and experimenting with it, but I've found that some changes to the grammar file make parol hang, but I'm not sure if its an issue in the grammar or a bug in parol.

vemoo commented 2 years ago

Thinking some more about this, it is a bit weird that newlines sometimes are interpreted as separators (fields, parameters, tuples, sometimes expressions) and other times as trivia.

Maybe newlines should not be automatically skipped in the parser, and should only be skipped where they are not ambiguous?

nikomatsakis commented 2 years ago

Maybe -- I was hoping to make newlines kind of magically "dwim" at all the right times, but I wouldn't be surprised if it doesn't always do what is expected.

vemoo commented 2 years ago

I've been trying to complete the parol grammar here: https://github.com/vemoo/dada/tree/grammar but I'm finding it much harder to work with than tree-sitter. I think it's because a lot of the changes I try cause the grammar to no longer be LL(k), so that may be a LL(k) vs GLR difference.

Another thing, I've noticed that there are a few issues that affect the grammar: #177, #156, #134, #117 Should we create a label for them? To know that they could affect the reference grammmar.

igotfr commented 1 year ago

https://github.com/brson/dada/blob/grammar/components/dada-reference-grammar/src/grammar.par

It passes all but 28 of 107 tests. I'm struggling with it though since I don't understand how to write grammars well. So far parol says what I've written is LL2.

sorry, but what program can I use to render or check .par files?

Answer: parol - https://crates.io/crates/parol