Leandros / PackCC

PackCC is a packrat parser generator for C.
https://leandros.github.io/PackCC
MIT License

Failure to backtrack; trivial `?` expression fails #2

Open snej opened 5 years ago

snej commented 5 years ago

I've written a grammar but it's failing even on trivial inputs. I've boiled the grammar down to the following simple test case:

%prefix "test"

property <-
    ( IDENTIFIER '.' )? IDENTIFIER    { printf("PROPERTY: %s\n", $0);}
IDENTIFIER <-
    [a-zA-Z_] [a-zA-Z_0-9]*

%%
int main() {
    test_context_t *ctx = test_create(NULL);
    return test_parse(ctx, NULL);
}

This is able to parse input foo.bar but fails on foo:

$ packcc test.packcc && cc test.c -o test && echo "foo.bar" | ./test
PROPERTY: foo.bar
$ packcc test.packcc && cc test.c -o test && echo "foo" | ./test
Syntax error

It looks as though the parser is unable to recover from the failure to match the optional ( IDENTIFIER '.' ) group. I suspect it isn't backtracking in the input, so by the time it reaches the second IDENTIFIER there's nothing left to parse.

Here's evidence in favor of that. If I change the property rule to this:

property <-
    ( IDENTIFIER '.' )? ' ' IDENTIFIER

the parser will successfully parse the input foo bar, which is incorrect. So it appears that it consumed the foo, failed to match a ., then went on without backtracking and matched the space and bar.

This seems like a really elementary failure. Is this software considered stable or is it just an experiment? (I'm not trying to be sarcastic. I've put experimental stuff up on GitHub, and there's nothing wrong with that. I just try to label it as such. I'm trying to use PackCC for a work project and don't want to waste more time trying to debug it if it's not ready...)

snej commented 5 years ago

Here's the generated evaluation function for the property rule (the modified version that expects ' ' before the second IDENTIFIER):

static pcc_thunk_chunk_t *pcc_evaluate_rule_property(test_context_t *ctx) {
    pcc_thunk_chunk_t *chunk = pcc_thunk_chunk__create(ctx->auxil);
    chunk->pos = ctx->pos;
    pcc_value_table__resize(ctx->auxil, &chunk->values, 0);
    pcc_capture_table__resize(ctx->auxil, &chunk->capts, 0);
    if (!pcc_apply_rule(ctx, pcc_evaluate_rule_IDENTIFIER, &chunk->thunks, NULL)) goto L0001;
    if (
        pcc_refill_buffer(ctx, 1) < 1 ||
        ((ctx->buffer.buf + ctx->pos)[0]) != '.'
    ) goto L0001;        // NOTE
    ctx->pos++;
L0001:;
    if (
        pcc_refill_buffer(ctx, 1) < 1 ||
        ((ctx->buffer.buf + ctx->pos)[0]) != ' '
    ) goto L0000;
    ctx->pos++;
    if (!pcc_apply_rule(ctx, pcc_evaluate_rule_IDENTIFIER, &chunk->thunks, NULL)) goto L0000;
    {
        pcc_thunk_t *thunk = pcc_thunk__create_leaf(ctx->auxil, (pcc_action_t)pcc_action_property_0, 0, 0);
        thunk->data.leaf.capt0.range.start = chunk->pos;
        thunk->data.leaf.capt0.range.end = ctx->pos;
        pcc_thunk_array__add(ctx->auxil, &chunk->thunks, thunk);
    }
    return chunk;
L0000:;
    pcc_thunk_chunk__destroy(ctx->auxil, chunk);
    return NULL;
}

See the line I've marked with NOTE: if the '.' fails to match, the only effect is that the scanner doesn't advance past it. Control jumps to L0001 with ctx->pos still pointing just after the initial IDENTIFIER, so the input is never rewound to make that IDENTIFIER available again.
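To make the expected behavior concrete, here is a minimal hand-written sketch of how an optional group is usually handled in a PEG matcher: save the position before the group and restore it if the group fails. This is not PackCC's generated code; scanner_t, match_identifier, match_optional_prefix, and parse_property are hypothetical names invented for the illustration, and the restore flag exists only so both behaviors can be compared.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical minimal scanner over a NUL-terminated string.
 * This is NOT PackCC's context type, only an illustration. */
typedef struct {
    const char *buf;
    size_t pos;
} scanner_t;

/* IDENTIFIER <- [a-zA-Z_] [a-zA-Z_0-9]*
 * Returns 1 and advances on a match; returns 0 and leaves pos alone otherwise. */
static int match_identifier(scanner_t *s) {
    unsigned char c = (unsigned char)s->buf[s->pos];
    if (!(isalpha(c) || c == '_')) return 0;
    s->pos++;
    for (;;) {
        c = (unsigned char)s->buf[s->pos];
        if (!(isalnum(c) || c == '_')) break;
        s->pos++;
    }
    return 1;
}

/* ( IDENTIFIER '.' )?
 * With restore != 0 the saved position is put back when the group fails.
 * With restore == 0 the position is left wherever the failed attempt
 * stopped, which is the behavior described in the NOTE above. */
static void match_optional_prefix(scanner_t *s, int restore) {
    size_t saved = s->pos;
    if (match_identifier(s) && s->buf[s->pos] == '.') {
        s->pos++;               /* the whole group matched */
        return;
    }
    if (restore) s->pos = saved; /* backtrack to the start of the group */
}

/* property <- ( IDENTIFIER '.' )? IDENTIFIER
 * Returns 1 if the rule matches a prefix of text, 0 otherwise. */
static int parse_property(const char *text, int restore) {
    scanner_t s = { text, 0 };
    match_optional_prefix(&s, restore);
    return match_identifier(&s);
}
```

In this sketch, parse_property("foo", 1) succeeds because the position is rewound before the second IDENTIFIER is tried, while parse_property("foo", 0) fails because foo has already been consumed. Both variants accept "foo.bar".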

stevefan1999-personal commented 5 years ago

Actually, that was an error in newline handling: your echo feeds the trailing '\n' as well, so the parser saw 10 (0xA) when I put a breakpoint on c = ctx->buffer.buf[ctx->pos];... and I'm fairly sure this is the culprit.