arithy / packcc

A parser generator for C
Other
349 stars 29 forks source link

Segfaulting parser #28

Closed dolik-rce closed 3 years ago

dolik-rce commented 3 years ago

Grammar:

%source {
static const char *dbg_str[] = { "Evaluating rule", "Matched rule", "Abandoning rule" };
#define PCC_DEBUG(event, rule, level, pos, buffer, length) \
    fprintf(stderr, "%*s%s %s @%d [%.*s]\n", level * 2, "", dbg_str[event], rule, pos, length, buffer)
}

file <- (a / _)+
a <- "A;"
_ <- [ \t\n]*

Input:

A;
A

Expected output:

Syntax error should be reported.

Debugger session:

(gdb) run < tmp.d/input.txt
Starting program: /home/h/prog/packcc/tests/tmp.d/parser < tmp.d/input.txt
Evaluating rule file @0 []
  Evaluating rule a @0 []
  Matched rule a @0 [A;]
  Evaluating rule a @2 []
  Abandoning rule a @2 []
  Evaluating rule _ @2 [
A]
  Matched rule _ @2 [
]
  Evaluating rule a @3 [A]
  Abandoning rule a @3 []
  Evaluating rule _ @3 [A
]
  Matched rule _ @3 []
Matched rule file @0 [A;
]
Evaluating rule file @0 [A
]
  Evaluating rule a @3 [A
]
  Abandoning rule a @3 []
  Evaluating rule _ @3 [A
]
  Matched rule _ @3 []
Matched rule file @0 [A

]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f46801 in __memmove_avx_unaligned_erms () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7f46801 in __memmove_avx_unaligned_erms () from /usr/lib/libc.so.6
#1  0x00005555555571c0 in pcc_commit_buffer (ctx=0x55555555c2a0) at tmp.d/parser.c:820
#2  0x0000555555558581 in pcc_parse (ctx=0x55555555c2a0, ret=0x7fffffffe8ac) at tmp.d/parser.c:1128
#3  0x0000555555558623 in main (argc=1, argv=0x7fffffffe9b8) at main.c:17
(gdb) f 1
#1  0x00005555555571c0 in pcc_commit_buffer (ctx=0x55555555c2a0) at tmp.d/parser.c:820
820     memmove(ctx->buffer.buf, ctx->buffer.buf + ctx->pos, ctx->buffer.len - ctx->pos);
(gdb) p *ctx
$1 = {pos = 3, level = 0, buffer = {buf = 0x55555555c310 "A\n\nA\n", max = 256, len = 2}, lrtable = {buf = 0x55555555c420, max = 256, len = 4}, lrstack = {buf = 0x55555555cc30, max = 2, 
    len = 0}, auxil = 0x0}

Somehow, ctx->pos > ctx->buffer.len, which leads to segfault, because size_t is unsigned, so this overflows and it effectively tries to copy 18446744073709551615 bytes of memory, which is waaaay outside of the allocated memory.

masatake commented 3 years ago

How about this change ? https://github.com/universal-ctags/packcc/commit/37c1b8863550001441e0e7eb40c69c2ea6aae0fe

dolik-rce commented 3 years ago

Interestingly, this fails:

rule <- ("A;" / _)+
_ <- [ \n]*

while this correctly ends with syntax error:

rule <- ("A;" / [ \n]*)+

Semantically, both should be the same, so this is definitely a bug.

dolik-rce commented 3 years ago

@arithy: Is it intended behavior that some cached answers can be kept in pcc_lr_table after pcc_commit_buffer is called?

Assuming this grammar:

rule <- ("A;" / _)+
_ <- [ \n]*

I believe this is what happens:

  1. The first line is correctly parsed, storing some answers in lr_table (len == 4)
  2. Second line cannot be parsed, therefore the rule matching is ended.
  3. pcc_commit_buffer is called and the buffer is shofted by 3 bytes (because it matched "A;\n")
  4. Within pcc_commit_buffer lr_table is shifted by ctx->pos bytes, which is three. But since it was originally 4 entries long, so one remains.
  5. Then the parsing loop runs ppc_parse again, starting on the begging of second line.
  6. The code runs fine, until it gets stored answer from previous run, where the position points behind the end of the file (pos=3, but only two bytes remain to be parsed: "A\n").

So, it seems like either

It seems to me, that the last option makes most sense, since it is possible that during backtracking some matches might have been done in the further parts of buffer and it would be wasteful to just delete them. Is my reasoning correct?

I did modify the PackCC to test this hypothesis, by making it emit this code for the shifting:

static void pcc_lr_table__shift(pcc_auxil_t auxil, pcc_lr_table_t *table, size_t count) {
    size_t i;
    if (count > table->len) count = table->len;
    for (i = 0; i < count; i++) pcc_lr_table_entry__destroy(auxil, table->buf[i]);
    memmove(table->buf, table->buf + count, sizeof(pcc_lr_table_entry_t *) * (table->len - count));
    table->len -= count;
    for (i = 0; i < table->len; i++) {
        if (table->buf[i]) {
            table->buf[i]->memos.buf->answer->pos -= count;
        }
    }
}

It seems to fix this issue and works for all the tests but one. It seems that there was similar, but more subtle error in character_classed_1.d, that caused it to return incorrect value I missed when writing the test. But it did not crash, which is actually more misleading :slightly_smiling_face:

arithy commented 3 years ago

Your understanding is absolutely right. I missed the update of the position value in pcc_lr_answer_t::pos. Thanks a lot for pointing it out.