arithy / packcc

A parser generator for C
Other
347 stars 28 forks source link

Uninitialized variables #56

Closed dolik-rce closed 2 years ago

dolik-rce commented 2 years ago

Hello @arithy,

I have tested my application that uses PackCC generated parser with valgrind and I have noticed, that it reports conditional jumps depending on uninitialised values. Here is a simplified grammar to reproduce:

%value "int"

%source {
#include <stdio.h>
}

integer <- u:unary? d:digit {
    if (u) {
        printf("RESULT: %d\n", u * d);
    } else {
        printf("RESULT: %d\n", d);
    }
}

unary <- "-" { $$ = -1; } / "+" { $$ = 1; }
digit <- [0-9]+ { $$ = atoi($0); }

%%
int main() {
    pcc_context_t *ctx = pcc_create(NULL);
    pcc_parse(ctx, NULL);
    pcc_destroy(ctx);
    return 0;
}

If you compile this and run echo 42 | valgrind ./example, it reports (ignoring the uninteresting parts for brevity):

Conditional jump or move depends on uninitialised value(s)
   at 0x10BD4F: pcc_action_integer_0 (example.c:1045)
   by 0x10BCF2: pcc_do_action (example.c:1026)
   by 0x10BD13: pcc_do_action (example.c:1029)
   by 0x10C5D3: pcc_parse (example.c:1252)
   by 0x10C667: main (example.c:1266)

So far I'm using an ugly workaround, checking whether the optional part matched something:

integer <- <u:unary?> d:digit {
    if ($1s != $1e) {
        ...

But it's not very nice. Would it be possible to make sure that the variable is initialized to 0 (or another appropriate value, e.g. NULL if it is a pointer)? Or alternatively, would it be possible to add some syntax to check whether the variable is actually present in the rule? I mean something like if ($u) ..., that would return true if the optional variable is present.

By the way: Another place where similar problem pops up is in alternations:

ruleA <- (b:ruleB / c:ruleC) EOF {
    // do something with b or c, depending on which one was matched
} 

This can be usually worked around by moving the alternation into separate rule, but if we could simply do if (b) ... (or if ($b)), then it would make the grammar easier to read.

dolik-rce commented 2 years ago

After bit more thinking, I believe the simplest solution might be to add a new directive, specifying the default value. That would make it easy to know the right initial value. E.g.:

%value "int"
%defaultvalue "0"

or

%value "MyType *"
%defaultvalue "NULL"

If the directive is not present, then the generated code would be as it is now, ensuring backward compatibility.

arithy commented 2 years ago

Hello @dolik-rce, I think it might be much simpler to ensure that all values are zero-cleared if the corresponding rules have not been evaluated. How do you think about this solution?

dolik-rce commented 2 years ago

Zero cleared would be perfectly fine. Do you mean just calling memset(&variable, 0, sizeof(pcc_value_t))? That should work just right for numeric types, pointers, strings and even for structs.

arithy commented 2 years ago

Yes, I use memset(). I made the modification and push it.

dolik-rce commented 2 years ago

Looks great, I have just tested it and it makes the valgrind warning disappear. Thank you very much.

dolik-rce commented 2 years ago

I'm afraid I was bit too quick in accepting your fix. It does work well in simple rules, but I have found it does not fix some more complex cases. E.g.:

%value "char *"

A <- "A" (b:B / c:C)? .* {
    if (b) printf("%s\n", b); /* this works fine, because code always goes through the first branch */
    if (c) printf("%s\n", c); /* this accesses uninitialized c, because the rule C was never parsed */
}

B <- "B" { $$ = "b"; }
C <- "C" { $$ = "c"; }

%%

int main() {
    pcc_context_t *ctx = pcc_create(NULL);
    char *x = "empty";
    char **ret = &x;
    pcc_parse(ctx, ret);
    pcc_destroy(ctx);
    return 0;
}

Running this:

$ echo "AB" | valgrind -s --track-origins=yes ./example
==175771== Memcheck, a memory error detector
==175771== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==175771== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==175771== Command: ./example
==175771== 
b
==175771== Conditional jump or move depends on uninitialised value(s)
==175771==    at 0x10BD6E: pcc_action_A_0 (test.c:1045)
==175771==    by 0x10BCE2: pcc_do_action (test.c:1025)
==175771==    by 0x10BD03: pcc_do_action (test.c:1028)
==175771==    by 0x10C52F: pcc_parse (test.c:1229)
==175771==    by 0x10C5E7: main (test.c:1246)
==175771==  Uninitialised value was created by a heap allocation
==175771==    at 0x483E6D5: malloc (vg_replace_malloc.c:379)
==175771==    by 0x109292: pcc_realloc_e (test.c:258)
==175771==    by 0x10950D: pcc_value_table__resize (test.c:319)
==175771==    by 0x10BE43: pcc_evaluate_rule_A (test.c:1092)
==175771==    by 0x10B97C: pcc_apply_rule (test.c:972)
==175771==    by 0x10C510: pcc_parse (test.c:1228)
==175771==    by 0x10C5E7: main (test.c:1246)
==175771== 
==175771== 
==175771== HEAP SUMMARY:
==175771==     in use at exit: 0 bytes in 0 blocks
==175771==   total heap usage: 27 allocs, 27 frees, 6,472 bytes allocated
==175771== 
==175771== All heap blocks were freed -- no leaks are possible
==175771== 
==175771== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==175771== 
==175771== 1 errors in context 1 of 1:
==175771== Conditional jump or move depends on uninitialised value(s)
==175771==    at 0x10BD6E: pcc_action_A_0 (test.c:1045)
==175771==    by 0x10BCE2: pcc_do_action (test.c:1025)
==175771==    by 0x10BD03: pcc_do_action (test.c:1028)
==175771==    by 0x10C52F: pcc_parse (test.c:1229)
==175771==    by 0x10C5E7: main (test.c:1246)
==175771==  Uninitialised value was created by a heap allocation
==175771==    at 0x483E6D5: malloc (vg_replace_malloc.c:379)
==175771==    by 0x109292: pcc_realloc_e (test.c:258)
==175771==    by 0x10950D: pcc_value_table__resize (test.c:319)
==175771==    by 0x10BE43: pcc_evaluate_rule_A (test.c:1092)
==175771==    by 0x10B97C: pcc_apply_rule (test.c:972)
==175771==    by 0x10C510: pcc_parse (test.c:1228)
==175771==    by 0x10C5E7: main (test.c:1246)
==175771== 
==175771== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Why did you chose to zero initialize the values only when the rule fails? Was it because of performance reasons? Do you think the to always zero initialize the value would be too big? It could be done in pcc_value_table__resize, just after the memory is allocated. It might actually even have better performance in some cases, because only single memset call would be required to initialize all the variables at once:

MARK_USED_FUNC
static void pcc_value_table__resize(pcc_auxil_t auxil, pcc_value_table_t *table, size_t len) {
    if (table->max < len) {
        size_t m = table->max;
        if (m == 0) m = PCC_ARRAYSIZE;
        while (m < len && m != 0) m <<= 1;
        if (m == 0) m = len;
        table->buf = (pcc_value_t *)PCC_REALLOC(auxil, table->buf, sizeof(pcc_value_t) * m);
        table->max = m;
    }
    table->len = len;
    memset(table->buf, 0, sizeof(pcc_value_t) * len); /* just adding this line should fix the problem */
}

The changes in c2f499e would not be needed with this fix, as the do pretty much the same thing, but only sometimes.

arithy commented 2 years ago

Sorry for my late response.

It could be done in pcc_value_table__resize, just after the memory is allocated.

You are right. I have fixed this issue according to your proposal, but adopted a little bit different way. I have introduced a new function pcc_value_table__clear() to zero-clear values. I'll push it soon.

dolik-rce commented 2 years ago

Sorry for my late response.

No problem, I don't have much free time myself.

This solution is working well. Thank you for the fix.