attractivechaos / klib

A standalone and lightweight C library
http://attractivechaos.github.io/klib/
MIT License
4.18k stars 556 forks source link

Got wrong hash for the same string for hash map #98

Closed stevefan1999-personal closed 6 years ago

stevefan1999-personal commented 6 years ago

Reproduction code:

KHASH_MAP_INIT_STR(var, double);
static khash_t(var) *h = NULL;
static double get_var(const char *name) {
    khiter_t k = kh_get(var, h, name);
    double val = k != kh_end(h) ? kh_val(h, k) : 0;
    printf("value of %s is %g\n", name, val);
    return val;
}

static void set_var(const char *name, double val) {
    printf("set %s to %g\n", name, val);

    int ret;
    khiter_t k = kh_put(var, h, name, &ret);
    kh_value(h, k) = val;
}

void test() {
    if (h == NULL) {
        h = kh_init(var);
    }

    const char *foo = strdup("foo");

    set_var(foo, 20);
    free(foo);

    get_var("foo");
    get_var(strdup("foo"));
}

Result:

set foo to 20
value of foo is 0
value of foo is 20

If I turned on optimization:

set foo to 20
value of foo is 0
value of foo is 0

wtf

stevefan1999-personal commented 6 years ago

So __KHASH_TYPE still stores keys references 🤦‍♂️

stevefan1999-personal commented 6 years ago

Workaround: duplicate the same string and hope that you could iterate the key list and free them.

static void set_var(const char *name, double val) {
    printf("set %s to %g\n", name, val);

    int ret;
    khiter_t k = kh_put(var, h, strdup(name), &ret);
    kh_value(h, k) = val;
}

But I still think this is a problem, it is very prominent in my little calculator example, that the identifiers as represented as an AST string but the AST node must be freed after evaluation, which in turns erases the duplicated AST string, and the reference of the string in the hash map is immediately invalid afterward. Now I 'reduplicate' another identifier string, but that means extra memory is used. This is a big headache...

attractivechaos commented 6 years ago

Khash simply keeps pointers. You are responsible for the memory these pointers point to.

identifiers as represented as an AST string but the AST node must be freed after evaluation, which in turns erases the duplicated AST string, and the reference of the string in the hash map is immediately invalid afterward.

I don't understand. Once strings are duplicated, they will be there before you explicitly free them.