cleishm / libcypher-parser

Cypher Parser Library
Apache License 2.0
149 stars 39 forks source link

Violation of safety checks on children in `cypher_ast_query` constructor #23

Closed jeffreylovitz closed 5 years ago

jeffreylovitz commented 5 years ago

Hi @cleishm,

Thanks very much for the annotation API! I've started migrating our anonymous identifiers to leverage it, and it's working very smoothly so far.

Now that RedisGraph is synced with master here, we're running afoul of a safety check you introduced in 8c76cc5. Specifically, the cypher_ast_query children check: https://github.com/cleishm/libcypher-parser/blob/776d96e97cfa99a98e62acf317ace12a4983f966/lib/src/ast_query.c#L51-L52

To differentiate between scopes in multi-part queries, we invoke this constructor to make sub-ASTs that only represent a slice of clauses (punctuated by WITH and RETURN clauses). With appropriate shame, I admit that this approach involves a bit of dodgy casting:

    cypher_astnode_t *clauses[n];
    for(uint i = 0; i < n; i ++) {
        clauses[i] = (cypher_astnode_t *)cypher_ast_query_get_clause(master_ast->root, i + start_offset);
    }
    struct cypher_input_range range = {};
    ast->root = cypher_ast_query(NULL, 0, (cypher_astnode_t *const *)clauses, n, NULL, 0, range);

This now produces the error: ast_query.c:52: cypher_ast_query: Assertion 'nchildren >= nclauses' failed. We can avoid this by introducing the same variables as children, but this causes a lot of unnecessary allocations.

Do you have any advice for how we could construct ephemeral sub-ASTs to represent a slice of clauses in a more orthodox way?

Thank you!

cleishm commented 5 years ago

So the check is there for good reason, as you probably have guessed. I'll shortly add an additional set of helpers, which will allow for cloning/rewriting an AST.

However, to do that, I have to know that any dependencies of a node are also referenced in the node's children. Hence the added checks.

In your case, the problem with the call cypher_ast_query(NULL, 0, (cypher_astnode_t *const *)clauses, n, NULL, 0, range);, is the fact that you're passing NULL for the children argument.

Simple fix - just use cypher_ast_query(NULL, 0, (cypher_astnode_t *const *)clauses, n, (cypher_astnode_t *const *)clauses, n, range);.

Also, you can get rid of the cast away from const, if you change clauses to be const cypher_astnode_t *. The casting to const, i.e. (cypher_astnode_t *const *), while quite annoying, can't be avoided - so don't beat yourself up about it. It's a known issue in C: http://c-faq.com/ansi/constmismatch.html

Here's your example re-written:

        const cypher_astnode_t *clauses[n];
    for(uint i = 0; i < n; i ++) {
        clauses[i] = cypher_ast_query_get_clause(master_ast->root, i + start_offset);
    }
    struct cypher_input_range range = {};
    ast->root = cypher_ast_query(NULL, 0, (cypher_astnode_t *const *)clauses, n, (cypher_astnode_t *const *)clauses, n, range);

Cheers, Chris

jeffreylovitz commented 5 years ago

Thanks for the rewrite! That does look a lot more reasonable haha.

I have already switched the children argument as you recommend here, but because that array in the constructed AST node is heap-allocated separately, it causes a memory leak:

 24 bytes in 3 blocks are definitely lost in loss record 16 of 222
    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    by 0x8B2FFC6: mdup (util.h:147)
    by 0x8B31065: cypher_astnode_init (ast.c:802)
    by 0x8B42018: cypher_ast_query (ast_query.c:60)

clauses are allocated in the same chunk as the AST node itself, so we can just free the constructed node, but I don't think there is an exposed free routine that will handle the children?

cleishm commented 5 years ago

Ah, yes, that's an additional problem.

To clean up AST nodes, you should use cypher_ast_free(astnode) - not free(...) directly. However, that will recursively free the children - so you're going to end up in trouble because those children you just added are already part of another tree.

So forget what I just wrote - that's not going to work safely.

What you need is to clone the clauses before adding them to a new tree. Fortunately, I just pushed such a method to master. Warning though - I was in the middle of writing some tests, and have pushed without them. So whilst I think it's all correct, I haven't finished verifying that.

So, adding that little step, your code would look something like:

        cypher_astnode_t *clauses[n];
    for(uint i = 0; i < n; i ++) {
        clauses[i] = cypher_ast_clone(cypher_ast_query_get_clause(master_ast->root, i + start_offset));
                if (clauses[i] == NULL) {
                        // TODO: cleanup and handle error
                        return -1;
                }
    }
    struct cypher_input_range range = {};
    ast->root = cypher_ast_query(NULL, 0, (cypher_astnode_t *const *)clauses, n, (cypher_astnode_t *const *)clauses, n, range);
        ...
        ...
        ...
        cypher_ast_free(ast->root);

See if that works?

cleishm commented 5 years ago

On consideration, despite the danger, there is some merit to allowing the approach you described.

So I've adjusted the way the recursive release of children is handled, and added a cypher_astnode_free() method to the exposed API. This will allow you to release a node without releasing its children. As long as you take responsibility for releasing the children (or use children that will get released as part of releasing a whole other tree), you should be ok. Dangerous, but ok if you're careful.

jeffreylovitz commented 5 years ago

@cleishm That is perfect, thank you! These scoped ASTs improve our abstractions greatly, but since only one is really "active" at any time for us, deep copies would be a bit wasteful.

Your cypher_astnode_free() solution looks ideal for our purposes; I'll try that out tomorrow.

Thanks for the quick resolution!