UpstandingHackers / hammer

Parser combinators for binary formats, in C. Yes, in C. What? Don't look at me like that.
GNU General Public License v2.0
430 stars 40 forks source link

Handle memory allocation failures gracefully #155

Closed pesco closed 8 years ago

pesco commented 8 years ago

I'm opening this as an early request for comments.

Most code in hammer is written as if memory allocations could never fail. In particular, the result of h_arena_malloc isn't checked anywhere. This only works if the OS will always overcommit (e.g. Linux sysctl vm.overcommit_memory=1) and there are no resource limits in place (e.g. ulimit -v).

Issue #133 reported basically the same issue but focused on the system_allocator. PR #136 purports to fix it but ignores the CF backends and adds more potential NULL pointers to arena allocator code. It did add some longjmp-based "exception" handling to RVM compilation, though.

I propose to extend the longjmp approach to arena allocations such that h_arena_malloc never returns on failure but either performs a longjmp or exits. A longjmp target is set up and attached to the HArena data structure in a top-level function and low-level code can stay as it is. I've implemented this for the regex backend to demonstrate.

In addition, unchecked call sites of h_new, mm__->alloc, h_new_arena and thelike have to be identified and NULL-handling added where appropriate.

pesco commented 8 years ago

Some Travis builds fail on the following warning:

$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
[...]
gcc -o build/opt/src/backends/regex.o -c -std=gnu99 -Wall -Wextra -Werror -Wno-unused-parameter -Wno-attributes -Wno-unused-variable -O3 build/opt/src/backends/regex.c
build/opt/src/backends/regex.c: In function 'h_rvm_run__m':
build/opt/src/backends/regex.c:55:12: error: variable 'heads_n' might be clobbered by 'longjmp' or 'vfork' [-Werror=clobbered]
build/opt/src/backends/regex.c:56:6: error: variable 'heads_p' might be clobbered by 'longjmp' or 'vfork' [-Werror=clobbered]

This does not appear on my workstation with GCC 5.2.1 and it only seems to happen when the BINDINGS environment variable is set. My guess is that something in the bindings pre-install process (where some apt-get install calls happen) silently upgrades or somehow changes GCC.

The cause of the warning (which I can't reproduce) seems to be that the variables heads_n and heads_p get swapped in one code path. So I don't think it actually constitutes a problem (swapping is not really clobbering).

PS: The C++ binding seems to use GCC even on the Clang build variant; is that supposed to happen?

pesco commented 8 years ago

My guess is that something in the bindings pre-install process (where some apt-get install calls happen) silently upgrades or somehow changes GCC.

No, it happens with gcc -O3. For some reason, the travis builds with bindings are optimized while BINDINGS=none makes a debug build. The problem doesn't appear with clang -O3.

I reproduced the issue with GCC 4.2 on OpenBSD. It doesn't appear with GCC 5.2 on Linux.

abiggerhammer commented 8 years ago

Oh hey, it's building cleanly now. Are we done here, or is there more still to be done with the CF backends?

pesco commented 8 years ago

There is still work to do; the CF backends but especially Packrat (we need that sorted for DNP3), and lastly the NULL checks on any allocs that are not arena allocs.

I'm looking for an all-clear to go on with this relatively invasive change. I did it for regex at first precisely because that's the code I've never touched before (=> maximum invasion).

abiggerhammer commented 8 years ago

Full speed ahead!

pesco commented 8 years ago

So, longjmps in the other backends are still TODO but I have switched all non-arena allocation to the crash-and-burn strategy of error handling. Here is what I wrote in the commit message:

add h_alloc() which calls errx() on failure and use it for all basic allocation

Rationale: "Basic allocation" refers to things outside of parsing proper,
mostly initialization. If such allocations fail, the system is globally
memory-starved from which it will likely not recover by returning failure.
In this case, terminating the process is in fact the most robust strategy as
it may mean the difference between a permanent hang and a temporary crash.

So the idea is that parsers, anything else that gets built with h_new, initial creation of arenas with h_new_arena, etc. terminate the process on failure, while arena-based allocations during the parse (using h_arena_malloc) make h_parse return NULL.

NB: We characterized the latter during our last conference call as the user getting to dynamically discover the bounds of his input language as dictated by memory.

pesco commented 8 years ago

The above commits should make h_parse return NULL for all backends when it runs out of memory during the parse. I haven't gotten around to actually triggering it, yet, though, because I fail at ulimit or something.

The iterative variants are still TODO as they are not quite as trivial (but should be easy enough).

pesco commented 8 years ago

All done, request merge.

Did the test properly this time around (with a mock allocator that selectively returns NULL). I'd still appreciate a comment on the rationale for 3fc56a0 which I quoted above.

thequux commented 8 years ago

LGTM