dvidelabs / flatcc

FlatBuffers Compiler and Library in C for C
Apache License 2.0
646 stars 182 forks source link

Unexpected core dump from __flatbuffers_uoffset_read_from_pe when compiling with usan gcc #209

Closed bjornharrtell closed 3 years ago

bjornharrtell commented 3 years ago

I'm getting an unexpected core dump in usan gcc compilations and test runs of my code (which lives inside PostgreSQL and PostGIS which makes things rather complicated unfortunately but..). This is the initial lines of the dump:

#0  0x00007f92bf7f7a77 in __flatbuffers_uoffset_read_from_pe (p=0x55cdcd16ccc1) at ../deps/flatcc/include/flatcc/flatcc_endian.h:87
87   __flatcc_define_integer_accessors(__flatbuffers_uoffset, flatbuffers_uoffset_t,

Thread 1 (Thread 0x7f92c880b740 (LWP 28625)):
#0  0x00007f92bf7f7a77 in __flatbuffers_uoffset_read_from_pe (p=0x55cdcd16ccc1) at ../deps/flatcc/include/flatcc/flatcc_endian.h:87
No locals.
#1  0x00007f92bf7f7ef5 in flatbuffers_read_size_prefix (b=0x55cdcd16ccc1, size_out=0x7ffc26005fb0) at ../deps/flatgeobuf/flatbuffers_common_reader.h:545
No locals.
#2  0x00007f92bf802a62 in flatgeobuf_decode_header (ctx=0x55cdcd16d928) at flatgeobuf.c:575

As you can see the calling code is flatgeobuf.c, calling flatbuffers_read_size_prefix, which can be seen here: https://github.com/bjornharrtell/postgis/blob/747bf51385bb37e654f080c1125c5cc9955565d5/postgis/flatgeobuf.c#L571

I see no errors locally in a normal compilation and test run but can reproduce with a usan gcc compilation.

I'm unsure what I'm hitting here and have not been able to figure out what is going on. The buffer looks fine as far as I can tell. Might it be that flatcc has issues with usan gcc? Any other leads?

mikkelfj commented 3 years ago

I don't have too much time to look into this right now, but I'd be happy to follow along and provide suggestions.

First, I am not familiar with usan gcc and couldn't find any pointers. I found ubsan. If you are using that, it would be helpful to know the report is due to the undefined behaviour checker. I am also not familiar with ubsan, I just read briefly.

I assume you are not on big endian, otherwise it would be good to know.

You might have linked with a library compiled with a different compiler, even different integer size.

In the code snippet, you add the size field to the offset, that is ok, but there is a "with_size" variant you could also use to avoid doing this manually.

Later you add the size to the offset. What happens here cannot explain the earlier crash unless the optimizer is very aggressive. I do not recall if the size also includes the size field when the size field is present - probably not, but worth checking, otherwise the next buffer you read will be in trouble.

The next buffer is in trouble anyway, because flatcc does not tail pad the buffer. So the next buffer could be unaligned if you do not align before stacking buffers. That is not a problem if the buffers are created with googles flatc for C++, however, it only pads up to 8 bytes or so (AFAIK), and the next buffer might require more alignment than that.

The crash could be due to the above anyway, if you had already been reading a previous buffer, and if unaligned access is catastrophic on your system, which it is typically not on Intel and recent ARM.

You might want to run the flatcc verifier on the buffer, with_size variant, also without, after manually adjusting the offset. You can change some defines to get asserts at problem spots, including bad alignment.

mikkelfj commented 3 years ago

On running the verifier: since you are inside PostgreSQL, you might want to change the defines differently so you can get custom logging rather than an assert. The top of the verifier.c code should have the defines AFAIR. There is also a doc section on how to debug a buffer. Even if the buffer is correct, there might be alignment issues when accessing the buffer in the current memory location.

bjornharrtell commented 3 years ago

Yes it's ubsan with used compilation flags seen here https://github.com/bjornharrtell/postgis/blob/747bf51385bb37e654f080c1125c5cc9955565d5/ci/travis/run_usan_gcc.sh#L5 and I'm told that the core dump is due to the undefined behaviour checker. In also quite unfamiliar with ubsan and I only rarely dabble in C so I guess I'm a bit out of my depth. I run on little endian (for now).

Thanks for the helpful insights! I will continue my efforts and see if I can figure it out or learn something new.

mikkelfj commented 3 years ago

Since this happens very early in buffer access:

It might be the conversion from raw memory pointer to type pointer via a union cast. This generally works and has been used for decades, but is technically undefined behaviour. There could also be a pointer cast directly, but I believe that has been avoided except where it is valid. Only char * can be cast directly.

The accessors file in the flatcc headers include directory should control those operations and I think the failing __flatbuffers_uoffset_read_from_pe is part of that story.

The formally correct way is to use memcpy from memory into a typed variable, but only recently was memcpy optimized out in this case on common compilers, and not on all compilers. I did have a branch migrating to this approach, but it became unweildy and I did not trust it to be efficient everywhere.

You could try to patch up the conversion reading uoffsets by using memcpy and see if the code fails at a later point.

However, if this is you problem, I don't think I can help unless you want to contribute with a port to memcpy.

bjornharrtell commented 3 years ago

@mikkelfj thanks that seems very likely to be the issue. I believe there is a way to hint code to let ubsan know that the undefined behaviour is intentionally acceptable - I will investigate that path too and see what I can find.

bjornharrtell commented 3 years ago

Allocating a fresh buffer and copying what I got, which (I discovered) did come with a warning that it might not be aligned, with memcpy fixes the core dump, and I've determined that the core dump is specifically triggered by -fsanitize=alignment so that makes sense. After doing this I see a whole lot of other problems but we can assume those are my bugs for now, so the conclusion is that I was feeding flatcc with unaligned data which can work on some hardware but is caught by ubsan if it's set to require proper alignment.