Closed bjornharrtell closed 9 months ago
minalign
is defined as the size of the largest scalar serialized, or the largest pointer to a scalar that may need to be created (in the case or an empty vector), or similarly, the largest data item with a force_align
attribute.
All such data items are self-aligned to both the start and the end of the buffer, thus the size of the buffer is a multiple of minalign
.
Thanks, that is essentially the alignment reported by the flatcc builder near completion.
I am starting to look into this again.
To summarize: Tail padding discussed above should be fixed, but it is beside the point of the issue being discussed, although it affects comparisons with C++ verifier.
The issue is simple in principle: At least some parts of the verifier tests for alignment relative to buffer start, notably the vector header alignment wrt. the element size even if the vector is empty. The verifier has not been specialized for _with_size (the optional header size prefix), and it should not need to because the data inside must still be aligned. This fails because some checks are relative to buffer start rather than pointer address. Why? Because if the buffer start is 8 byte aligned at the size prefix, it is verified 4 bytes later to skip the size prefix (this is added in the user code seen above). Now suddenly the offset to buffer start appears unaligned.
The fix is likely to add specific with_size variants of the verifier and feed it the original buffer start with size prefix.
@aardappel I have now added code to pad flatcc buffers to the largest object alignment. There was actually alignment code already, but it requires that the user calls start_buffer with a block alignment argument, but it is not set by default.
But I don't think this is quite right either. If a buffer sometimes contains large objects (e.g. double) and large aligned structs, and sometimes not, then buffers will have different alignments, and when stacking buffers with size prefix, the buffer following might have a larger object and be incorrectly aligned.
The solution is to find the theoretical maximum alignment in the schema compiler. The flatcc builder should already be prepared to take such an alignment argument, but it needs to be generated first. But this could also be wasteful if some buffers have rarely used large alignments.
I guess the crucial difference is that so far we have made no guarantee that if you put multiple FlatBuffers consecutively in memory that this guarantees their alignment, like structs would. We only guarantee alignment for a single FlatBuffer, storing multiple FlatBuffers is between you and your memory allocators, etc.
We could make this "work by default" by aligning a FlatBuffer not to the largest stored item, but the largest declared item. I would be against such a change, given how:
So, an implementation should default to "largest stored item". That said, if an implementation has users that would like to "stack" FlatBuffers, an optional "largest declared item" mode could be allowed, since more padding is entirely compatible with the existing FlatBuffers design. But it should not be the default. If you made it the default, then users of said implementation would run into trouble the moment they accept data made by another language. The user would need to be very aware that the alignment they are requesting is not guaranteed across FlatBuffers.
Though frankly, rather than going thru all that trouble, I'd instead recommend to have a way for the user to retrieve the largest declared alignment somehow, and use that to correctly "stack" their buffers, if needed. That is simpler for such a niche use case, and has the advantage that the actual buffers are not bloated unnecessarily when stored outside this stacking, and better yet, it would work with buffers originating from any implementation without those needing updates.
@dbaileychess to see if he agrees with my reasoning.
I agree to that. This is also largely how flatcc has worked, except it did not tailpad until recently. The user could always either self-align before stacking, or passing a block-align argument to the builder.
Just for the record: As far as I can tell from the flatcc code, older version should actually tailpad, but ONLY up to the block align argument and only if different from the default 0 argument. Now the runtime largest block is taken into account when tailpadding. At any rate the largest runtime alignment has always been taking into account for prefix padding so the buffer starts correctly.
I agree with Wouter. Any sort of "meta" flatbuffer work (stacking multiple flatbuffers) should be outside the core library concern. We can provide helper accessors to get the largest declared type if needed.
The test case provided in this PR was useful in adding new _with_size verifiers. See end of #210 for details.
Note that previously tail pad was added which likely helps passing C++ verifier, but that was not the core issue.
I believe I have narrowed down the problem with #210 further. It appears to happen only for buffers with size.
I've made new doublevec_test with a very minimal schema and two test cases, one run without size prefix and one with.. the latter fails with "vector header out of range or unaligned".
ctest --verbose output:
The table definition is:
And the gen code is creating as root and an empty vector for a as it does not seem to matter if I fill it or not with something.
If I create two variants without setting the a vector at all both verifies and looks like this: