Open HadrienG2 opened 5 years ago
My planned resolution to this, in the fullness of time, is to put each T
type in a separate allocation that is properly aligned. This behaves a bit better with respect to allocation alignments and such, but means that de-abomonation draws from a set of allocations (which could be slices of the same common allocation, but aligning those is easier than per-record alignment logic).
But, I have been putting off re-engineering Abomonation for safety until I learned more about safety from the UB group (essentially: to learn if a safe version would be possible at all).
We could probably consider alignment at each memcpy, and pre-pend some space before the memcpy if needed. Some care would be required to make sure that at deserialization we make the same decisions, which would probably mean making sure that the decode buffer is similarly aligned to the encode buffer. This is annoying when you are e.g. writing networking code and just get a buffer out from the kernel and would love to avoid a memcpy just to get the alignment better.
Regarding network data alignment, given that on modern CPUs memory protection is managed at page granularity, I would suspect that network buffers coming from the kernel might be page-aligned. If so, that's more than enough alignment, as long as things like protocol headers and non-abomonated network payloads don't get in the way. Those should have the same alignment as on the sender's side though, so even if they do get in the way it doesn't seem terribly hard to work around it.
But given that this page alignment theory is based on hardware implementation details that a Sufficiently Clever Kernel can work around (e.g. by slicing multiple buffers targeted at the same process from a single page), as a paranoid person, I'd probably use a Cow<[u8]>
-like design so that abomonation can check that kernel memory buffers are indeed as aligned as expected, and resort to a malloc-memcpy pair if it turns out not to be the case.
By the way, how much do you care about the serialization output being a Write
implementation? It seems to me that this interface choice, as nice as it usually can be, gets a little in the way when one tries to resolve alignment matters. A lower-level &mut [u8]
-based interface (which will undoubtedly be itself complexified by streaming matters, you can't assume that a buffer large enough to store the entire object is available) might greatly ease implementation.
This issue came back in my work recently, as I tried to figure out how to expose a safe interface to abomonation. So here is a tentative plan to resolve the problem without dropping the nice Write
-based design for serialization (after all, we don't really need serialization output to be aligned):
Abomonation
trait expose the required alignment of said struct for a given type T
(which is basically the max of the alignment of T
and the alignment of every data type which is recursively serialized by T::entomb()
).entomb()
's API contract that it should emit properly aligned output "struct fields" under the assumption that the output buffer was properly aligned before the memcpy, emitting padding bytes as necessary. Provide tools to make this easier, as alignment is easy to get wrong. Make our entomb()
implementations follow the rules, and adjust extent()
and exhume()
implementations accordingly.encode()
/entomb()
, and which part of the code is responsible for discarding these padding bytes during decode()
/exhume()
. Adjust the implementation if necessary.decode
return a Cow<AlignedT>
instead of an &T
, and document in decode()
's API contract that its input buffer should be aligned as specified by our newly added Abomonation::alignment()
contract, or else a buffer reallocation + memcpy will need to happen. Inside of decode()
, implement this via an alignment check.exhume()
also check, via a debug assertion, that its input buffer is properly aligned, as this can help greatly when testing and debugging Abomonation
impls. This could be part of the aforementioned common tools to help Abomonation
implementors get alignment right.As a first step, it is also possible to skip the Cow
part from the initial implementation, and just panic if the input buffer is badly aligned. But I would rather resolve this problem eventually, since the caller of a panicking-on-misalignment decode()
would need to do something akin to what Cow
does anyway.
(Also, we may actually need something more custom than litteral use of the standard Cow
abstraction, because the aforementioned design needs over-sized and over-aligned allocations, and as far as I understand that may not be natively supported by Cow
.)
FYI, an informational advisory (RUSTSEC-2021-0120) was filed against abomination
relating to the unsoundness reported in this issue.
As the docs say, abomonation currently doesn't guarantee correct pointer alignment. This is pretty dangerous, even on x86 as
rustc
might be tempted to generate those evil SIMD instructions that assume the data is aligned and raise an exception otherwise someday.I wonder if there is an API tweak we could use to improve upon this situation?