TimelyDataflow / abomonation

A mortifying serialization library for Rust
MIT License
317 stars 30 forks source link

A possible path forward for padding bytes #32

Open HadrienG2 opened 4 years ago

HadrienG2 commented 4 years ago

So, I've had a quick chat with @RalfJung about our padding bytes problem, and I think I now get a decent grasp of what we need in order to resolve that particular UB in abomonation.

Padding bytes are uninitialized memory, and we now have a safe way to model that in Rust, in the form of MaybeUninit. So we can take a first step towards handling them correctly today by casting &[T] into &[MaybeUninit<u8>] instead of &[u8].

This is enough to memcpy the bytes into another &mut [MaybeUninit<u8>] slice. But it's not yet enough to expose our unintialized bytes to the outside world, e.g. for the purpose of sending them to Write in encode() and entomb(), because Write wants initialized bytes, not possibly uninitialized ones.

To resolve this, we need another language functionality, which is not available yet but frequently requested from the UCG: the freeze() operation, a tool which can turn MaybeUninit<u8> into a nondeterministic valid u8 value. You can think of it as a way to opt out of the UB of reading bad data and defer to hardware "whatever was lying around at that memory address" behavior.

IIUC, something like that was proposed a long time ago, but it was initially rejected by security-conscious people on the ground that it could be used to observe the value of uninitialized memory coming from malloc(), which may leak sensitive information like cryptographic secrets which a process forgot to volatile-erase before calling free().

That precaution is commendable, but on the other hand, giving the growing body of evidence that an UB-free way to access specific regions of memory is needed for many use cases (from IPC with untrusted processes to implementation of certain low-overhead thread synchronization algorithms like seqlock), I'm hopeful that we're likely to get something like that in Rust eventually (and I will in fact take steps to make this discussion move forward once I'm done with my current UCG effort).

TL;DR: For now, this is blocked on a missing Rust feature, but the issue seems understood and is likely to be eventually resolved.

HadrienG2 commented 4 years ago

As an alternative to introducing an explicit freeze() mechanism, this may also be resolved by declaring that manipulating uninitialized machine integers is not UB, which seems to be a broadly consensual opinion in https://github.com/rust-lang/unsafe-code-guidelines/issues/71 .

RalfJung commented 4 years ago

Careful: "manipulating" is and likely will be UB, in the sense that any operation other than copying, like + or == or whatever (including trivial ones such as * 1 or | 0) is UB if any operand is uninitialized.

The only thing that looks like it might change is that a copy of data at integer type (such as what happens when returning an integer from a function or assigning it) preserves uninitialized bytes rather than triggering UB when encountering one. (To be more precise, if any input byte is uninitialized, then the entire resulting integer is.)

HadrienG2 commented 4 years ago

Mmm... that's weaker than I'd like. If I send an [u8] to a safe API like Write, I want that be safe no matter what said API's implementation does. Otherwise it's not really okay.

So maybe we do need freeze() for UB-free abomonation.

RalfJung commented 4 years ago

Mmm... that's weaker than I'd like. If I send an [u8] to a safe API like Write, I want that be safe no matter what said API's implementation does. Otherwise it's not really okay.

That's impossible. It is trivial to cause UB with LLVM by e.g. accessing an array with an uninitialized integer as an offset. (When you check that integer to be in-bounds and when you later use it to access the array, the value doesn't have to be consistent both times!)

Exposing uninit data to safe untrusted code at integer type is not sound under any proposal I am aware of (and that even libstd recognized ;). TBH freeze seems like a band-aid at this point; that kind of API is just flawed.

HadrienG2 commented 4 years ago

So, what else would you propose as a way forward for abomonation's implementation?

RalfJung commented 4 years ago

If you need to expose uninit data to untrusted code, I propose you do a semver-breaking change to use a type (like [MaybeUninit<u8>]) where that can be done safely.

HadrienG2 commented 4 years ago

Cannot be done if the data is to be sent somewhere via Write, then Read back. Write expects &[u8] and Read writes into &mut [u8]

HadrienG2 commented 4 years ago

IIRC, @frankmcsherry uses Abomonation for maximally efficient network data transfers between multiple copies of a Rust program, MPI-style. In this scenario, replacing Write with a custom trait that expects &[MaybeUninit<u8>] would entail completely rewriting Rust's network stack from the syscall (or maybe libc) ground up to accept that data type, which does not seem reasonable.

RalfJung commented 4 years ago

Oh I see, you want to write uninit data to a file?

Interesting. Somehow that use-case didn't occur to me yet. Makes one wonder if C allows uninit data in a buffer used for write...

I do not recall writes being brought up in the https://github.com/rust-lang/rust/issues/42788 discussion. And it's not really the same thing. So, seems worth opening a new issue? I am not aware of one. This is a T-libs issue: right now, it's not possible to write generic code (abstracting over Write) that writes uninitialized data (in particular, structs with padding).

The interesting part is that for Read there are alternatives (like the unstable Read::initializer thing we have right now), but for Write... I can't think of anything other than freeze that would help. This is significant because every other use of freeze is either an awful hack to work around other limitations of the spec, or could be avoided by restructuring everything.

RalfJung commented 4 years ago

Hm okay, there is an alternative: add an unsafe fn write_raw(&mut self, data: *mut [u8]) to Write. But we cannot have a default-impl, so... we can't really add that in libstd.

HadrienG2 commented 4 years ago

In C, char* has an awful lot of special powers with respect to other pointer types. I wouldn't be surprised if it were also allowed to access uninitialized memory by compilers, even if the spec does not mandate them to.

Note that I can personally think of other uses for freeze, including seqlocks and reads from untrusted memory, though both of these use cases would additionally require a spec amendment so that racey reads are "only" defined to return mem::uninitialized instead of being UB...

RalfJung commented 4 years ago

In C the status of padding is basically not defined. Uninitialized memory turns into "indeterminate values" which are also awfully under-specified. So, I'd say in C the rule is whatever the compiler implements and the standard is of no help (as usual).

Note that I can personally think of other uses for freeze, including seqlocks and reads from untrusted memory, though both of these use cases would additionally require a spec amendment so that racey reads are "only" defined to return mem::uninitialized instead of being UB...

Seqlocks don't need freeze AFAIK, they just need "read-write races are not UB but return uninit". A seqlock never looks at the uninit data.

Interaction with untrusted memory is specifically what I mean when I said "hack to work around other limitations"; but we are digressing. ;)

HadrienG2 commented 4 years ago

More discussion on this topic is ongoing at https://internals.rust-lang.org/t/writing-down-binary-data-with-padding-bytes/11197/ .

Currently, the consensus (or at least the subset of it which is useful for the abomonation use case) seems to be moving towards some kind of zero_padding(&mut T) -> &[u8] compiler intrinsic, which would require changes to the encode interface to make the input (harmlessly) mutable.