TimelyDataflow / abomonation

A mortifying serialization library for Rust
MIT License
317 stars 30 forks source link

Shouldn't exhume take a NonNull<Self> rather than a &mut Self? #17

Open HadrienG2 opened 5 years ago

HadrienG2 commented 5 years ago

Rust references allow the compiler to assume that the data behind them is valid. One way in which rustc currently uses this is to tag the associated pointers with LLVM's dereferencable attibute, which allows the latter to prefetch from them to its heart's content. This kind of smart optimization should not be allowed before objects are exhumed, as it can lead to undefined behavior like LLVM following dangling pointers and segfaulting the program.

Therefore, I think exhume should not take its target object as a Rust reference, but as a NonNull pointer, which provides no guarantee of target data validity to rustc and therefore doesn't allow the compiler to muck around with it.

frankmcsherry commented 5 years ago

This sounds very likely correct. For context, much of this work was done before the Rust team had as clear a sense of undefined behavior or could enumerate what Rust promised to LLVM. I'll check this out, but my guess is that you are correct.

frankmcsherry commented 5 years ago

Investigation suggests that this will involve a bit of a rewrite, as the control flow currently descends correcting pointers and such as it goes, whereas it will instead need to make the assignments as it ascends the recursive calls, because the data do not have the appearance of validity until the call is done. It's still a good thing to do, but not a 15 minute fix afaict.

HadrienG2 commented 5 years ago

I can also tell you in advance another thing which will cause migration pain: it is very hard to get a field pointer from a struct pointer without going through a reference in current Rust...

frankmcsherry commented 5 years ago

That makes sense. I'm guessing that the unsafe code the Rust folks write is localized enough that, e.g. while building a Vec they can prep the known fields and not have to generalize over other patterns.

This is part of the reason I've been waiting for the unsafe team to sort things out; if they fail to produce something usable then I haven't sunk time into it. Is there an issue filed or a post on the unsafe zulip about the "difficulty of dereferencing" issue?

HadrienG2 commented 5 years ago

I can't find the specific threads right now, but there were severaI discussions on github and/or the internals forum recently about this field-from-pointer issue.

On example was someone trying to build movable self-referential structs using field offsets as an alternative to pointers, and needing to figure out said field offset without having an actual valid object to dereference. Another was &mut aliasing UB triggered by people trying to access a field of data behind an &mut pointer-to-struct.

"Raw references" are frequently invoked as a possible future solution to this kind of problem.

Pinging @RalfJung, maybe he'll remember more than I do.

frankmcsherry commented 5 years ago

Sweet. I just dropped a comment in the WG: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/Abomonation.20UB

I've not been able to keep up with the discussion there, or participate meaningfully, as it is essentially a full-time non-paying job to do so. Ideally I can just ping them occasionally and hope that they make informed decisions that don't close off aspects of performance-oriented computing.

HadrienG2 commented 5 years ago

Good news, it seems that the &raw RFC is being accepted! So it will soon be possible to do this:

let field_ptr = &raw mut (*struct_ptr).field;
field_ptr.write(new_value);

In meantime, the widely used workaround is this...

let field_ptr = &mut (*struct_ptr).field as *mut _;
field_ptr.write(new_value);

...which is UB because an &mut to invalid data is created, but no UB-induced miscompilation has been reported with current versions of rustc and LLVM, most likely because LLVM has no reason to follow the assumed-valid field pointer in this scenario...

HadrienG2 commented 5 years ago

On the other hand, integrating this is going to be a little bit more ugly than replacing &mut self with self: NonNull<Self> because the latter is currently not supported by Rust. But that's okay, we can just give up on the nice trait dispatch of object.method(...) for now and go for Type::method(object, ...) instead.

I'm taking a closer look at this, hopefully will submit a PR later on.

frankmcsherry commented 5 years ago

Neat, thanks!

I just checked the Zulip and it seems that there may not be as much UB as we think. At least, take a read:

https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/Abomonation.20UB

especially about Ralf's comments about dereferenceable not being transitive. That is, apparently a &mut Vec<T> needs to point at a valid 24 bytes, but it doesn't mean that its pointer needs to point at allocated memory.

Let me know if that is your read too.

HadrienG2 commented 5 years ago

Yes, my understanding of Ralf's comment is that many unsafe-based constructs (Box, Vec...) are currently based on raw pointer varieties which are not subjected to LLVM's dereferenceable guarantee, unlike &[mut].

I'm not sure if I am comfortable with relying on this implementation detail too much, though. There is quite a big gap between the invariants of &mut T (non-null, well-aligned, unique, valid allocation, valid target data) and *mut T (almost nothing), and the Rust team have built several pointer types with intermediary guarantees before (NonNull, the unstable Unique, maybe others...). I wouldn't be surprised if it happened again with the dereferenceable guarantee in the future.

Further, I am not sure how safe abomonation currently is regarding &mut's other invariants, in particular the uniqueness one which even the Rust team got wrong with generators. I think it is common to end up with, for example, an &mut to a Box and an &mut to its target, without a clear reborrow that the compiler can identify as such inbetween.

So I think switching to a NonNull-based interface with much weaker pointer invariants attached would still be worthwhile for the sake of making Abomonation impls easier to write correctly and of localizing their (currently inconsequential) violation of the &mut data validity invariant in a smaller region of the code.

frankmcsherry commented 5 years ago

That does sound worthwhile still, I just didn't want you to do a bit of work without seeing that!