Open HadrienG2 opened 5 years ago
This sounds very likely correct. For context, much of this work was done before the Rust team had as clear a sense of undefined behavior or could enumerate what Rust promised to LLVM. I'll check this out, but my guess is that you are correct.
Investigation suggests that this will involve a bit of a rewrite, as the control flow currently descends correcting pointers and such as it goes, whereas it will instead need to make the assignments as it ascends the recursive calls, because the data do not have the appearance of validity until the call is done. It's still a good thing to do, but not a 15 minute fix afaict.
I can also tell you in advance another thing which will cause migration pain: it is very hard to get a field pointer from a struct pointer without going through a reference in current Rust...
That makes sense. I'm guessing that the unsafe code the Rust folks write is localized enough that, e.g. while building a Vec
they can prep the known fields and not have to generalize over other patterns.
This is part of the reason I've been waiting for the unsafe team to sort things out; if they fail to produce something usable then I haven't sunk time into it. Is there an issue filed or a post on the unsafe zulip about the "difficulty of dereferencing" issue?
I can't find the specific threads right now, but there were severaI discussions on github and/or the internals forum recently about this field-from-pointer issue.
On example was someone trying to build movable self-referential structs using field offsets as an alternative to pointers, and needing to figure out said field offset without having an actual valid object to dereference. Another was &mut aliasing UB triggered by people trying to access a field of data behind an &mut pointer-to-struct.
"Raw references" are frequently invoked as a possible future solution to this kind of problem.
Pinging @RalfJung, maybe he'll remember more than I do.
Sweet. I just dropped a comment in the WG: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/Abomonation.20UB
I've not been able to keep up with the discussion there, or participate meaningfully, as it is essentially a full-time non-paying job to do so. Ideally I can just ping them occasionally and hope that they make informed decisions that don't close off aspects of performance-oriented computing.
Good news, it seems that the &raw
RFC is being accepted! So it will soon be possible to do this:
let field_ptr = &raw mut (*struct_ptr).field;
field_ptr.write(new_value);
In meantime, the widely used workaround is this...
let field_ptr = &mut (*struct_ptr).field as *mut _;
field_ptr.write(new_value);
...which is UB because an &mut
to invalid data is created, but no UB-induced miscompilation has been reported with current versions of rustc and LLVM, most likely because LLVM has no reason to follow the assumed-valid field pointer in this scenario...
On the other hand, integrating this is going to be a little bit more ugly than replacing &mut self
with self: NonNull<Self>
because the latter is currently not supported by Rust. But that's okay, we can just give up on the nice trait dispatch of object.method(...)
for now and go for Type::method(object, ...)
instead.
I'm taking a closer look at this, hopefully will submit a PR later on.
Neat, thanks!
I just checked the Zulip and it seems that there may not be as much UB as we think. At least, take a read:
especially about Ralf's comments about dereferenceable
not being transitive. That is, apparently a &mut Vec<T>
needs to point at a valid 24 bytes, but it doesn't mean that its pointer needs to point at allocated memory.
Let me know if that is your read too.
Yes, my understanding of Ralf's comment is that many unsafe
-based constructs (Box
, Vec
...) are currently based on raw pointer varieties which are not subjected to LLVM's dereferenceable
guarantee, unlike &[mut]
.
I'm not sure if I am comfortable with relying on this implementation detail too much, though. There is quite a big gap between the invariants of &mut T
(non-null, well-aligned, unique, valid allocation, valid target data) and *mut T
(almost nothing), and the Rust team have built several pointer types with intermediary guarantees before (NonNull
, the unstable Unique
, maybe others...). I wouldn't be surprised if it happened again with the dereferenceable
guarantee in the future.
Further, I am not sure how safe abomonation currently is regarding &mut
's other invariants, in particular the uniqueness one which even the Rust team got wrong with generators. I think it is common to end up with, for example, an &mut
to a Box
and an &mut
to its target, without a clear reborrow that the compiler can identify as such inbetween.
So I think switching to a NonNull
-based interface with much weaker pointer invariants attached would still be worthwhile for the sake of making Abomonation
impls easier to write correctly and of localizing their (currently inconsequential) violation of the &mut
data validity invariant in a smaller region of the code.
That does sound worthwhile still, I just didn't want you to do a bit of work without seeing that!
Rust references allow the compiler to assume that the data behind them is valid. One way in which
rustc
currently uses this is to tag the associated pointers with LLVM'sdereferencable
attibute, which allows the latter to prefetch from them to its heart's content. This kind of smart optimization should not be allowed before objects are exhumed, as it can lead to undefined behavior like LLVM following dangling pointers and segfaulting the program.Therefore, I think
exhume
should not take its target object as a Rust reference, but as aNonNull
pointer, which provides no guarantee of target data validity to rustc and therefore doesn't allow the compiler to muck around with it.