Soundness of AtomicCell::compare_exchange is dubious

RalfJung commented 5 years ago

AtomicCell<T>::compare_exchange will, if I understand the code correctly, use AtomucU*::compare_exchange_weak if T has the right size. But this means that e.g. for (u8, u16), which has size 4, it will transmute these pairs into u32 and compare them as integers. However, such pairs contain a padding byte, meaning that the u32 contains some uninitialized memory. Having uninitialized memory in a u32 is uncharted territory at best, but performing actual operations on such a datum (as opposed to just load/store) brings us very close to UB. I am not sure what the rules are in LLVM for this case, but under the proposed poison semantics I think the comparison will return poison and hence using it in an if is UB. For Rust, I think the intention is to make any operation on uninitialized data UB (like it is in C), meaning the comparison would already return UB.

Strictly speaking, this affects not just the fast path, but also the arbitrarily-sized case: bytes_eq might compare uninitialized bytes.

BurntSushi commented 5 years ago

But this means that e.g. for (u8, u16), which has size 4, it will transmute these pairs into u32 and compare them as integers.

If you don't mind me asking, is the transmute itself UB as well? In particular, I had thought that the in-memory representation of tuples isn't guaranteed? Or is it just the ordering of fields that isn't guaranteed?

RalfJung commented 5 years ago

Well, the code checks that the sizes match. Ignoring complicated issues around pointers, the only way in which transmuting a 4-byte piece of data to u32 is UB is if there are bit patterns that are not valid for u32 -- which boils down to the question of whether uninitialized bytes are allowed in u32.

jeehoonkang commented 5 years ago

It seems C/C++ standard committee members tend to agree that all bit patterns are valid for u32 and other unsigned integer types: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html "For unsigned integer types, the bits of the object representation are ..." I think it's reasonable, because (once we agree to use two's complement) there's no other sane choice.

ghost commented 5 years ago

@RalfJung That sounds overly restrictive to me.

IMO, it should be possible to transmute a value of any type T into [u8; size_of::<T>()], read those bytes, and transmute back into T without UB. I would bet a lot of C and Rust code has been written under that assumption.

jeehoonkang commented 5 years ago

@stjepang To be very pedantic (sorry!), u8 is special in that everything can be legally cast to an array of u8 in Rust or char in C/C++. But u32 is not special in that sense, at least according to the C/C++ standard :(

And yet I agree with your sentiment: I believe representing an object of any type as u32 or any other unsigned integer types should not be UB. Edit: in other words, u8 should not be special at the first place.

hanna-kruppe commented 5 years ago

I am not aware of any precedent for u8 being special in Rust, and in fact all the discussion of the invariants of integer types I'm aware of treats all integer types alike. It's true that in C and C++, char and its signed/unsigned variants are very special, but that is precisely one of the thorny parts of writing unsafe code in those languages that Rust should try to smooth out.

ghost commented 5 years ago

Thanks for the clarification! I agree with you and would expect u8 to not get special treatment over other integer types. It's all just bytes to me.

RalfJung commented 5 years ago

This is Rust, not C++. :)

"For unsigned integer types, the bits of the object representation are ..." I think it's reasonable, because (once we agree to use two's complement) there's no other sane choice.

Remember that every bit can be 0, 1 or uninitialized. Even the C++ standards committee acknowledges this, though just indirectly -- namely, it acknowledges that "indeterminate values" can be "unstable" in the sense that x == x on an integer can be false. This can only be the case if there are states beyond 0 and 1. I agree every sequence of 0 and 1 is fine in u32. But for uninitialized, it is far from reasonable to allow such bits in a u32. (And indeed I picked u32 as an arbitrary example here, the rules should be the same for all integer types.)

If we allow that, we have to answer other hard questions like what happens when you do arithmetic or comparison involving uninitialized bits. That's why I think that any uninitialized bits (outside of padding) should only exist in a MaybeUninit -- that is, a [MaybeUninit<u8>; N] can represent any data, but a [u8; N] cannot. There is a conflict between permitting arithmetic and other operations on a type, and permitting it to hold "strange bits" (uninitialized bits, but also transmuted pointers can be a problem here) -- and since u8 permits arithmetic, it should not permit "strange bits".

It's all just bytes to me.

But bytes are complicated.

ghost commented 5 years ago

@RalfJung Fair enough, this is difficult. :(

It might be helpful to look at what C/C++ do on atomic compare-exchange with arbitrary (non-POD) data. This is interesting:

The comparison and copying are bitwise (similar to std::memcmp and std::memcpy); no constructor, assignment operator, or comparison operator are used.

When a weak compare-and-exchange would require a loop and a strong one would not, the strong one is preferable unless the object representation of T may include padding bits, (until C++20) trap bits, or offers multiple object representations for the same value (e.g. floating-point NaN). In those cases, weak compare-and-exchange typically works because it quickly converges on some stable object representation.

For a union with bits that participate in the value representations of some members but not the others, compare-and-exchange might always fail because such padding bits have indeterminate values when they do not participate in the value representation of the active member.

Padding bits that never participate in an object's value representation are ignored.

Source: https://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange

Boost atomics will do reinterpret_cast just like AtomicCell (source). However, on MSVC-8 and MSVC-9, spinlocks are used because otherwise the compiler would miscompile code (source).

RalfJung commented 5 years ago

Yeah, I figured this would be related to memcmp but unfortunately I was not able to find anything precise about memcmp on padding bytes. We have two kinds of problems here:

Is the LLVM IR we currently generate for this valid, or does it have UB? The atomic comparison and byte_eq are working with poisoned data, so I am not sure. We could argue that atomic reads are frozen (which is an assumption AtomicCell already makes for volatile reads), which would help for the atomic comparisons but not for the fallback case (the data compared by byte_eq is read with a normal, non-atomic non-volatile load). However, the C++ spec you are quoting makes it sound like it is not at all clear that atomic reads are frozen...
Beyond the LLVM rules, are there extra Rust rules this violates? This is where the part about having uninitialized bits in a u32 comes in. I'd prefer if this was all MaybeUninit<u32> but of course we don't have atomic operations on those...

RalfJung commented 5 years ago

@Amanieu @stjepang with the newly proposed ptr::freeze and if you remove AtomicCell::get_mut, I think this can be made sound. You basically just have to make sure that you only ever write frozen stuff into the cell, and freeze things before you compare them. In contrast to the "zero padding"-based schemes @Amanieu suggested at the all-hands, this does not require a special function that knows where the padding bytes are.

Amanieu commented 5 years ago

That doesn't exactly solve the problem though: compare_exchange returns a T with the current value in the atomic if the compare fails. In the most common use case (CAS loop) this will be passed back in as the current argument to compare_exchange in the next loop iteration.

However since the padding bytes in T are always undef, the freeze operation could freeze them into a different value than the one in the atomic, which would cause the next compare_exchange call to fail even though the value in the atomic was not changed since the last iteration. This could result in the CAS loop never converging to a stable state.

ghost commented 5 years ago

@RalfJung Awesome news!

@Amanieu Are you sure? We have this if statement that should solve the issue you're describing. Note that compare_exchange will never fail spuriously due to the difference in undef bytes.

Amanieu commented 5 years ago

@stjepang Ah right. So you won't get spurious failures, but you could in theory get stuck in an infinite loop.

ghost commented 5 years ago

@Amanieu Hmm, not sure I understand how. If we do cell.compare_exchange(old, new) and we know cell.get() == old, the operation will always succeed, even if the two values differ in padding bytes.

Could you perhaps sketch out a simple example? I don't see the theoretical infinite loop here.

Amanieu commented 5 years ago

Let us assume some 2-byte type (u8, __pad: u8). The first byte contains a valid value, the second byte is padding. The underlying atomic type used is AtomicU16.

The current value in the AtomicU16 is (7, 0).
The user calls cell.compare_exchange(old = (7, undef), new = (8, undef)).
The undefs get frozen as u16s with values: old = (7, 60), new = (8, 90).
The inner atomic_compare_exchange_weak fails because the padding bytes don't match.
The inner atomic_compare_exchange_weak returns Err((7, undef)) because it returns a T instead of a u16, the value is "unfrozen"
compare_exchange loops back to step 2 with old = (7, undef).

Now that I look at it this way, I think this is fixable if you keep the previous value as a u16 when passing it back up the loop instead of converting it back to a T.

ghost commented 5 years ago

Thanks! I agree about keeping the previous value as a u16 rather than going u16 -> T -> u16. That should solve the problem.

RalfJung commented 5 years ago

Ah good point, yes you need to do the entire loop without round-tripping through T.

It is a bit dissatisfying that we cannot also have get_mut, though... all we need to have that, too, is an "atomic freeze operation" -- a way to freeze memory as the first thing compare_exchange does, in a way that does not count as a data race in the memory model. That seems like a pretty powerful operation though, and I have no idea how to integrate it with the rest of the memory model.

Pragmatically speaking, an asm! block probably has this effect as far as LLVM is concerned. But that's not really good enough for me.^^

mtak- commented 5 years ago

Any thoughts on replacing get_mut with store_mut/load_mut which, IIUC, could still be implemented soundly?

RalfJung commented 5 years ago

These could freeze, so yes, they could be implemented in a sound way in this model.

jeehoonkang commented 5 years ago

I believe (1) Rust should not blame padding bytes as invalidate values as well, and (2) we should de-deprecate AtomicCell::get_mut(). This idea is not just popping from my head, but is actually inherited from the precedence of C/C++.

Paddings are not invalidate but unspecified in C/C++. That means (1) you don't know what's the value of the paddings, but (2) computations on the paddings should not invoke undefined behavior. It is for a good reason. For example, padding bytes that do not introduce undefined behavior is essential in our implementation of AtomicCell::get_mut().

RalfJung commented 5 years ago

AFAIK padding bytes are indeterminate in C/C++. That's the same status as uninitialized memory, and it is very different from unspecified. Whether or not computations on indeterminate values are UB is not clear from the standard, but:

In the case of an indeterminate value all bit-patterns are valid representations and the actual bit-pattern may change without direct action of the program.

Source: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1793.pdf

In particular, comparing indeterminate values with anything is meaningless, because they do not have a stable bit pattern. This is a problem for compare_exchange, which performs exactly this kind of comparison.

Also, another point not clear from the standard is what happens when you memcpy a 4-byte struct including a padding byte into a uint32_t -- does the entire integer value become indeterminate, or is this somehow tracked on a per-byte basis? From what I recall from the LLVM poison discussion, there are optimizations that are incompatible with either choice.

jeehoonkang commented 5 years ago

According to C17 6.2.6p6, "padding bytes take unspecified values":

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

RalfJung commented 5 years ago

Interesting. However, we also have C17 6.7.1 p9:

unnamed members of objects of structure and union type do not participate in initialization. Unnamed members of structure objects have indeterminate value even after initialization.

and 6.7.2.1 p15 + p17:

There may be unnamed padding within a structure object, but not at its beginning.

There may be unnamed padding at the end of a structure or union.

Also, to my knowledge, making padding indeterminate is needed to make SROA (scalar replacement of aggregates) a valid optimization.

RalfJung commented 5 years ago

Also note footnote 51 in your quote is

51) Thus, for example, structure assignment need not copy any padding bits.

But that doesn't square with just being unspecified: if I have an uninitialized variable and initialize it by assignment, and the assignment does not copy the padding bits, then those bits will remain indeterminate.

jeehoonkang commented 5 years ago

For unnamed members/paddings, I think the standard intentionally distinguished members and paddings: paddings are not members, and unnamed members are indeterminate and unnamed paddings are unspecified. Also, I think the footnote 51 should be interpreted as informative, but not normative. It probably suggests an implementation strategy.

I think SROA will be much more interesting than language laws. Would you please give me a reference or example why paddings should be indeterminate for validating SROA?

RalfJung commented 5 years ago

I think that's just an artifact of wording -- padding is a kind of unnamed member. What other unnamed members would there even be?

Would you please give me a reference or example why paddings should be indeterminate for validating SROA?

Well, one effect of SROA is that only the elements of a struct get copied, not its padding -- exactly what the footnote alludes to.

struct Foo { uint8_t f1, uint16_t f2 };

Foo bar(Foo foo) {
  Foo foo2 = foo; // assume this one gets SROA'd
  return foo2;
}

Foo bar2(Foo foo) {
  Foo foo2;
  foo2.f1 = foo.f1;
  foo2.f2 = foo.f2;
  return foo2;
}

bar is basically transformed to bar2: it will return a Foo where the two fields got copied from the argument, but the padding was left uninitialized -- and hence is indeterminate.

jeehoonkang commented 5 years ago

Unnamed fields are anonymous ones: https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html I still think paddings are not unnamed members mentioned in the standard.
I believe paddings are never indeterminate in C, so your SROA example is sound even though the padding bytes are unspecified values.

RalfJung commented 5 years ago

Unnamed fields are anonymous ones: gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html

How would it make any sense that those members do not get initialized?

I still think paddings are not unnamed members mentioned in the standard.

Given that padding is explicitly called out as "unnamed", I disagree with that reading.

Notice that this interpretation is not just mine, I took it from https://wiki.sei.cmu.edu/confluence/display/c/EXP42-C.+Do+not+compare+padding+data which explicitly quotes the above parts of the standard when talking about padding.

I believe paddings are never indeterminate in C, so your SROA example is sound even though the padding bytes are unspecified values.

I don't follow. Uninitialized variables are indeterminate. Why would their padding not be...?

The standard is generally extremely vague about all this indeterminate/unspecified/trap representation business. I think it is much clearer to follow the approach proposed for LLVM, where "indeterminate" becomes poison and there's a proper formal definition. Unintiailized memory is poison. The only way to explain my example above is to say that copying a struct resets all padding to poison.

jeehoonkang commented 5 years ago

On "unnamed":

I think distinguishing unnamed members and unnamed paddings, is the only way to make sense of C17 6.2.6p6 and [C17 6.7.1 p9 and 6.7.2.1 p15 + p17] at the same time. Even if they conflict with each other, 6.2.6p6 is more clearly describing the semantics of padding with unambiguous words: padding bytes are unspecified. I prefer to follow it.
On uninitialized struct/union values:

According to C18, struct and union values, even if uninitialized, should not be a trap representation. (Here, "trap representation" means what we called "indeterminate" in this discussion: C18 3.18.2 says an indeterminate value is "either an unspecified value or a trap representation".) C18 6.2.6p6 says: "The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation."

RalfJung commented 5 years ago

On "unnamed"

I see where you are coming from. OTOH, I feel if the people doing the CERT C Coding Standard interpret it the way I quoted, that probably means they are not alone. So at this point it probably stops being useful to stare at the vague C standard.

On uninitialized struct/union values:

Yeah, unfortunately the entire "trap representation" stuff in C/C++ is really poorly specified. For example, integers also don't have "trap representations" (I believe the next standard will even say that), and yet in the mainstream C compilers there are "weird" integers like undef and poison which definitely cannot be treated like "normal" values in the C standard (for example, x == x may return false and x == 0 ? 0 : y/x is UB because x might be non-zero the first time you look at it and 0 the second time).

So it is probably more useful to look at what compilers actually do. The only one I am familiar with and the most relevant for Rust is LLVM. LLVM trunk is inconsistent in what it does, but there is a proposal for a consistent model. In that model, structs and struct fields can definitely be poison. The part about structs not having trap representations likely stems from structs not being stored in registers, and registers with a constrained value range being the source of these "trap representations". But nowadays this is used in a whole different way. So for the purpose of this discussion I think we can safely ignore any part of the C standard that says certain things cannot have "trap representations"; compilers have poison/undef for all sorts of values.

So, in the poison model, I claim that reading padding gives back poison. In current LLVM, with both poison and undef, padding is undef. Even though I have been told this several times, I cannot find a good reference for it. The aforementioned paper says

LLVM uses undef in a few other situations. First, to represent the values of padding in structures, arrays, and bitfields.

Similar things are said in this bugreport. @rkruppe do you know if the LLVM IR specifies this padding stuff properly anywhere?

Since a mere read access to some location in memory cannot even "know" if it is reading padding or whatever, the only way I see to realize "reading padding gives undef/poison" is to say that copying a struct (at struct type) will reset the padding to undef/poison. But that doesn't even matter that much, if seems quite clear that with get_mut we can make the compare_exchange code read padding bytes, and hence (according to what everyone seems to agree LLVM does), reading poison/undef and use them in computations.

Also see this where more people agree with me that inspecting padding is UB in modern compilers (they say that violates the standard, which I am not sore sure about, but given the ambiguity of the standard that does not seem like a useful discussion).

jeehoonkang commented 5 years ago

I think the difference is on the viewpoint. I, as a systems programmer, think that get_mut() is sane, and it should be supported in any systems programming languages. From the viewpoint, I try to find evidence for allowing reading of uninitialized paddings, such as paragraphs in the standards and implementations in compilers. On the other hand, you (and obviously I to a certain extent), as programming language researchers, think that there should be a good programming model that explains the real-world practice. Unfortunately, this conflict doesn't seem to be resolved anytime soon :( But for the purpose of library building, I prefer to take the viewpoint of systems programmers.

To summarize my viewpoint, (1) get_mut() is sane from a systems programmer's viewpoint, and (2) if get_mut() is currently not supported, it's the programming language's fault, and furthermore, (3) we should revise the undef/poison model (or anything else) so as to explain get_mut(), unless we could find a fundamental reason why it's undesirable to do so.

@stjepang For this reason, I'd like to propose to revert #332 (Deprecating AtomicCell::get_mut).

RalfJung commented 5 years ago

I, as a systems programmer, think that get_mut() is sane, and it should be supported in any systems programming languages.

I would like to support it! However, the way we ended up in this messy situation with C/C++ is by ignoring the fact that a language must have a single consistent semantics, and relying on mutually incompatible axioms and intuition instead.

In my view, your argument is flawed. From a systems programmer's viewpoint, one could argue that all sorts of things are "sane" -- things like benign data races or OOB loads on a definitely mapped page or implementing small atomic accesses with bigger accesses. The issue is that these arguments mix up intuition formed by looking at "what the hardware does", with "what the programming language is supposed to do". This is not a sensible way to argue, and it leads to miscompiled programs quickly.

What you are doing here is closing your eyes in the face of the problem and hoping that that makes it go away. It won't go away. It's not just "the language's fault"---we can make get_mut sound if we disable a bunch of optimizations. What we don't know is how to have both get_mut and these optimizations. Finding a way to reconcile this conflict is of course our responsibility as language designers, but that does not mean that we can just axiomatize it away. It is not uncommon that having two things that both seem intuitive is not actually possible at the same time (as you are well aware of course). We can either follow the C approach of "YOLO whatever, it won't be too bad" (leading to mysteriously miscompiled programs, long debugging sessions and the entire rest of the mess that is called "C semantics"), or we can acknowledge that when formal semantics are in conflict with systems programming intuition, sometimes there's an actual problem here that we cannot just ignore away.

So, in an attempt to make this conversation more constructive, what kind of feature would we need in the language to make get_mut compatible with compare_exchange? The soundness argument for compare_exchange relies on the invariant that the content of the AtomicCell is frozen at all times. get_mut allows that invariant to be violated, so we cannot rely on memory to be frozen. Hence we must freeze it ourselves. It seems to me what we would need is an atomic version of ptr::freeze, which we can use to freeze the memory when compare_exchange begins. We also have to make sure that racing writes do not write unfrozen data, but we control that code so that is easy.

So, we could have

fn store(&self, mut val: T) {
  ptr::freeze(&mut val);
  // do the atomic store, *at a type that preserves padding*
}

fn compare_exchange(&self, mut val1: T, mut val2: T) {
  ptr::atomic_freeze(self.value.get());
  ptr::freeze(&mut val1);
  ptr::freeze(&mut val2);
  // do the CAS loop, *at a type that preserves padding*
}

@Amanieu does that make sense to you? I am mostly worried about what it means that we now might perform a "write" operation even if the CAS failed. Alternatively, we could specify that CAS freezes the memory before comparing, but that seems extremely risky given that we have no idea which optimizations LLVM will perform on this.

Amanieu commented 5 years ago

I think an argument could be made that compare_exchange should effectively do the same thing as a volatile load (i.e. it reads a frozen value from the AtomicCell).

In the end I think we should just do whatever C++ does (where get_mut is effectively the inverse of atomic_ref). The LLVM people will figure out the semantics for padding bytes and we should just copy whatever they do. This means that we should probably support something similar to get_mut.

RalfJung commented 5 years ago

I think an argument could be made that compare_exchange should effectively do the same thing as a volatile load (i.e. it reads a frozen value from the AtomicCell).

I was told that LLVM is looking into transformations that will replace atomic operations by non-atomic operations when the compiler can prove that there are no race conditions. That however means that we cannot say that atomic operations read frozen values.

jeehoonkang commented 5 years ago

@RalfJung I still think my previous comment quite clearly summarizes what the standard says about padding (they are just random bits and safe to access), and I think actually it's quite good semantics for everyone. As far as I understand, existing optimizations, as well as system programmer's idioms, are supported. In particular, get_mut() is sound. And it's the status quo.

So to say get_mut() should be unsound, I expect there should be an optimization that renders wrong in the presence of get_mut(), or paddings being safe. Just the existence of a language model (e.g. the freeze model) is, at least for me, not as convincing as a counter-example.

RalfJung commented 5 years ago

I still think my previous comment quite clearly summarizes what the standard says about padding (they are just random bits and safe to access), and I think actually it's quite good semantics for everyone.

Well, it's not how LLVM implements padding, so it's clearly not good for everyone, and certainly not good for Rust which uses LLVM as backend. I have heard your argument, there is no point in repeating it. I just don't agree, this is not how I interpret the C standard. I won't repeat my arguments now as you already know them.

And anyway the C standard is only relevant insofar as we might use it to deduce things about the LLVM semantics. And there I can easily craft examples that show that padding can indeed turn into undef any time:

pub fn test(x: (u32, u64), f: fn(&(u32, u64))) {
    let y = x;
    f(&y);
}

gets compiled to

define void @_ZN10playground4test17hc74ff9e76a3b675aE(i32 %x.0, i64 %x.1, void ({ i32, i64 }*)* nocapture nonnull %f) unnamed_addr #0 {
start:
  %y = alloca { i32, i64 }, align 8
  %0 = bitcast { i32, i64 }* %y to i8*
  call void @llvm.lifetime.start.p0i8(i64 16, i8* nonnull %0)
  %1 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %y, i64 0, i32 0
  store i32 %x.0, i32* %1, align 8
  %2 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %y, i64 0, i32 1
  store i64 %x.1, i64* %2, align 8
  call void %f({ i32, i64 }* noalias nonnull readonly align 8 dereferenceable(16) %y)
  call void @llvm.lifetime.end.p0i8(i64 16, i8* nonnull %0)
  ret void
}

As you can see, this creates a local allocation %y, then copies the two fields into y, and then passes %y to f. In particular, the padding in y does not get initialized, and hence is undef.

This can only be explained in Rust by saying that the let y = x copies the fields of x but resets the padding to undef.

I think this conclusively proves that in the LLVM model, get_mut will let undef creep into Atomic<(u32, u64)>. And since a conditional jump on the result of comparing undef is UB, this means that get_mut is UB unless we have some way to "atomically freeze" memory before compare_exchange.

So to say get_mut() should be unsound, I expect there should be an optimization that renders wrong in the presence of get_mut(), or paddings being safe. Just the existence of a language model (e.g. the freeze model) is, at least for me, not as convincing as a counter-example.

So your definition of "soundness" is "you can't give a counterexample"? I am afraid that is not a definition I can accept. You can't make a function "sound" by messing with it until the current compiler compiles it the way you want. I am honestly shocked to see you even propose this.

I think there is a wide consensus in the Rust community that "soundness" has nothing to do with compiler optimizations. It is about whether your program has UB according to the spec. The spec doesn't exist yet but we can try to infer what pieces of it have to be, like I did in the beginning of this post. You don't get to redefine this term.

Or maybe you gave up on soundness, and only care about this function doing the right thing de facto. That's a fair position, albeit not one I share. What is an interesting question is constructing a sequence of reasonable-looking optimizations that break Atomic<T> in an observable way. However, I am afraid that game is not very fun to play. I think I did the key step above by demonstrating how to put an Undef into an initialized (u32, u64) in safe code.

RalfJung commented 5 years ago

Here's another example demonstrating that the padding of any struct has to be considered undef, no matter how "thoroughly" it got initialized or frozen:

#![crate_type="cdylib"]

#[repr(C)]
pub struct X {
    a: u32,
    b: u64
}

pub fn foo(x: X) -> u32 {
  unsafe {
    return std::ptr::read((&x as *const X as *const u32).add(1));
  }
}

becomes

define i32 @_ZN10playground3foo17hdddbf0857401c2c8E(i32, i64) unnamed_addr #0 {
start:
  ret i32 undef
}

(Thanks to @nagisa.)

RalfJung commented 5 years ago

People reading this might also be interested in the atomig crate. Also see this discussion. Basically that is an entirely safe crate providing an Atomic<T> type, but only where T: Atom. There's no lock-based fallback; the trait just is not implemented for types that are too big. Because the crate is entirely safe it cannot have get_mut, but it does provide all the other operations.

manuthambi commented 3 years ago

If Rust decides to guarantee that padding will always be 0 (in repr(Rust)), then this whole problem goes away for such structs.

See https://github.com/rust-lang/rust/issues/70230

RalfJung commented 3 years ago

That issue is about adding the ability for types to opt-in to guaranteed-0 padding.

There is no proposal on the table that would make all padding always 0, and I doubt people would like that idea.

manuthambi commented 3 years ago

The first line of that issue description is "one option to gain more layout optimizations would be (either for all repr(Rust) types or with per-type opt-in) to use a different kind of padding."

But I agree that it might be hard to do it for all repr(Rust) because of backward compatibility concerns. Thinking of unsafe code initializing a struct field-by-field using pointers.

crossbeam-rs / crossbeam

Soundness of AtomicCell::compare_exchange is dubious #315