Closed Shnatsel closed 3 years ago
The fact that this is being labeled as "unsound" is saddening. You are pointing to one of many usages of set_len
here, it would require changing half the instances of the &[u8]
in the library to be &[MaybeUninit<u8>]
. That sounds silly. There must be a better way right?
That sounds silly. There must be a better way right?
What makes you think so?
The documentation on .set_len()
explicitly states that you are not allowed to use it that way, and that such use causes undefined behavior. I don't know what other proof do you need.
If the issue is pervasive, then all such code must be fixed.
The possible fixes are:
Vec
as &mut [MaybeUninit<u8>]
and only then call set_len
Vec
via ptr::write
and only then call set_len
push
, extend
, or the newly stabilized extend_from_within
and never call set_len
Vec
before treating it as &mut [u8]
. This can be zero-cost in some cases if lowered to a request for already-zeroed memory from the OS, e.g. using vec![0; len]
.The fact that this is being labeled as "unsound" is saddening. You are pointing to one of many usages of
set_len
here, it would require changing half the instances of the&[u8]
in the library to be&[MaybeUninit<u8>]
. That sounds silly. There must be a better way right?
By the unsafe code guidelines, an API is unsound if it can cause UB without the user writing unsafe
anywhere in their own code.
This API calls Vec::set_len
, which is an unsafe
method. Then, it violates the explicit contract of that method.
An unsafe
method is, by definition, allowed to cause UB if an invariant is not upheld by the caller.
By putting these facts together, we can see that this code does something that can cause UB, and does not require the user to type unsafe
. By definition of soundness in Rust, this code is Unsound.
Thanks for confirming :disappointed:
If the issue is pervasive, then all such code must be fixed.
Sadly it is, the "unsafe" encoder/decoder will try to skip initialization everywhere it can :disappointed:
Isn't this only an actual issue for structured data?
Initializing a Vec<u8>
with random bytes should be a valid initialization.
My understanding is that for invalid initialized data, it can be that the cpu accesses the data before you access it and interprets it, e.g. due to prefetching and reordering and this can cause chaos when the data is in an invalid state.
Uninitialized memory is far trickier than "memory set to a single but arbitrary value". This is discussed in detail in https://www.ralfj.de/blog/2019/07/14/uninit.html
@Shnatsel thanks for the link, this is a really interesting read! Although I don't think this applies for data in a vec since the state machine can't possibly keep track of which parts of a vec are initialized and which not?
The example e.g. works with a vec https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=827b04cca344a35dba5b247e21acd43e
fn always_returns_true(x: u8) -> bool {
x < 120 || x == 120 || x > 120
}
fn main() {
let mut vecco:Vec<u8> = vec![];
unsafe { vecco.set_len(1) };
assert!(always_returns_true(vecco[0]));
}
This is addressed with 0.9.0 https://crates.io/crates/lz4_flex
The following code is unsound because it calls
Vec::set_len()
without initializing the vector first:https://github.com/PSeitz/lz4_flex/blob/053483689099420c8576a378d8bf4e1d1fe2964f/src/block/decompress.rs#L484-L490
The documentation on
set_len()
states:To make this operation sound, you need to treat the spare capacity of the Vec as
MaybeUninit
and only call.set_len()
after all the elements have been initialized.Sadly the convenience method to do so is nightly-only. If you roll your own, you will need to follow the stdlib implementation closely because deconstructing the
Vec
into pointers and especially having a&mut
reference is tricky in terms of aliasing. Fortunately miri can verify its correctness for you.