References and pointers

mversic commented 4 years ago

I'd like to find how safe it is to use references in FFI but only as input arguments to extern C functions. Here is the section describing it.

I'm interested in situation where references are provided from Rust code to extern C functions, specifically: foo.rs

extern fn fun(arg: &u32);

fn main() {
    let x: u32 = 4;
    unsafe {
        fun(&x);
    }
}

foo.c

void fun(*int arg) {
    int x = *arg;
    // do something with x
    // what if arg is mutated?
}

extern fun cannot be called from rust with null values. If C function documents that doesn't accept null values this is acceptable. If it is required to be nullable, it can be wrapped in Option.

what if x is mutated inside fun? This seems like a safety issue, maybe an undefined behavior, data corruption or data race. It seems to me that using references is inherently unsafe when interfacing with C code. If that is so, should this section be modified to always advise against use of references in FFI?

what about mutable references &mut? Would it be safe to use those in FFI? Are there possibly some lifetime related issues?

danielhenrymantilla commented 4 years ago

extern "C" {
    fn fun (arg: &'_ [mut] c_int);
}

is basically:

unsafe
fn fun (arg: &'_ [mut] c_int)
{
    extern "C" {
        fn fun (arg: *mut c_int)
    }
    fun(mem::transmute(arg))
}

but where the transmute and the outer function are always "inlined", which requires that &'_ [mut] c_int and *mut c_int have the same ABI.

So, this requires three things:

Same ABI ✅ (guaranteed by the Rust reference)
valid to transmute ✅ (in this case a simple as cast would have sufficed)
safe to use the transmuted value❓

So the main point here is the third one, which indeed depends on what the C code actually does / depends on the guarantees stated in the documentation of the function.

If the pointee is mutated, then it is UB to feed it a pointer that originated from a shared reference to a c_int, since shared references without shared mutability are de facto immutable, a property that Rust does exploit (e.g., assert_eq!(x = 42); fun(&x); assert_eq!(x, 42) most probably ends up with the second equality test being optimized away).

Guaranteed read-only access: &'_ c_int ✅
Mutation: &'_ c_int 🚫

But if the mutation happens in a single-threaded context, it would be valid to feed it a &'_ Cell<c_int>, the Rust equivalent of C++'s int &.

(single-threaded) mutation (within the body of the C function): &'_ Cell<c_int> ✅
- (Obviously if the C function registers the pointer, so as to dereference it after the function returns (by some other function or in some parallel thread), then the '_ would be unsound: the only way to ensure that the call be sound would be by restricting the lifetime of the reference to be 'static).

A fortiori (from &mut T -> &Cell<T> "decay"), feeding a &'_ mut c_int is also fine:

=> &'_ mut c_int ✅
Another way of seeing this, is that &'_ mut c_int (especially in the 'static case) is the pointer type with the strongest guarantees: if you are ever able to forge one and give it to C, then C can, much like Rust, do whatever it pleases with the pointee (except unsynchronized parallel accesses with at least one writer, since that would be a data race, but that's not longer really an FFI problem, it's actually a design flaw specific with the C library).

By changing the signature of the function from a loose input type to a strict subset of these input types, you cannot cause unsoundness.

More about Cell vs. mut

In general, C code can't distinguish between `&'_ Cell` and `&'_ mut c_int`, since `&'_ mut ` is a Rust-specific idiom / abstraction that C does not posess (some compiler extensions may allow some non-aliasing annotations, but in practice they are most broken more often than not, and thus unused). - the only counter-example I could come up with, would be if the C code spawned a background thread mutating the pointee, and returned before finishing: in that case, feeding the C function a `&'static mut c_int` would be sound, whereas feeding it a `&'static Cell` would not.

It seems to me that using references is inherently unsafe when interfacing with C code. If that is so, should this section be modified to always advise against use of references in FFI?

extern "C" functions are always unsafe, if only because of the unchecked linker / loader shenanigans involved, that can lead to some functions shadowing others or other crazy things. In practice:

using &mut _ won't be more unsafe than using *mut _, as explained above,
but care should indeed be taken with immutable references (shared without interior mutability), since this, contrary to C's mild const-correctness, is a paramount property that Rust does exploit.

TL,DR: &mut _ is a subset of mutable * _, but & _ is only a subset of immutable * _, which C has no way of enforcing (you have to trust the C function not to mutate the pointee).

mversic commented 4 years ago

damn, this is a good analysis of the issue. Indeed &Cell can be used since it's repr(transparent).

extern "C" functions are always unsafe, if only because of the unchecked linker / loader shenanigans involved, that can lead to some functions shadowing others or other crazy things. In practice:

yes, of course it's always unsafe to call C functions, worst case C function can always have a SegFault. Even though it must always be unsafe it is possible to reduce unsafety with proper FFI bindings and checks on the rust side.

Can we agree that using immutable reference when calling C function is more unsafe than using raw *const or *mut? Immutable references shouldn't be used for two reasons, mutability issues and concurrency issues. Shared reference can be used in place of raw pointers without introducing more unsafety. As you pointed out &Cell can also be used without additional unsafety.

now about lifetimes. You point out that if C function registers the pointer the lifetime bound wouldn't be sound and in the worst case scenario rust side could deallocate the location of the pointer before C dereferences it. To use 'static would be safer but absurd. If shared reference is used, it has some lifetime attached to it which guarantees that the reference is valid unlike *const and *mut which don't have such a guarantee. It seems to me that using &mut will be more unsafe to use then *mut because a guarantee, which can happen to not be upheld, is given to the compiler. and compiler can depend on it. I can't predict the implications of that but I would say that using raw pointers is more correct(safe?) than using shared references.

mversic commented 4 years ago

as for the "nullable pointer optimization" I wouldn't use for wrapping references for the same reasons I wouldn't use references. They provide guarantees to the compiler which may happen to not be upheld. Function pointers don't have such guarantees and can be used as Option<fn()> without increasing unsafety.

It would be interesting to analyze if Option<NonNull<T>> could be used safely. NonNull implements Copy so no referencing is required which means it will not introduce any lifetime issues, it's both !Send and Sync so it's thread safe, and it has a repr(transparent). I don't know if it will not uphold some other guarantees

danielhenrymantilla commented 4 years ago

Option<NonNull<T>> is equivalent to *const T / *mut T in all things, so that change is perfectly valid and leads to cleaner code, except for the lack of NonNullMut meaning that we lose the const / mut "human annotation" (the only difference between *const and *mut as far as Rust is concerned is variance, which involves generics, and there are no generic extern functions).

I would say that using raw pointers is more correct(safe?) than using shared references.

I'd phrase it as "using Rust references for the signatures of extern C functions may convey a false sense of security, compared to signatures that involve raw pointers exclusively" 🙂

mversic commented 4 years ago

Option<NonNull> is equivalent to const T / mut T in all things, so that change is perfectly valid and leads to cleaner code, except for the lack of NonNullMut meaning that we lose the const / mut "human annotation" (the only difference between const and mut as far as Rust is concerned is variance, which involves generics, and there are no generic extern functions).

yes, "human annotation" is lost. On the other hand you would get type system annotation that a null value can be passed for that particular argument. I can't say which "annotation" is of more value. The thing I don't know is how would I go about creating NonNull<T> before first obtaining raw *mut. I guess it's not possible to go from & or &mut to NonNull directly.

I find that safety-wise Option<NonNull<T>> and *const are equal when used in place of arguments for C functions. They have only value as a documentation intent. Option<NonNull> says that null values can be provided while *const and *mut signify whether pointer value will be modified or not inside the function which can also be a little confusing because *const will not actually guarantee that mutation is prevented.

I'd phrase it as "using Rust references for the signatures of extern C functions may convey a false sense of security, compared to signatures that involve raw pointers exclusively" slightly_smiling_face

yes, much more precise. :)

danielhenrymantilla commented 4 years ago

from & or &mut to NonNull

impl<T : ?Sized> From<&'_ [mut] T> for NonNull<T>

And to go from from *mut T to Option<ptr::NonNull<T>> a simple ::new() suffices.

I agree with you that encoding the nullability is a big win, I was just "complaining" that it is a pity that Rust does not feature an official NonNullMut pointer, much like the following one: https://docs.rs/safer-ffi/0.0.5/safer_ffi/ptr/struct.NonNullMut.html

mversic commented 4 years ago

impl<T : ?Sized> From<&'_ [mut] T> for NonNull

yes exactly, it's a pity there is no such feature in std.

I agree with you that encoding the nullability is a big win, I was just "complaining" that it is a pity that Rust does not feature an official NonNullMut pointer

yes, I understand.

This was far more interesting discussion then I could have I hoped for. My takeaway is this:

it's better to use NonNull<T> if it is documented that C function pointer argument must not be null. That way type system will disallow sending null pointers to C function
otherwise, from the safety point of view it's equally good to use Option<NonNull<T>> or raw *const/*mut
one should refrain from using references in the signatures of extern C functions because they offer stricter guarantees which may or may not be upheld by the extern C function which makes them more unsafe then the alternative

I'm not sure which API do I find more pleasing for nullable pointers, Option<NonNull<T>> or *const/*mut. It really seems a matter of preference and documenting intent. If it were up to you what would you choose?

bjorn3 commented 4 years ago

&T implicitly coerces into *const T and &mut T into *mut T. They do not coerce into Option<NonNull<T>>.

mversic commented 4 years ago

@bjorn3 what you say is true, but I think you misunderstood the point. Please read the discussion.

bjorn3 commented 4 years ago

I meam that using Option<NonNull<T>> instead of *const/mut T prevents you from accidentially passing an &[mut] T, which would have restrictions regarding aliasing and mutability that could result in UB when the C end doesn't follow these rules. Using Option<NonNull<T>> will likely draw some extra attention to this.

mversic commented 4 years ago

@bjorn3 so it was me who misunderstood you :grin:. Sorry, about that.

hm, indeed references will coerce into raw pointers without explicit conversion. I don't think coercion will cause any extra unsafety, it'll still be as safe as using Option<NonNull<T>> because compiler will be aware of the implicit coercion before crossing the FFI boundary. There must always be a point where rust reference is converted to a raw pointer. Again it's the matter of explicitness. This can be pro or a con argument, depending on where you stand. Do you like explicitness or do you like ease of coding?

mversic commented 3 years ago

It's been a while since I've opened this issue. As I've come to learn, there is a trade off to using NonNull.

Consider the following example:

extern "C" {
    fn foo(var: *mut u8) {
        // assume that the given value will be modified
    }
    fn bar(var: NonNull<u8>) {
        // assume that the given value will be modified
    }
}

fn main() {
    foo(&mut 10u8);
    // foo(&10u8);    Luckily this doesn't compile
    bar((&10u8).into());
}

Surprisingly, this code compiles but the call to bar could be an undefined behavior. Undefined behavior will occur if the extern function bar mutates the given NonNull pointer which was created out of a shared reference. Documentation of NonNull::from<&T> clearly states this, but I find that this is such an easy pitfall for the users of the FFI. I find it to too risky to use NonNull in an FFI since there is no guarantee that NonNull will not be mutated and safety-first applications should not assume that.

On the other hand shared reference doesn't coerce into *mut _(you can't even cast &_ as *mut _) and user will at least have to go through some Rust gymnastics to force that conversion like &_ as *const _ as *mut _. This will at least point out that some funky business is at hand here.

I think that this kind of a bug would be a lot more prevalent(and a lot more difficult to locate) compared to null pointer related issues. It's unlikely that Rust users will create null pointers in their applications anyhow. The only situation where a sane Rust user would create null pointer is if it's used as a control flow value in the C function and would avoid null pointers in all other scenarios.

Using NonNull in FFI is dangerous and can introduce hard-to-locate bugs into your client application. I advise against the use of NonNull in FFI.

mversic commented 3 years ago

Let me emphasize this: there is a difference whether you use *const _ or *mut _ pointer in FFI. The difference is that mutating a value behind a *const _ pointer is UB if that pointer was obtained from & _ while mutating a value behind a *mut _ pointer isn't...................... unless your value behind a *const _ pointer is wrapped into UnsafeCell<_>.

NonNull documentation is clear on this:

Notice that NonNull has a From instance for &T. However, this does not change the fact that mutating through a (pointer derived from a) shared reference is undefined behavior unless the mutation happens inside an UnsafeCell. The same goes for creating a mutable reference from a shared reference. When using this From instance without an UnsafeCell, it is your responsibility to ensure that as_mut is never called, and as_ptr is never used for mutation.

It seems that compiler is allowed to depend on the assumption that value behind *const _ will not be modified if that pointer was created out of a shared reference. This assumption should be loosened in a defensively written application which is what UnsafeCell is for.

If you want to make your FFI library as safe as possible, you should prefer using *mut _ pointers where mutation is expected and *const UnsafeCell<_> where it isn't. That is, unless you are absolutely certain what foreign code will do with the given pointer. This may not be the most ergonomic approach, but it will be the safest.

bjorn3 commented 3 years ago

while mutating a value behind a *mut _ pointer isn't

If the *mut _ was derived from an &_, it is still UB to write.

There is no difference between *const T and *mut T during codegen. The only differences are that they are considered different types, there is a difference in the variance of the lifetime parameters in T and you can't directly write to *const T without casting to *mut T first. There are no other differences.

mversic commented 3 years ago

@bjorn3 I'll also link a recently started discussion on this: https://github.com/rust-lang/unsafe-code-guidelines/issues/257 on rust-lang. Would you still disagree?

bjorn3 commented 3 years ago

That issue is only about &mut foo as *const _, which does indeed make writing UB as it generates an intermediate read-only reference. You were talking about FFI.

mversic commented 3 years ago

Wouldn't this be considered UB as well?

main.c

void foo(*int ptr) {
    // due to some bug C code modifies the given pointer
    (int)*ptr = 10;
}

main.rs

extern "C" {
    fn foo(var: *const isize);
}

fn main() {
    let var = 10isize;
    foo(&var);
}

mversic commented 3 years ago

and if we replace *const with *mut as in:

extern "C" {
    fn foo(var: *mut isize);
}

then it wouldn't be UB.

bjorn3 commented 3 years ago

No, it is UB in both cases because you derived the pointer from an read-only reference, not because you used *const isize as pointer type.

danielhenrymantilla commented 3 years ago

Whether a pointer can be used to perform a write-dereference has nothing to do with it being *const or *mut, it has to do with its provenance, i.e., with how the pointer was created / obtained:

pointers yielded by an allocator

… are assumed to be write-dereferenceable for the size of the allocation;
- In some cases the compiler may not be able to know how big that allocation is, but an allocator's contract is to yield non-overlapping spans of memory.
pointers obtained from coercing a Rust reference
- … may read/write dereference for at most the size of the referee (⚠️ this means that the moment you create a Rust reference to part of some bigger struct / allocation, you need to keep the original pointer around to be able to access the rest. For instance, ((&mut [0_u8, 1_u8][0]) as *mut u8).add(1).write(0) is UB)
- … may only perform a write if the Rust reference is a mutable one, i.e.:
  - either an exclusive one (&mut _),
  - or a shared one (& _) to something with a "mutable interior" / showcasing shared mutability (UnsafeCell<_>).
  - thus, if the rust reference is an immutable one, i.e., a shared reference & to something not containing an UnsafeCell, then the pointer does not have "write rights" and thus performing a write with it is UB.

wouldn't be UB

Actually it would, since the Rust code is using a shared reference (&var) where the type of the referee, isize, contains no UnsafeCell and thus &var is an immutable reference.

With foo(&mut var), however, you are right; but I'd blame the current situation with an error-prone coercion, and would expect that Rust, in the future, will not use an intermediary &referee reference in the &mut referee as *const _ coercion.

mversic commented 3 years ago

No, it is UB in both cases because you derived the pointer from an read-only reference, not because you used *const isize as pointer type.

yes, sorry. This wouldn't compile, and would force you to go through &var as *const _ as *mut _ which would also be UB or you would have to make your var mutable in the first place and would thus save yourself from UB

mversic commented 3 years ago

To correct myself, this wouldn't be UB:

main.c

void foo(*int ptr) {
    // due to some bug C code modifies the given pointer
    (int)*ptr = 10;
}

main.rs

extern "C" {
    fn foo(var: *mut isize);
}

fn main() {
    let mut var = 10isize;
    foo(&mut var);
}

which is inconvenient if you want to use shared references throughout your code because you don't expect they'll be mutated.

The point(hopefully correct) I was making is that you can have shared references and still remain protected from UB if you wrap your type in UnsafeCell. This piece of code let's you have shared references and still remain protected from accidental mutations on C side

main.c

void foo(*int ptr) {
    // due to some bug C code modifies the given pointer
    (int)*ptr = 10;
}

main.rs

extern "C" {
    fn foo(var: *const UnsafeCell<isize>);
}

fn main() {
    let var = UnsafeCell::new(10isize);
    foo(&var);
}

though it doesn't look very ergonomic. One could maybe wrap their value into newtype to make it more acceptable for use

mversic commented 3 years ago

With foo(&mut var), however, you are right; but I'd blame the current situation with an error-prone coercion, and would expect that Rust, in the future, will not use an intermediary &referee reference in the &mut referee as *const _ coercion.

ok, I wasn't aware of this but I would refrain from using references in extern function declarations for safe FFI anyhow. This option we don't have to discuss

ANSSI-FR / rust-guide

References and pointers #49

pointers yielded by an allocator

pointers obtained from coercing a Rust reference