darfink / region-rs

A cross-platform virtual memory API written in Rust
https://darfink.github.io/region-rs/
MIT License
119 stars 23 forks source link

Why is `unlock` safe? #15

Open sunfishcode opened 3 years ago

sunfishcode commented 3 years ago

I was reviewing the changes between versions 2 and 3 and noticed that the unlock function was changed from unsafe to safe:

https://github.com/darfink/region-rs/commit/02a1d48164efe9eb18adec1362fde6d72a55885a

Since it operates on a raw pointer, and potentially unlocks memory that it doesn't own (in the Rust ownership sense), should it still be unsafe?

darfink commented 3 years ago

The unlock function uses munlock internally (with the corresponding mlock call, on POSIX platforms). Since mlock prevents a region of memory to be paged to a swap area, munlock in turn allows the memory to yet again be paged.

As you surely know, Rust itself has a very clear definition of memory safety; it prevents data races & segmentation faults (i.e. accessing memory that is not owned by the executing code). The unlock function does not violate these principles — paged memory can still be accessed, and does not cause a segmentation fault.

It may not be safe in the general sense (e.g. causing a password/token to be paged to disk), but that is not translatable to Rust's semantics of safe (which unsafe is an indicator of).

Finally, both munlock and VirtualUnlock validate their input, so, e.g, passing in a null pointer (or any pointer for that matter) is still memory safe.

If I'm missing something, or have any misconception regarding the semantics, please feel free to correct me, but from my understanding this is compliant with Rust's rules of memory safety.

burdges commented 3 years ago

It's tricky.. As I understand it, there is only a locked bit per page that acts like a static mut, so if you mlock two allocations that lie in the same page, then the second mlock has no effect, and munlocking either one unlocks both. We thus cannot claim the locked bit state to be well defined. Afaik, we cannot read this locked bit, so maybe this UB cannot propagate, but still..

A priori, I'd expect mlock and munlock to be regarded as intrinsic-like unsafe tools, which users never touch from safe code. Instead, safe code should invoke them via a custom allocator, either an arena or jemalloc distinct pool, which then returns allocations in locked pages. I suspect mmap gives less bad but still technically unsafe semantics because afaik it only allocates whole pages. I've never thought much about this however..

In practice, I suspect you typically only have limited secret material known when processes start so you mlock a couple allocations, which you never munlock, but instead just let the kernel clean up when your process terminates. There is no UB here from Rust's perspective, and anything sensitive stays locked, but some other allocation unnecessarily and randomly fall into the locked page.

sunfishcode commented 3 years ago

@darfink Thanks! I can see how your explanation makes sense.

I think it's a tricky question though, because the entire point of locking memory is to avoid letting some critical data be swapped out, and having functions like munlock be safe means that dangling pointers could lead to it being swapped out. So while there's no UB, there is a hazard involving dangling raw pointers, which is the kind of thing Rust's safety normally protects against.

One way to think about it would be to say that the OS swapping pages out of memory constitutes a form of I/O. That's how someone storing sensitive data in memory and using mlock to prevent it from being swapped out for security reasons might think about it. In that case, munlock could be thought of as a request to the OS to do I/O on the referenced memory, which is effectively dereferencing the memory -- the OS has to read the memory to swap it out. With that interpretation, munlock should be unsafe, because it's "reading" from raw pointers. Does that make sense?