LeoAdamek / iracing.rs

Rust library to connect to iRacing telemetry
MIT License
20 stars 4 forks source link

Multiple memory safety violations in this crate #8

Open parasyte opened 3 years ago

parasyte commented 3 years ago

Don't take this as a negative criticism. There is a lot of content in this issue, which is not meant as an attack on code quality. I am merely attempting to point out how difficult it is to use unsafe correctly. And more importantly, I want to eventually use this crate in my own project.

I have identified several issues. I'll try to enumerate them, but be aware that this is not an exhaustive audit. Also note that I am not 100% confident in this analysis, since I don't have a great way to rigorously provide proof for each problem. That said, what I have found is convincing on its own merit.

Connection::close()

As a starting point, let's look at the lifetime relationship in iracing::telemetry::Connection::close(): https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L781-L791

This is a public method which borrows self with an anonymous lifetime. Immediately after the call to CloseHandle, the Connection can no longer be used, because the file handle that it owns has been invalidated. The anonymous lifetime allows us to do just that in safe Rust, however. Here's the dump_sample.rs example with a single line added to close the connection before getting telemetry from it:

use iracing::telemetry::Connection;
use serde_yaml::to_string;

pub fn main() {
    let conn = Connection::new().expect("Unable to open telemetry");
    conn.close().unwrap();
    let telem = conn.telemetry().expect("Telem");

    print!("{}", to_string(&telem.all()).unwrap());
}

This compiles successfully, which is scary. Does this mean that the API allows for undefined behavior? Well, if we run this code, this happens:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 6, kind: Uncategorized, message: "The handle is invalid." }', examples\safety.rs:6:18

And this error occurs because CloseHandle is given a pointer to shared memory, not a handle! Connection needs to maintain a handle reference so that it can close it. The handle is created here in Connection::new: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L660-L667

Ok, so let's fix that:

diff --git a/src/telemetry.rs b/src/telemetry.rs
index 2bc1dde..15a7b05 100644
--- a/src/telemetry.rs
+++ b/src/telemetry.rs
@@ -644,6 +644,7 @@ impl Blocking {
 /// ```
 pub struct Connection {
     mux: Mutex<()>,
+    mapping: HANDLE,
     location: *mut c_void,
     header: Header
 }
@@ -680,7 +681,7 @@ impl Connection {

         let header = unsafe { Self::read_header(view) };

-        return Ok(Connection {mux: Mutex::default(), location: view, header: header});
+        return Ok(Connection {mux: Mutex::default(), mapping, location: view, header: header});
     }

     ///
@@ -779,7 +780,7 @@ impl Connection {
     }

     pub fn close(&self) -> IOResult<()> {
-        let succ = unsafe { CloseHandle(self.location) };
+        let succ = unsafe { CloseHandle(self.mapping) };

         if succ != 0 {
             Ok(())

Now calling conn.close() returns Ok(()) as expected. But, oh dear, the code prints out a bunch of info, and that's not at all expected. Or is it?

We just closed the HANDLE, but it looks like we didn't invalidate the shared memory. Honestly I have no idea how Windows feels about retaining access to shared memory after closing the handle to the file that owns it. This is most likely undefined behavior at this point, and the fact that the documentation for OpenFileMappingW, MapViewOfFile, and CloseHandle doesn't specify the behavior in this state strongly suggests that it is UB.

That said, the close method is missing an operation; unmapping the file view. So let's properly break this! The modified example code cannot possibly be correct if we do that, right?

diff --git a/src/telemetry.rs b/src/telemetry.rs
index 2bc1dde..8aa64d8 100644
--- a/src/telemetry.rs
+++ b/src/telemetry.rs
@@ -29,7 +29,7 @@ use std::sync::Mutex;
 use encoding::{Encoding, DecoderTrap};
 use encoding::all::ISO_8859_1;
 use winapi::shared::minwindef::LPVOID;
-use winapi::um::memoryapi::{OpenFileMappingW, FILE_MAP_READ, MapViewOfFile};
+use winapi::um::memoryapi::{OpenFileMappingW, FILE_MAP_READ, MapViewOfFile, UnmapViewOfFile};

 /// System path where the shared memory map is located.
@@ -644,6 +644,7 @@ impl Blocking {
 /// ```
 pub struct Connection {
     mux: Mutex<()>,
+    mapping: HANDLE,
     location: *mut c_void,
     header: Header
 }
@@ -680,7 +681,7 @@ impl Connection {

         let header = unsafe { Self::read_header(view) };

-        return Ok(Connection {mux: Mutex::default(), location: view, header: header});
+        return Ok(Connection {mux: Mutex::default(), mapping, location: view, header: header});
     }

     ///
@@ -779,15 +780,22 @@ impl Connection {
     }

     pub fn close(&self) -> IOResult<()> {
-        let succ = unsafe { CloseHandle(self.location) };
+        let succ = unsafe { CloseHandle(self.mapping) };

-        if succ != 0 {
-            Ok(())
-        } else {
+        if succ == 0 {
             let errno: i32 = unsafe { GetLastError() as i32 };
+            UnmapViewOfFile(self.location); // XXX: Ignore return value; we are already handling an error.
+            return Err(std::io::Error::from_raw_os_error(errno));
+        }

-            Err(std::io::Error::from_raw_os_error(errno))
+        let succ = unsafe { UnmapViewOfFile(self.location) };
+
+        if succ == 0 {
+            let errno: i32 = unsafe { GetLastError() as i32 };
+            return Err(std::io::Error::from_raw_os_error(errno));
         }
+
+        Ok(())
     }
 }

Now when we run the modified example, we get something horrifying:

error: process didn't exit successfully: `target\debug\examples\safety.exe` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)
Segmentation fault

Remember, this modified example is all safe code! What happened? The API is in fact not valid. The close method cannot safely free kernel resources and allow the caller to continue using Connection. There's really only one way to encode a lifetime in the API that means "this struct can never be used again" and that's for the close method to consume the self argument. Essentially you just change the API to:

pub fn close(self) -> IOResult<()>

And you're done. Now the modified example does not compile:

error[E0382]: borrow of moved value: `conn`
   --> examples\safety.rs:7:17
    |
5   |     let conn = Connection::new().expect("Unable to open telemetry");
    |         ---- move occurs because `conn` has type `Connection`, which does not implement the `Copy` trait
6   |     conn.close().unwrap();
    |          ------- `conn` moved due to this method call
7   |     let telem = conn.telemetry().expect("Telem");
    |                 ^^^^ value borrowed here after move
    |
note: this function takes ownership of the receiver `self`, which moves `conn`
   --> C:\Users\jay\other-projects\iracing.rs\src\telemetry.rs:782:18
    |
782 |     pub fn close(self) -> IOResult<()> {
    |                  ^^^^

There's just one thing about this that is ugly. The fact that calling close at all is optional. We could move this code into a Drop impl so Rust will free the resources when Connection goes out of scope. Then we don't need the explicit close method, because drop(conn); would be identical to conn.close();

This does not fix all possible ways to leak memory (which is now a well known problem, e.g. the leakpocalypse) but it is a much better API than one which expects callers to remember to call the Connection::close() method when it's done with the connection.

Blocking::close()

This has the same lifetime problems described above and can also be fixed with impl Drop, but it also has its own bugs. Let's look at those here.

https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L562-L588

This first thing to notice is that self.event_handle cannot be NULL here. Ever. This was verified when the handle was created: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L545-L551

And because Blocking contains private fields and it doesn't implement Clone or Default, it is only possible to create one with the constructor: Blocking::new().

The second thing to point out is that there is also a NULL check on self.origin, but no corresponding NULL check in the constructor. The constructor even uses the pointer without a NULL check: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L537-L538

FWIW, Header::get_var_header() also doesn't do a NULL check. For a quick detour, the Blocking constructor is public, it's safe to call, and it accepts a raw pointer. That can't be good. Except there is currently no safe way to construct a Header, so making the constructor for Blocking public is useless. 🤷 In short, all of these NULL checks are pointless and should just be removed.

The third thing to notice about this close method is this line right here: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L579

self.origin is a raw pointer. It is not a handle. What happens when we call this function?

use iracing::telemetry::Connection;

pub fn main() {
    let conn = Connection::new().expect("Unable to open telemetry");
    let blocking = conn.blocking().expect("Couldn't create telem handle");
    blocking.close().unwrap();
}

Let's find out!

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 6, kind: Uncategorized, message: "The handle is invalid." }', examples\safety.rs:6:22

Well, that explains a lot. But we're allowed to just ignore the Result and continue calling methods on Blocking. In this case, the event handle will be closed and if we try to call blocking.sample(), it will just return an error that similarly says the handle is closed. So not a great user experience, but I can't find any memory safety issues with Blocking::close().

However ...

Lifetime violations

Blocking owns a raw pointer to data owned by Connection. If we resolve the problems described above by implementing Drop, we fall into this trap where Blocking is allowed to outlive Connection. For instance:

let conn = Connection::new().expect("Unable to open telemetry");
let blocking = conn.blocking().expect("Couldn't create telem handle");
drop(conn);
blocking.sample(Duration::from_millis(500)).unwrap(); // THIS ACCESSES FREED MEMORY

Rust provides a nice way to tie the lifetimes of these structs together. We can make the lifetime of Blocking depend on the lifetime of Connection. Thus Connection must always outlive Blocking.

diff --git a/src/telemetry.rs b/src/telemetry.rs
index 2bc1dde..269f865 100644
--- a/src/telemetry.rs
+++ b/src/telemetry.rs
@@ -7,6 +7,7 @@ use std::ffi::CStr;
 use std::fmt::Display;
 use std::fmt;
 use std::os::windows::raw::HANDLE;
+use std::marker::PhantomData;
 use std::ptr::null;
 use std::slice::from_raw_parts;
 use std::time::Duration;
@@ -67,11 +68,12 @@ pub struct Header {
 ///
 /// Calling `sample()` on a Blocking interface will block until a new telemetry sample is made available.
 ///
-pub struct Blocking {
+pub struct Blocking<'conn> {
     origin: *const c_void,
     values: Vec<ValueHeader>,
     header: Header,
-    event_handle: HANDLE
+    event_handle: HANDLE,
+    phantom: PhantomData<&'conn Connection>,
 }

 #[derive(Copy, Clone, Debug)]
@@ -533,7 +535,7 @@ impl Display for TelemetryError {
 impl Error for TelemetryError {
 }

-impl Blocking {
+impl<'conn> Blocking<'conn> {
     pub fn new(location: *const c_void, head: Header) -> std::io::Result<Self> {
         let values = head.get_var_header(location).to_vec();

@@ -555,7 +557,8 @@ impl Blocking {
            origin: location,
            header: head,
            values: values,
-           event_handle: handle
+           event_handle: handle,
+           phantom: PhantomData,
         })
     }

@@ -774,7 +777,7 @@ impl Connection {
     /// let sampler = Connection::new()?.blocking()?;
     /// let sample = sample.sample(Duration::from_millis(50))?;
     /// ```
-    pub fn blocking(&self) -> IOResult<Blocking> {
+    pub fn blocking(&self) -> IOResult<Blocking<'_>> {
         Blocking::new(self.location, unsafe { Self::read_header(self.location) })
     }

With this patch, we now get a compile error when Connection is dropped before Blocking:

error[E0505]: cannot move out of `conn` because it is borrowed
 --> examples\safety.rs:7:10
  |
6 |     let blocking = conn.blocking().expect("Couldn't create telem handle");
  |                    ---- borrow of `conn` occurs here
7 |     drop(conn);
  |          ^^^^ move out of `conn` occurs here
8 |     blocking.sample(Duration::from_millis(500)).unwrap();
  |     -------- borrow later used here

Connection::read_header()

This is an associated function, so it doesn't need a Connection constructed to call it, but it is unsafe. So there is no way to abuse this function in safe code. However, this is still a memory safety violation with this function. https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L702-L707

Fun fact: the transmute doesn't do anything here.

We can ignore the fact that this function takes a raw pointer, and the fact that it is not possible to get a valid shared memory pointer from the API in its present state. The latter fact just means that making this function public is useless. 🤷 But the important detail to consider is the h.clone() line. This line violate memory safety because Header is not Copy (meaning it cannot be read atomically) and there is no synchronization around the read. This function can read shared memory while another process (iRacing simulator) is writing to it.

Some potential outcomes of calling this code are:

  1. It returns a valid Header.
  2. It returns a valid Header that contains "half updated" values. E.g. maybe some fields are updated while others are old values.
    • There are a lot of concerns here.
    • What if Header::header_offset was changed, but the data it points to was not? Or vice versa?
    • What if Header::buffers was partially updated?
    • Etc.
  3. It returns an invalid Header containing impossible values. This is unlikely because Header is #[repr(C)] and only contains integral data and arrays of integral data.

The second point is the most concerning. It is also likely because the shared memory is updated every 16.67ms and callers are allowed to call this function (indirectly) whenever they like. If we look at how the iRacing SDK solves this, it will just make us cringe. The SDK is written in C, so it's more forgiving than Rust when it comes to unsynchronized memory access. But this is essentially how it works:

  1. Read the current tick from the selected buffer, remember the value.
  2. Copy the buffer from shared memory to its final location.
  3. Read the current tick from the selected buffer and compare with the remembered value.
    • If the values are equivalent, the copied data is considered valid.
    • Otherwise the data is discarded and the whole process is tried again, up to a total of 2 attempts.

This means that we can be reasonably sure the contents of the buffer are valid. But it says nothing about the validity of the data pointed by the offset within the buffer. I guess they assume clients will always read the data within 3 frames (about 50ms) of the write being completed. I also assume that all of the shared memory updates by the simulator occur rapidly, say entirely within a few microseconds before the event is set. But this synchronization based only on fuzzy timing is awful. (All it takes is some blocking operation like I/O to stall your process for more than 50ms and you are well in UB territory, even with the buffering and read-twice strategy.)

It is probably worth pointing out that this particular issue is a design issue with iRacing, and your crate can do little to improve the situation.

When it comes to Rust, it doesn't allow memory to change out from underneath it. See how unsafe is mmap? on the Rust users forum for a discussion of exactly this problem. TL;DR is that all kinds of bad things can happen with these ingredients, and we know how that will turn out.

Probably the best way to avoid this issue, without getting the iRacing simulator to actually lock memory regions, is removing all code paths that allow arbitrary calls to reading shared memory. In more concrete terms, Connection::session_info() and Connection::telemetry() need to be removed. That leaves Connection::Blocking as the only interface to shared memory. This provides, at a minimum, using the event as the only way to synchronize access to shared memory. And with careful cloning of all data as soon as possible, the caller can be given a large chunk of memory that will never be changed underneath it.

In terms of Connection::session_info(), it looks like this can also be synchronized with the event. So at least you don't have to remove a useful feature.

General concerns

Public methods should be cleaned up. Methods and fields should be made public only where necessary.

This safe public method accepts a raw pointer: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L369-L372

It doesn't appear to be possible to gain access to a Header in safe code (as established earlier) so making this method public seems to be useless. 🤷

Blocking::sample() calls this method on an owned clone of Header, which is the cause of #3. Lots of undefined behavior potential here.

Connection::session_info() does not need to borrow self mutably or exclusively.

This early return leaks file handles from the OpenFileMappingW call: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L678

The event should not be created. Switch to OpenEventW instead: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L545

And finally, instances of the unsafe keyword can be reduced, especially by wrapping more code surface area in a single unsafe block instead of e.g. having 4 unsafe blocks in a single method body.


That's most of the issues I have discovered so far with how unsafe is used. I don't recommend committing the patches in this issue just yet, because it would create more work for me with #6 reformatting the entire code base, #7 cleaning up a lot of lint (see https://github.com/parasyte/iracing.rs/compare/fix/cargo-fmt...parasyte:fix/check-and-clippy for a comparison diff between the two), and 3 extra as-yet-unpublished branches because they are blocked on those two PRs.

And by the way, thank you for this work, so far. It will save me a lot of time reinventing the wheel. And I was able to learn a lot about the IPC interface provided by the iRacing simulator from going through these motions. I wish they would redesign it, but it's unlikely given the number of tools that read telemetry from shared memory. They would probably scoff at the overhead required by locking memory regions, but that's just wild speculation on my part. And it's a rant for another day.

LeoAdamek commented 3 years ago

Thanks for the feedback. Using unsafe so often was a concern of mine going into the project, but seemed ultimately unavoidable. I will try to get a read through it when I have the time.

Agreed, there would be more elegant solutions for telemetry data (many sims simply spew out the data over UDP) -- but this is the mode they've gone.

sunaurus commented 2 years ago

Hi! I noticed that there hasn't been a lot of activity in this repo since this issue was opened. Any chance for an update regarding the current status of the crate and future plans? Would you consider this crate not ready for production use at the moment?

parasyte commented 2 years ago

Yeah! I really want to use the telemetry for my project. This issue is a blocker, but it can be made safe trivially by copying structs with raw pointers like the C SDK does. In all likelihood, I am going to find some time soon to work on this again (I have an experimental branch locally). But don't let that stop anyone from stepping up and contributing.

LeoAdamek commented 2 years ago

I've been away from iRacing and this crate for a little while, my day job has kept me from spending too much time on this project.

Again I agree with all the sugestions here and I thank @parasyte for all his contributions. As the crate has yet to reach version 1.0 I'm open to breaking things if that's what's needed to do things properly.

Seems like we just need to update various methods, primarily in Connection to create owned copies of the data and work from those rather than relying on the raw data pointers.

LeoAdamek commented 2 years ago

I've finally started working on this in earnest on the memory-safety branch. I think I've fixed the safety issues with the handling of the Header so it now creates an owned copy of the header rather than using the header ptr directly.

I think this might need some further tweaking though as the header still references the ValueBuffer locations in the underlying mmap -- Might need a more significant change to copy the entire contents of the shared mmap to an owned location and work from that.

However there's still some work to go in fixing some of the general design issues, such as the public methods which accept raw pointers.

Herbstein commented 3 weeks ago

Hi @parasyte. iRacing updating the mapping file underneath the running process is very undefined behavior. I wonder if the FILE_MAP_COPY option on the MapViewOfFile win32 function would alleviate that problem? That's how I read the documentation but I'm definitely not sure

parasyte commented 3 weeks ago

I am not aware of how copy-on-write semantics would solve problems with concurrent reads/writes to shared memory. Wouldn't that make writes completely unobservable?

According to MSDN:

When a process writes to a copy-on-write page, the system copies the original page to a new page that is private to the process. ... The contents of the new page are never written back to the original file and are lost when the view is unmapped.

Herbstein commented 3 weeks ago

My thinking would be that you'd make another view of the file every time the event is triggered, which I assume would get you the latest written data

LeoAdamek commented 2 weeks ago

Thanks for all the interest here, sorry I've not been very proactive at addressing these or any other issues but I've just not had the time or motivation.

What I'm thinking though is, perhaps this crate is actually all backwards. Currently the design is to interact with the shared memory map as directly as possible and to pull data from it directly in a very similar way to iRacing's own C++ samples.

This approach is probably the most memory and cycle efficient; but its the least safe. I'm sure it should be perfectly possible to copy the header and current buffer for safe operations quickly enough to work.

I'd like to redo this with a more idiomatic interface that makes it easier to use and reduces the number of unsafe operations as far as possible.

parasyte commented 2 weeks ago

I think that is a reasonable approach. It should also be mentioned that an acceptable alternative is exposing an unsafe interface. Anyone with a reason to pay for the extra complexity (and risk) to get zero-copy semantics with a game's shared memory interface shouldn't be prevented just because Rust encourages memory safety.

That said, I don't know what reasons those would be. Nearly every use case I can think of would work fine with memcpy and allocator overhead...