Open parasyte opened 3 years ago
Thanks for the feedback. Using `unsafe` so often was a concern of mine going into the project, but seemed ultimately unavoidable.
I will try to get a read through it when I have the time.
Agreed, there would be more elegant solutions for telemetry data (many sims simply spew out the data over UDP) -- but this is the route they've chosen.
Hi! I noticed that there hasn't been a lot of activity in this repo since this issue was opened. Any chance for an update regarding the current status of the crate and future plans? Would you consider this crate not ready for production use at the moment?
Yeah! I really want to use the telemetry for my project. This issue is a blocker, but it can be made safe trivially by copying structs with raw pointers like the C SDK does. In all likelihood, I am going to find some time soon to work on this again (I have an experimental branch locally). But don't let that stop anyone from stepping up and contributing.
I've been away from iRacing and this crate for a little while; my day job has kept me from spending too much time on this project.
Again I agree with all the suggestions here and I thank @parasyte for all his contributions. As the crate has yet to reach version 1.0 I'm open to breaking things if that's what's needed to do things properly.
Seems like we just need to update various methods, primarily in `Connection`, to create owned copies of the data and work from those rather than relying on the raw data pointers.
I've finally started working on this in earnest on the memory-safety branch. I think I've fixed the safety issues with the handling of the `Header` so it now creates an owned copy of the header rather than using the header ptr directly.
I think this might need some further tweaking though, as the header still references the `ValueBuffer` locations in the underlying `mmap` -- it might need a more significant change to copy the entire contents of the shared mmap to an owned location and work from that.
However there's still some work to go in fixing some of the general design issues, such as the public methods which accept raw pointers.
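A minimal sketch of the "copy to an owned location and work from that" idea, under stated assumptions: `HeaderCopy`, `snapshot`, and `parse_header` are hypothetical names, and the two-field layout is invented for illustration rather than taken from the crate's real `Header`. Copying by itself does not remove the race with a concurrent writer; it only guarantees that everything after the copy works on memory the process owns.

```rust
/// Hypothetical, simplified header (NOT the crate's real `Header` layout).
#[derive(Debug, Clone, Copy, PartialEq)]
struct HeaderCopy {
    version: i32,
    tick_rate: i32,
}

/// Copy `len` bytes out of a mapped region into an owned buffer.
///
/// # Safety
/// `base` must be non-null and valid for reads of `len` bytes for the
/// whole duration of the call.
unsafe fn snapshot(base: *const u8, len: usize) -> Vec<u8> {
    let mut owned = vec![0u8; len];
    // The only raw-pointer access happens here; callers get owned bytes.
    unsafe { std::ptr::copy_nonoverlapping(base, owned.as_mut_ptr(), len) };
    owned
}

/// Parse the hypothetical header from owned bytes using safe code only.
/// Returns `None` if the buffer is too short.
fn parse_header(bytes: &[u8]) -> Option<HeaderCopy> {
    if bytes.len() < 8 {
        return None;
    }
    let version = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let tick_rate = i32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    Some(HeaderCopy { version, tick_rate })
}
```

With this shape, the single `unsafe` call sits at the boundary and the rest of the API can take `&[u8]` or owned structs instead of raw pointers.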
Hi @parasyte. iRacing updating the mapping file underneath the running process is very undefined behavior. I wonder if the `FILE_MAP_COPY` option on the `MapViewOfFile` win32 function would alleviate that problem? That's how I read the documentation but I'm definitely not sure.
I am not aware of how copy-on-write semantics would solve problems with concurrent reads/writes to shared memory. Wouldn't that make writes completely unobservable?
According to MSDN:
When a process writes to a copy-on-write page, the system copies the original page to a new page that is private to the process. ... The contents of the new page are never written back to the original file and are lost when the view is unmapped.
My thinking would be that you'd make another view of the file every time the event is triggered, which I assume would get you the latest written data
Thanks for all the interest here, sorry I've not been very proactive at addressing these or any other issues but I've just not had the time or motivation.
What I'm thinking though is, perhaps this crate is actually all backwards. Currently the design is to interact with the shared memory map as directly as possible and to pull data from it directly in a very similar way to iRacing's own C++ samples.
This approach is probably the most memory and cycle efficient, but it's the least safe. I'm sure it should be perfectly possible to copy the header and current buffer for safe operations quickly enough to work.
I'd like to redo this with a more idiomatic interface that makes it easier to use and reduces the number of unsafe operations as far as possible.
I think that is a reasonable approach. It should also be mentioned that an acceptable alternative is exposing an unsafe interface. Anyone with a reason to pay for the extra complexity (and risk) to get zero-copy semantics with a game's shared memory interface shouldn't be prevented just because Rust encourages memory safety.
That said, I don't know what reasons those would be. Nearly every use case I can think of would work fine with `memcpy` and allocator overhead...
Don't take this as a negative criticism. There is a lot of content in this issue, which is not meant as an attack on code quality. I am merely attempting to point out how difficult it is to use `unsafe` correctly. And more importantly, I want to eventually use this crate in my own project.

I have identified several issues. I'll try to enumerate them, but be aware that this is not an exhaustive audit. Also note that I am not 100% confident in this analysis, since I don't have a great way to rigorously provide proof for each problem. That said, what I have found is convincing on its own merit.
Connection::close()
As a starting point, let's look at the lifetime relationship in `iracing::telemetry::Connection::close()`: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L781-L791

This is a public method which borrows `self` with an anonymous lifetime. Immediately after the call to `CloseHandle`, the `Connection` can no longer be used, because the file handle that it owns has been invalidated. The anonymous lifetime allows us to do just that in safe Rust, however. Here's the `dump_sample.rs` example with a single line added to close the connection before getting telemetry from it:

This compiles successfully, which is scary. Does this mean that the API allows for undefined behavior? Well, if we run this code, this happens:
And this error occurs because `CloseHandle` is given a pointer to shared memory, not a handle! `Connection` needs to maintain a handle reference so that it can close it. The handle is created here in `Connection::new`: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L660-L667

Ok, so let's fix that:
Now calling `conn.close()` returns `Ok(())` as expected. But, oh dear, the code prints out a bunch of info, and that's not at all expected. Or is it?

We just closed the `HANDLE`, but it looks like we didn't invalidate the shared memory. Honestly I have no idea how Windows feels about retaining access to shared memory after closing the handle to the file that owns it. This is most likely undefined behavior at this point, and the fact that the documentation for `OpenFileMappingW`, `MapViewOfFile`, and `CloseHandle` doesn't specify the behavior in this state strongly suggests that it is UB.

That said, the `close` method is missing an operation; unmapping the file view. So let's properly break this! The modified example code cannot possibly be correct if we do that, right?

Now when we run the modified example, we get something horrifying:
Remember, this modified example is all safe code! What happened? The API is in fact not valid. The `close` method cannot safely free kernel resources and allow the caller to continue using `Connection`. There's really only one way to encode a lifetime in the API that means "this struct can never be used again" and that's for the `close` method to consume the `self` argument. Essentially you just change the API to:

And you're done. Now the modified example does not compile:
There's just one thing about this that is ugly. The fact that calling `close` at all is optional. We could move this code into a `Drop` impl so Rust will free the resources when `Connection` goes out of scope. Then we don't need the explicit `close` method, because `drop(conn);` would be identical to `conn.close();`
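As a rough illustration of the two ideas above (a `close` that consumes `self`, and cleanup moved into `Drop`), here is a minimal, platform-independent sketch. The `released` counter is a hypothetical stand-in for the kernel resources; in the real crate the `Drop` body would presumably unmap the view and close the handle instead.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Stand-in for the real `Connection`. The counter is hypothetical and
/// only records how many times the resources were released.
struct Connection {
    released: Rc<Cell<u32>>,
}

impl Connection {
    fn new(released: Rc<Cell<u32>>) -> Self {
        Connection { released }
    }

    /// Placeholder for any method that needs a live connection.
    fn telemetry(&self) -> &'static str {
        "sample"
    }

    /// Takes `self` by value, so the caller cannot touch the connection
    /// afterwards. The body is empty: dropping `self` at the end of the
    /// function runs `Drop::drop` exactly once.
    fn close(self) {}
}

impl Drop for Connection {
    fn drop(&mut self) {
        // Real code would release the mapped view and the file handle here.
        self.released.set(self.released.get() + 1);
    }
}
```

With this shape, calling `conn.telemetry()` after `conn.close()` is rejected at compile time as a use of a moved value, and forgetting to call `close` still releases the resources when `conn` goes out of scope.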
This does not fix all possible ways to leak memory (which is now a well known problem, e.g. the leakpocalypse) but it is a much better API than one which expects callers to remember to call the `Connection::close()` method when it's done with the connection.

Blocking::close()
This has the same lifetime problems described above and can also be fixed with `impl Drop`, but it also has its own bugs. Let's look at those here.

https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L562-L588
The first thing to notice is that `self.event_handle` cannot be NULL here. Ever. This was verified when the handle was created: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L545-L551

And because `Blocking` contains private fields and it doesn't implement `Clone` or `Default`, it is only possible to create one with the constructor: `Blocking::new()`.

The second thing to point out is that there is also a NULL check on `self.origin`, but no corresponding NULL check in the constructor. The constructor even uses the pointer without a NULL check: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L537-L538

FWIW, `Header::get_var_header()` also doesn't do a NULL check. For a quick detour, the `Blocking` constructor is public, it's safe to call, and it accepts a raw pointer. That can't be good. Except there is currently no safe way to construct a `Header`, so making the constructor for `Blocking` public is useless. 🤷 In short, all of these NULL checks are pointless and should just be removed.

The third thing to notice about this `close` method is this line right here: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L579

`self.origin` is a raw pointer. It is not a handle. What happens when we call this function?

Let's find out!
Well, that explains a lot. But we're allowed to just ignore the `Result` and continue calling methods on `Blocking`. In this case, the event handle will be closed and if we try to call `blocking.sample()`, it will just return an error that similarly says the handle is closed. So not a great user experience, but I can't find any memory safety issues with `Blocking::close()`.

However ...
Lifetime violations
`Blocking` owns a raw pointer to data owned by `Connection`. If we resolve the problems described above by implementing `Drop`, we fall into this trap where `Blocking` is allowed to outlive `Connection`. For instance:

Rust provides a nice way to tie the lifetimes of these structs together. We can make the lifetime of `Blocking` depend on the lifetime of `Connection`. Thus `Connection` must always outlive `Blocking`.

With this patch, we now get a compile error when `Connection` is dropped before `Blocking`.
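One common way to express the lifetime relationship described in this section is a `PhantomData` marker that borrows the `Connection`. The sketch below is an assumption about how such a patch could look, not the patch from this issue: the `data` field, `blocking()`, and `first_byte()` are hypothetical, and a `Vec<u8>` stands in for the mapped shared memory.

```rust
use std::marker::PhantomData;

/// Stand-in for the real `Connection`; `data` plays the role of the
/// mapped shared memory that the real struct reaches through a raw pointer.
struct Connection {
    data: Vec<u8>,
}

/// `Blocking` still stores a raw pointer, but the `'conn` lifetime makes
/// the borrow checker treat it as a shared borrow of the `Connection`.
struct Blocking<'conn> {
    origin: *const u8,
    len: usize,
    _conn: PhantomData<&'conn Connection>,
}

impl Connection {
    fn new(data: Vec<u8>) -> Self {
        Connection { data }
    }

    /// The elided lifetime ties the returned `Blocking` to `&self`.
    fn blocking(&self) -> Blocking<'_> {
        Blocking {
            origin: self.data.as_ptr(),
            len: self.data.len(),
            _conn: PhantomData,
        }
    }
}

impl<'conn> Blocking<'conn> {
    /// Hypothetical accessor used only to show a read through `origin`.
    fn first_byte(&self) -> Option<u8> {
        if self.len == 0 {
            return None;
        }
        // SAFETY: `origin` points into `Connection::data`, and the borrow
        // recorded in `_conn` keeps that allocation alive and unmodified
        // for as long as this `Blocking` exists.
        Some(unsafe { *self.origin })
    }
}
```

With these definitions, writing `drop(conn);` while a `Blocking` obtained from `conn.blocking()` is still in use is rejected by the borrow checker, because `conn` cannot be moved while it is borrowed.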
Connection::read_header()

This is an associated function, so it doesn't need a `Connection` constructed to call it, but it is unsafe. So there is no way to abuse this function in safe code. However, there is still a memory safety violation with this function. https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L702-L707

Fun fact: the `transmute` doesn't do anything here.

We can ignore the fact that this function takes a raw pointer, and the fact that it is not possible to get a valid shared memory pointer from the API in its present state. The latter fact just means that making this function public is useless. 🤷 But the important detail to consider is the `h.clone()` line. This line violates memory safety because `Header` is not `Copy` (meaning it cannot be read atomically) and there is no synchronization around the read. This function can read shared memory while another process (iRacing simulator) is writing to it.

Some potential outcomes of calling this code are:
1. A valid `Header`.
2. A `Header` that contains "half updated" values. E.g. maybe some fields are updated while others are old values.
   - What if `Header::header_offset` was changed, but the data it points to was not? Or vice versa?
   - What if `Header::buffers` was partially updated?
3. A `Header` containing impossible values. This is unlikely because `Header` is `#[repr(C)]` and only contains integral data and arrays of integral data.

The second point is the most concerning. It is also likely because the shared memory is updated every 16.67ms and callers are allowed to call this function (indirectly) whenever they like. If we look at how the iRacing SDK solves this, it will just make us cringe. The SDK is written in C, so it's more forgiving than Rust when it comes to unsynchronized memory access. But this is essentially how it works:
This means that we can be reasonably sure the contents of the buffer are valid. But it says nothing about the validity of the data pointed to by the offset within the buffer. I guess they assume clients will always read the data within 3 frames (about 50ms) of the write being completed. I also assume that all of the shared memory updates by the simulator occur rapidly, say entirely within a few microseconds before the event is set. But this synchronization based only on fuzzy timing is awful. (All it takes is some blocking operation like I/O to stall your process for more than 50ms and you are well in UB territory, even with the buffering and read-twice strategy.)
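For readers unfamiliar with the "read-twice" idea mentioned above, here is a generic sketch of that kind of check. It is not the SDK's code: `SharedBuf`, its `tick_count` field, and `read_twice` are hypothetical names, and the layout is invented for illustration. It also shares the weakness described above, since comparing a counter before and after a copy is a timing heuristic, not real synchronization.

```rust
use std::ptr;

/// Hypothetical buffer layout, for illustration only.
#[repr(C)]
struct SharedBuf {
    tick_count: i32,
    payload: [u8; 16],
}

/// Copy the payload, then re-read the counter; accept the copy only if
/// the counter did not change in between. Gives up after `max_tries`.
///
/// # Safety
/// `buf` must be non-null, properly aligned, and valid for reads of a
/// whole `SharedBuf` for the duration of the call.
unsafe fn read_twice(buf: *const SharedBuf, max_tries: usize) -> Option<(i32, [u8; 16])> {
    for _ in 0..max_tries {
        unsafe {
            // Volatile reads keep the compiler from merging or reordering
            // these accesses; they do not make the reads atomic.
            let before = ptr::read_volatile(ptr::addr_of!((*buf).tick_count));
            let payload = ptr::read_volatile(ptr::addr_of!((*buf).payload));
            let after = ptr::read_volatile(ptr::addr_of!((*buf).tick_count));
            if before == after {
                return Some((before, payload));
            }
        }
    }
    None
}
```

If the writer updates the buffer more than once between the two counter reads and lands on the same counter value, or stalls mid-write, this check can still accept torn data, which is the concern raised in the paragraph above.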
It is probably worth pointing out that this particular issue is a design issue with iRacing, and your crate can do little to improve the situation.
When it comes to Rust, it doesn't allow memory to change out from underneath it. See "How unsafe is mmap?" on the Rust users forum for a discussion of exactly this problem. TL;DR is that all kinds of bad things can happen with these ingredients, and we know how that will turn out.
Probably the best way to avoid this issue, without getting the iRacing simulator to actually lock memory regions, is removing all code paths that allow arbitrary calls to reading shared memory. In more concrete terms, `Connection::session_info()` and `Connection::telemetry()` need to be removed. That leaves `Connection::Blocking` as the only interface to shared memory. This ensures, at a minimum, that the event is the only way to synchronize access to shared memory. And with careful cloning of all data as soon as possible, the caller can be given a large chunk of memory that will never be changed underneath it.

In terms of `Connection::session_info()`, it looks like this can also be synchronized with the event. So at least you don't have to remove a useful feature.

General concerns
Public methods should be cleaned up. Methods and fields should be made public only where necessary.
This safe public method accepts a raw pointer: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L369-L372
It doesn't appear to be possible to gain access to a `Header` in safe code (as established earlier) so making this method public seems to be useless. 🤷

`Blocking::sample()` calls this method on an owned clone of `Header`, which is the cause of #3. Lots of undefined behavior potential here.

`Connection::session_info()` does not need to borrow `self` mutably or exclusively.

This early return leaks file handles from the `OpenFileMappingW` call: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L678

The event should not be created. Switch to `OpenEventW` instead: https://github.com/LeoAdamek/iracing.rs/blob/5977462bd55c0dfc7efd1d0b0fe7a12adb3fb5eb/src/telemetry.rs#L545

And finally, instances of the `unsafe` keyword can be reduced, especially by wrapping more code surface area in a single `unsafe` block instead of e.g. having 4 `unsafe` blocks in a single method body.

That's most of the issues I have discovered so far with how `unsafe` is used. I don't recommend committing the patches in this issue just yet, because it would create more work for me with #6 reformatting the entire code base, #7 cleaning up a lot of lint (see https://github.com/parasyte/iracing.rs/compare/fix/cargo-fmt...parasyte:fix/check-and-clippy for a comparison diff between the two), and 3 extra as-yet-unpublished branches because they are blocked on those two PRs.

And by the way, thank you for this work, so far. It will save me a lot of time reinventing the wheel. And I was able to learn a lot about the IPC interface provided by the iRacing simulator from going through these motions. I wish they would redesign it, but it's unlikely given the number of tools that read telemetry from shared memory. They would probably scoff at the overhead required by locking memory regions, but that's just wild speculation on my part. And it's a rant for another day.