gfx-rs / metal-rs

Rust bindings for Metal
Apache License 2.0
591 stars 112 forks source link

Investigate use of objc_retainAutoreleasedReturnValue #222

Open raphlinus opened 2 years ago

raphlinus commented 2 years ago

Autorelease is a major pain point for Apple integration, and users of this library in particular. It forces an uncomfortable tradeoff between performance and safety, and is quite prone to memory leaks if the calling application doesn't scope the autorelease pools properly.

Using Apple-controlled compilers (ie the official Swift bindings), there's a good solution in the form of objc_retainAutoreleasedReturnValue. However, there are challenges for us to use it.

This issue came up in the recent Hacker News discussion of the official C++ Metal bindings. Those bindings also don't have a great story around autorelease (I was hopeful they did and we'd be able to adapt it). See this thread in particular for the autorelease-specific discussion, including very helpful info from @comex.

drewcrawford commented 2 years ago

I have looked into this problem for my crates at some length. The short version is that x64 is “easy enough” if you are willing to very carefully control inlining, etc. But it does require incredibly close coordination between a metal binding and an objc crate (and everything in between - blocks is another common example for block-based APIs) as you need one to directly follow the other in the compiled binary, accounting for optimizations etc. So in terms of “where this would go” the answer is unfortunately “yes”.

arm/etc has the additional complication of requiring particular instruction patterns. Stabilizing asm! would help here. the only solution I know of for stable rust is to emit thunks in an assembler, which introduces additional stack frames and spoils many other optimizations so is pretty undesirable.

In my workloads this is usually not the heaviest rock optimization-wise but once you do lift the heavier rocks it does start to matter.

raphlinus commented 2 years ago

Thanks for the input. And to be clear, I don't consider this an issue of optimization (though that would be nice) so much as avoiding footguns from either memory leaks or potential unsafe patterns. For example, this code is UB (or could easily be adapted into a crashing example):

   let command_buffer = command_queue.new_command_buffer();
   let encoder = autoreleasepool(|| command_buffer.new_compute_command_encoder());
   encoder.set_compute_pipeline_state(...);

I believe if the "new" calls could be written in terms of (a working) objc_retainAutoreleasedReturnValue, then they could return retained references, be safe(r), and not leak memory even when the enclosing code doesn't have autorelease pools properly set.

drewcrawford commented 2 years ago

I think from the UB perspective one solution is to retain the value before it leaves new_compute_command_encoder(). objc_retainAutoreleasedReturnValue is the most optimized retain, but any retain (e.g. objc_retain) would fix the UB and require no special tricks.

From the memory leak perspective, it is implementation-defined whether [foo newComputeCommandEncoder] uses autorelease pools internally somewhere. In that case an autoreleasepool would be required to avoid leaks.

madsmtm commented 2 years ago

Related: https://github.com/gfx-rs/metal-rs/issues/218

Agree with @drewcrawford that the UB issues can be fixed by retaining the object, but as you note, it's impossible to completely ensure that all leaks are avoided without an autoreleasepool because it may use those internally.

Regarding objc_retainAutoreleasedReturnValue, from my quick testing it is definitely possible to update objc so that inlining happens enough for the optimization to kick in (at least on x86_64, when building with --release). But this optimization is indeed quite brittle.

An entirely different approach to fixing this would be to embrace autoreleasing, and instead change objc::rc::autoreleasepool to hand out a reference to the pool, see https://github.com/SSheldon/rust-objc/pull/103. With this, functions such as new_compute_command_encoder could know how to bound the lifetime of the returned reference:

impl CommandBufferRef {
    pub fn new_compute_command_encoder<'p>(&self, _pool: &'p AutoreleasePool) -> &'p ComputeCommandEncoderRef {
        // Maybe the return is bound both by the pool and `'self` if `computeCommandEncoder` expects the
        // commandbuffer to remain alive? I haven't looked that much at how Metal works yet, sorry...
        unsafe { msg_send![self, computeCommandEncoder] }
    }
}

And the example you provided above would then fail to compile:

// Fails at borrow-checker
let command_buffer = command_queue.new_command_buffer();
// Compiler would complain that you can't move `encoder` out of the closure, because it's lifetime is bound to the pool
let encoder = autoreleasepool(|pool| command_buffer.new_compute_command_encoder(pool));
encoder.set_compute_pipeline_state(...);

// No accidental leaking, because you need to have the `pool` from somewhere
let command_buffer = command_queue.new_command_buffer();
let encoder = command_buffer.new_compute_command_encoder(pool);
// If we don't have an autoreleasepool, we can't call the above function
encoder.set_compute_pipeline_state(...);

// Correct usage
let command_buffer = command_queue.new_command_buffer();
autoreleasepool(|pool| {
    let encoder = command_buffer.new_compute_command_encoder(pool);
    encoder.set_compute_pipeline_state(...);
});

Although this absolutely ensures that there are no leaks, it might be pretty cumbersome for the user.

drewcrawford commented 2 years ago

I personally am in favor of a parameter strategy like @madsmtm 's suggestion in spite of it being very cumbersome.

One detail, you really need a pool around every objc call. Because of the "maybe there's autorelease usage internally" question, this includes methods with new

// Correct usage
autoreleasepool(|pool| {
    let command_buffer = command_queue.new_command_buffer(pool);
    let encoder = command_buffer.new_compute_command_encoder(pool);
    encoder.set_compute_pipeline_state(..., pool);
});

The design I used for my bindings is a combination of the retain and autorelease parameter approaches:

  1. Return values are retained smart pointers, as this follows Swift and allows the objc_retainAutoreleasedReturnValue optimization (or it can be added in later). This part fixes the UB but still leaks.
  2. Autoreleasepool parameters statically assert that some pool exists (although the lifetimes are not needed due to 1). This part handles leaks but is not a safety issue which is handled by 1.

These turned out to be a very good tradeoff for my situation.

One trick is there are various cases where one 'knows' an autoreleasepool exists but there isn't a marker type available, for example when ObjC calls you. In that case it is very useful to have an escape hatch to create the marker type for autoreleasepools without any runtime behavior. Depending on details it can be difficult to create a marker type with the 'right' lifetime, so having a design that doesn't rely on lifetimes for safety is nice for this case.

Also, while I am certainly in favor of taking leaks/pools seriously, wider Rust views leaks as safe. Part of the reason I ended up wandering off on my own is I had assumed solutions I preferred like "add a parameter to every API to check leaks" would be impractical within the existing ecosystem, and probably to most users.

madsmtm commented 2 years ago

Depending on details it can be difficult to create a marker type with the 'right' lifetime, so having a design that doesn't rely on lifetimes for safety is nice for this case.

Yeah, I can see what you mean, my proposed design is not very good for that. I just thought it really nice that you could just use the return value as-is, and that you didn't have to do an extra retain. Back to the drawing board (though don't expect me to come up with anything useful 😉)!

Anyhow, though the inner autoreleasing is an implementation detail, maybe we could determine experimentally which functions actually do this? I suspect that could cut down on a lot of what would to users seem like unnecessary work. Of course, this would not be a 100% leak-free situation, e.g. if Apple decides to release an update that changes the implementation details, but it could go a lot of the way there.