Brendonovich / swift-rs

Call Swift functions from Rust with ease!
Apache License 2.0
253 stars 30 forks source link

Issue with pointer being released #8

Closed lorenzolewis closed 1 year ago

lorenzolewis commented 2 years ago

See downstream issue for more details: https://github.com/tauri-apps/tauri/issues/4314

In theory, I think the memory management mechanism being used on the Swift side is the issue.

https://github.com/Brendonovich/swift-rs/blob/1c309f652428ecc549fbe8eaccba4835a97d3aa8/src-swift/lib.swift#L66-L69

It seems like memory is being freed twice.

lorenzolewis commented 2 years ago

Putting this link from @betamos for future reference: https://stackoverflow.com/questions/43127265/the-correct-way-to-manage-cocoa-memory-when-working-with-objective-c-from-rust

amodm commented 1 year ago

This is likely to be fixed via #17

Brendonovich commented 1 year ago

The autorelease branch has a potential fix for this, but I've been sitting on it since I'm not sure if it's the right solution.

The Problem

When creating an object in Swift, it is linked to the application's top-level autorelease pool (or in Tauri's case, WebKit's autorelease pool). Since we use .release() to free Swift objects, the autorelease pools end up attempting to release while the object's retain count is 0, causing a crash.

@amodm's Solution

I have no doubt that this works, but I'm really not a fan of the DX.

My Solution

Seeing as the crash is being caused by an autorelease pool releasing objects with a retain count of 0, I figured why not bypass that pool entirely.

The implementation of this is quite a bit more complex, requiring macro and trait stuff, but imo provides a much better DX. I'm not sure how performant calling objc_autoreleasePoolPush before every function invocation is, but if we can use this solution I'd much prefer it.

amodm commented 1 year ago

It's your crate, so your call obviously, on what approach to take. My thoughts are as follows:

DX & core memory management are orthogonal topics. Memory management has to do with provable correctness, while DX has to do with how devs get to use the crate. The code in the autorelease branch mixes the two, so I'm unable to figure out memory correctness at a quick glance, but you as the author, should certainly try to establish that - through both logical reasoning in your head, as well as rigorous tests. For a crate like this, there's really no way around those two.

Personally, I prefer layering things. the core memory management layer should be simple (and thus easily provable), which is what #17 solves for. The DX layer comes on top of it, and I think the macros are definitely the way forward for that. But the goal of DX should be reducing verbosity, not memory management.

Regarding the DX macros themselves, I'd started working on something similar before I chanced upon this crate, and decided to contribute here instead. So should you want it, we can collaborate on the design of the macros.

amodm commented 1 year ago

I took some more look at the autorelease branch code, and I'm not sure it would solve the problem. My reasoning is that autorelease isn't the root issue of the problem - it's a symptom! The root issue is not adhering to the memory protocols of Swift. So solving this issue by approaching it from an autorelease lens, is at best a duct tape, which will blow up in some or the other scenario. To be clear, I'm making this remark on the basis of logical reasoning, not actual testing of the code, so there could be something I'm missing.

If my previous comment makes sense, the concrete steps forward would be to:

  1. Merge #17 to get the core memory management protocol running. The protocol is simple, clean, and has tests for safety as well as leaks. This also keeps the surface area of "memory aware code" as very thin.
  2. Build the DX macros on top of it, by having them adhere to the protocol above, and having zero memory management code of their own. This keeps the DX layer separate, and allows for future iterations on it without introducing a risk of any memory regression.

A good way to think of the above, is to treat the 1st layer as dealing with the primitives of the crate (SRObject et al), while the 2nd layer as providing a richer API on top of it to deal with common scenarios. This way, even if a dev ends up needing to use the primitives for special requirements, they're less likely to shoot themselves in the foot (as compared to memory management also floating in the 2nd layer).

Also, the 1st step here is time sensitive (I mean fixing the memory issue, not necessarily merging the PR), as in the current form, the crate is broken unfortunately, for all but a small set of usecases. I'm running off my own branch for now, but new users of this crate (from crates.io) may be giving up prematurely, if they hit a SIGSEGV as I did, and not everyone might be coming in to file a bug report.

The 2nd step should involve some inputs, if possible, from the users of the crate, to figure out what kinds of scenarios should the DX solve for. I can create a new issue to get the discussion started, or if you already have a set of usecases, maybe seed the discussion using that.

amodm commented 1 year ago

Reviewing the autorelease branch more, here are my findings:

Pros

  1. The code seems memory safe, and there are no leaks, so long as we use swift!() macro. The name autorelease had me confused initially, but the memory safety here is not coming from autorelease. To verify, you can remove calls to push & pop, and things will still work.
  2. Swift fn signatures don't require a toRust() on return path - this is because we're calling retain on the Rust side instead. Also see the corresponding problem in Cons.1
  3. Swift fn signatures doesn't require a UnsafeRawPointer on argument side - because we're dereferencing the pointer before making the swift call, again via the swift!() macro. See Cons.1 and Cons.2

Cons

  1. The .retain() approach inside swift!() macro, is making an implicit assumption about the implementation of the object being alive by the time it is received on Rust side. While the underlying implementation of Swift's memory management makes this unlikely to change, it's important to acknowledge that this approach is making an assumption, and is using a non-standard contract with Swift's runtime.
  2. Memory safety is not guaranteed via primitives, but the swift! macro call is effectively the new primitive (by virtue of its lifecycle management responsibilities). This means that as a dev, it's no more possible to work outside of swift! macro. Arguably, this also makes memory management a little more opaque.
  3. objc_retain and objc_release are an older relic, while Swift native code should rely on Unmanaged. While the current runtime has enough cross-dependencies between ObjC and Swift, it's an assumption based out of current implementation, and not a future-guaranteed contract we're using.
  4. autoreleasepool push & pop is currently being made on every function call, from inside the swift! implementation. This is not at all required for memory safety, and takes agency away from the dev on what code block to cover under autoreleasepool.

My comments

  1. Personally, I feel more comfortable with not having a macro as a primitive, as it spews lifecycle management code all over the place, and requires tracking that's different from Rust's own object lifecycle management, making it more complex, and thus more easily regressed. But obviously, its the crate owner's discretion to decide what they're more comfortable with.
  2. The advantage of not having toRust() or UnsafeRawPointers in Swift code here is undeniable. I feel that's where some of the potential of macros really comes in, where we use new macros to actually generate the Swift side of the code (the binding part at least), and automatically cover these calls & conversions. This helps keep the core memory management primitives clear & simple, with the additional power of these macros being available to all devs, without compromising the ability of devs to use the lower layers if required.
  3. Implicit assumptions about Swift's runtime should be avoided (Cons 1-3), and public facing APIs or best practices, should be preferred.
  4. autoreleasepool should not be getting invoked for every call. Personally, I'd prefer something like this:

    #[macro_export]
    macro_rules! autoreleasepool {
    ( $expr:expr ) => {{
        extern "C" {
            fn objc_autoreleasePoolPush() -> *mut std::ffi::c_void;
            fn objc_autoreleasePoolPop(context: *mut std::ffi::c_void);
        }
    
        let pool = unsafe { objc_autoreleasePoolPush() };
        let r = { $expr };
        unsafe { objc_autoreleasePoolPop(pool) };
        r
    }};
    }

    This would allow devs to wrap the right amount of code, as deemed appropriate by them, under an autoreleasepool, e.g.

    autoreleasepool!({
    for _ in 0..1000 {
        // do something a 1000 times
    }
    });
amodm commented 1 year ago

@Brendonovich, I've posted my review of the autorelease branch 👆🏼. You can make a decision that feels most appropriate to you, but I'll recommend documenting the reasons either way.

amodm commented 1 year ago

One of the memory-safety checks I'm unable to test on the autorelease branch is loopbacks/reflection: say, I pass an SRString to a Swift function multiple times, and that function returns the exact same SRString back. The reason is that currently there's no way to pass a reference to Swift side in the autorelease branch.

If you think more deeply about it, a fundamental disconnect emerges between programming models of Rust vs Swift. In Rust, we have reference vs value, governed under the ownership model, leading to a syntactical difference in how you pass them around. In Swift OTOH, use of ARC means no syntax difference in reference vs value.

So while it's more Swift-like to take a SRString instead of an UnsafePointer<SRString>, accepting the former comes at the cost of the Rust side sending an SRString instead of an &SRString. This in turn, comes at the cost of Rust side unable to reuse the same instance of a Swift object repeatedly, which actually makes writing Rust side code far more difficult.

As an example, you cannot do something like the following in autorelease branch:

// we create the name object once
let name: SRString = "Bond".into();
for _ in 0..10000 {
    // and then repeatedly use it in a loop
    let reflected = unsafe { reflect_string(&name) };
    assert_eq!(name.as_str(), reflected.as_str());
}

There could be ways of coercing a Rust side reference to a Swift side dereferenced pointer to handle the above scenario, but I suspect it would come at the cost of hidden complexity that's best avoided.

Given the fundamental programming model difference between the two runtimes, it is best that we do not try to hide the complexity at the bridging layer, as it'll make things harder to determine where we're working with reference vs value.

Under normal circumstances, most devs will be iterating far more on their Rust/Swift side code, than the bridging of the two worlds, so that's something to bear in mind too.

FWIW, this memory-safety check works fine on #17.

Brendonovich commented 1 year ago

The reason is that currently there's no way to pass a reference to Swift side in the autorelease branch

Fwiw that's just because I hadn't implemented SwiftArg for &T where T: SwiftArg. If you add this it works:

unsafe impl<T: SwiftArg> SwiftArg for &T {
    type SwiftRsType = *const c_void;
    type SwiftType = *const c_void;

    fn to_swift_rs_type(self) -> Self::SwiftRsType {
        self as *const _ as *const c_void
    }

    fn as_swift_type(rs_type: &Self::SwiftRsType) -> Self::SwiftType {
        *rs_type
    }
}

Am doing more investigating atm, will address the rest of your stuff later

Brendonovich commented 1 year ago

After doing more testing on autorelease I'm left wondering if the only thing holding it together was the __retain call in the swift macro. Every test is working without objc_autoreleasePoolPush/Pop.

I think I'll go ahead with your changes and try and build on top of them. This whole situation has confused me for a long time and seeing as your solution is reliable then I may as well go with it.

amodm commented 1 year ago

FWIW, the two things that allowed the autorelease branch to work correctly:

  1. You fixed the implementation of SRObject::drop call.
  2. You started calling retain() on object receipt on Rust side.

Rest everything was a bit of a distraction from a memory safety perspective. The manner of implementing the retain() is where the problem was, as it didn't strictly adhere to standards, had a larger surface area making memory-management more complex, and had a leakage in the loopback scenario (which is why I was trying to test that) - happy to share the details if that interests you.

While I'm obviously comfortable with the decision, if there's any detail I can share that can bring down the confusion, or clarify on my previous points, I can do that.

I agree that there is a better DX that can/should be built upon a solid foundation of memory safety.

amodm commented 1 year ago

I should also mention that calling autorelease push & pop did help the autorelease branch maintain a lower memory profile in the stress tests. So this is something that should be added on top of #17, though obviously not on a per-call basis.

Something along the lines of the autoreleasepool! macro I shared earlier works, and I can push that to the PR before you merge or after, whichever way you prefer it. This maintains equivalence to the @autoreleasepool of Objective-C, in terms of programming paradigm, and layering on top of memory safety than in, and most importantly giving the choice to devs instead of making it for them.

Pure Swift code doesn't require autorelease pool, but whenever older ObjC libraries get linked, it does help in those scenarios.

Brendonovich commented 1 year ago

Closed by #17