Manishearth / rust-gc

Simple tracing (mark and sweep) garbage collector for Rust
Mozilla Public License 2.0
963 stars 50 forks source link

GC isn't closure friendly. #50

Open kevincox opened 7 years ago

kevincox commented 7 years ago

This might be very difficult but I think it is worth considering how the GC can work with closures.

Currently moving a Gc into a closure causes problems because it can't be found in the trace. Also it will probably cause a panic on cleanup if the closure is stored in a GCed object because it will be rooted which will cause assert!(finalizer_safe()) to trigger.

I don't know if it would be possible to add support for tracing closures, I'm pretty sure this isn't possible in current Rust but maybe a compiler plugin could accomplish it. That being said some "tracing" traits aren't implemented for closure (Debug) but that might be for another reason.

There are other problems as well. Due to moving being so transparent there is no way to mark a "field" as #[unsafe_ignore_trace] if you want to capture a non-Gc containing object as well. This seems like it would make it impossible to choose the right "default".

That being said it would be very nice to be able to close over values and have them automatically traced if something could be worked out.

apasel422 commented 7 years ago

Could you post a code example that causes problems this way?

Manishearth commented 7 years ago

GCd objects don't get unrooted when put inside closures. They also won't get unrooted if the closure is put inside another GCd object, and you can cause a leak if you make a cycle here. The worse case here is a leak. You don't need to trace down closures for safety because all their contents have already been rooted.

A Gc<T> only unroots when it is placed transitively inside another Gc<T>. The operation of moving a Gc<T> into a closure doesn't do this, so it doesn't unroot.

mystor commented 7 years ago

It would be awesome if we could traverse closure implementations, but I don't think that that will be possible until the rust language itself adds tracing infrastructure, as there is no place in the pipeline where we can hook in right now which gives us the information we would need (what variables are captured by the closure), but also allows us to implement traits.

kevincox commented 7 years ago

Exactly what @Manishearth said. You stay rooted inside closures.

However this can do more then just cause cycles. If the closure is inside another Gc object and get's Dropped it will cause a panic because the captured reference will be Dropped as well, causing the assertion I mentioned to fire.

@mystor Is there any work in the language for tracing? I know that Servo did something similar for their GC so maybe I'll try to look into that to see if they are using any tricks. But yes, I think you're right in that we will need to wait for better tracing support.

Manishearth commented 7 years ago

http://manishearth.github.io/blog/2016/08/18/gc-support-in-rust-api-design/

izgzhen commented 7 years ago

Do we have a RFC in rust-lang side related to this now?

andersk commented 3 years ago

Currently moving a Gc into a closure causes problems because it can't be found in the trace. Also it will probably cause a panic on cleanup if the closure is stored in a GCed object because it will be rooted which will cause assert!(finalizer_safe()) to trigger.

These issues are just a consequence of the incorrect #[unsafe_ignore_trace] that must have been used in order to store the closure in a Gc, right? An incorrect #[unsafe_ignore_trace] can cause these issues without any closures involved, too.

It’s certainly a limitation that there seems to be no way to correctly implement Trace for a struct with a closure containing a Gc. But at least one should not accidentally get an incorrect implementation without a visible “unsafe”.

Manishearth commented 3 years ago

Correct, this is still safe.

My recommendation woudl be having a second trait that is : Trace that you use trait objects of

kevincox commented 3 years ago

These issues are just a consequence of the incorrect #[unsafe_ignore_trace]

Yes. But there is currently no way to trace a closure. Maybe a better name for this issue is "Allow tracing through closures."

I agree that the user should be aware of this to some degree. However it is quite easy to start with a closure that doesn't capture any Gc references but then later you add one. When you start capturing this reference you don't even necessarily notice (maybe the type Gc isn't spelt out in the nearby code) and the unsafe_ignore_trace is in a different file. So while the user does have to opt into this is is very likely that the user makes a mistake later and ends up in this situation because of how implicit closure captures are.

andersk commented 3 years ago

Since a closure type cannot be named, it can only be contained in a generic struct, which is unsafe if there exists any unsafe instantiation of it. This is unsafe because it’s possible to misuse unsafely:

#[derive(Finalize, Trace)]
struct Closure<F: Fn()>(#[unsafe_ignore_trace] F);

so regardless of whether it’s actually misused, the author is responsible for ensuring that it’s not exposed as part of a safe API. It shares all the same considerations with a function like this:

fn convert<T, U>(value: &T) -> &U {
    unsafe { std::mem::transmute(value) }
}

which could be used correctly, but even a correct use could become incorrect after some nonlocal code change. Obviously, the author of the convert function is responsible for marking it unsafe and documenting its safety invariant. Similarly, the author of struct Closure is responsible for making its constructor private, wrapping it in a factory function that’s marked unsafe, and documenting its safety invariant. The call site in both cases is then required to use an unsafe block, whose presence locally alerts programmers that there are safety invariants to be maintained when making changes.

Alternatively, depending on the use case, it might be possible to restrict the struct such that it can only be used safely:

#[derive(Finalize, Trace)]
struct Closure<F: Copy + Fn()>(#[unsafe_ignore_trace] F);
gagbo commented 2 years ago

Hello,

I'm trying to implement a scheme interpreter using this crate and the Copy restriction doesn't work for me. Currently I'm trying to bundle in the same structure the closure and all the captures that happened in the closure, hoping that the extra fields would act as proper roots for the closure:

type ManagedEnv = Gc<GcCell<Env>>;
type ManagedValue = Gc<GcCell<Value>>;
trait Analyze: Fn(ManagedEnv) -> ManagedValue {}

#[derive(Trace, Finalize)]
pub struct Analyzed {
    /// The manually tracked list of roots that are used/captured by the closure
    ///
    /// This list does not contain all the roots that are indirectly captured by
    /// the thunks included in the analysis, since they are captured in [thunks]
    /// already.
    roots: Vec<R7Result>,

    /// The manually tracked list of thunks that are used/captured by the closure
    thunks: Vec<Gc<Analyzed>>,

    /// It is safe to ignore tracing because of the roots and the thunks are
    /// included in the struct, so those other fields will be responsible of
    /// tracing and it is okay to lose tracing information from the closure
    /// capture that happens at compilation time.
    ///
    /// see https://github.com/Manishearth/rust-gc/issues/50 for reasons why
    /// closures cannot automagically root the Gc'd values that get moved into
    /// it.
    #[unsafe_ignore_trace]
    fun: Box<dyn Analyze>,
}

Of course, as it's my first implementation, I'm still encountering bugs, but I'd like to know if that seems obviously wrong of if that's something that could work ? I guess the worst case scenario is when the closure is creating some ManagedValues in the body, but I thought that those values would not matter as they either are temporary (local to the closure execution), or returned outside of the closure, where they could be properly tracked.

Manishearth commented 2 years ago

@gagobo Wich Copy restriction? I don't think we have one.

Manishearth commented 2 years ago

The following code compiles just fine for me:

```rust use gc::*; type ManagedEnv = Gc>; type ManagedValue = Gc>; trait Analyze: Fn(ManagedEnv) -> ManagedValue {} #[derive(Trace, Finalize)] struct Env; #[derive(Trace, Finalize)] struct Value; #[derive(Trace, Finalize)] struct R7Result; #[derive(Trace, Finalize)] pub struct Analyzed { /// The manually tracked list of roots that are used/captured by the closure /// /// This list does not contain all the roots that are indirectly captured by /// the thunks included in the analysis, since they are captured in [thunks] /// already. roots: Vec, /// The manually tracked list of thunks that are used/captured by the closure thunks: Vec>, /// It is safe to ignore tracing because of the roots and the thunks are /// included in the struct, so those other fields will be responsible of /// tracing and it is okay to lose tracing information from the closure /// capture that happens at compilation time. /// /// see https://github.com/Manishearth/rust-gc/issues/50 for reasons why /// closures cannot automagically root the Gc'd values that get moved into /// it. #[unsafe_ignore_trace] fun: Box, } ```
gagbo commented 2 years ago

I meant the Copy restriction in the Closure example earlier in the conversation.

If it looks fine to you then all is good (and you can ignore the rest of the post, it's less "issue relevant" but more "newbie conversation")

I implemented the code (mostly) as you put just above, and it compiles fine but I started triggering finalizer_safe panics (when calling two functions in a row in my scheme repl, so very not minimal example) and I wasn't sure if it was because my design is unsound to begin with, or if it's "just me" mis-representing the traceable fields in my application code when building the Analyzed instances. Even after reading your blog post the exact semantics of Trace and (especially) Finalize weren't super clear to me, so adding the closure capture management on top of those put me in total darkness, but that's how we learn :)

If I ever manage to build a MWE that can trigger Finalize panics with this struct I will add it here to show what I mean.

And thanks for making this crate, it might not look like it right now, but it does save me a lot of headaches

Manishearth commented 2 years ago

Yeah so like I said the way to do closures soundly is to have trait Closure: Trace and use something like Gc<dyn Closure> or whatever (you may need some further plumbing to make that work). But you basically have to manually define closure capture types and implement a trait on them that has a .call() method.

gagbo commented 2 years ago

I see what you mean.

For the sake of completeness, and not being a denvercoder9, this is a stub of the design I'm going with, where indeed the builtin closure types just have to be banished to avoid issues.

use gc::*

#[derive(Trace, Finalize)]
struct Env{}
#[derive(Trace, Finalize)]
struct Val{}
type ManagedVal = Gc<GcCell<Val>>;
type ManagedEnv = Gc<GcCell<Env>>;

// I don't want to deal with generic traits around arguments and return types,
// I'm only interested in one type of closure, therefore the trait is very concrete

pub trait CloseOnEnv: Trace {
    fn call(&self, env: ManagedEnv) -> ManagedVal;
}

#[derive(Trace, Finalize)]
pub struct Closure {
    // Maybe the box and extra type is unnecessary,
    // I only keep them for the easy alias
    inner: Box<dyn CloseOnEnv>,
}
pub type ManagedClosure = Gc<Closure>;

impl CloseOnEnv for Closure {
    fn call(&self, env: ManagedEnv) -> ManagedVal {
        self.inner.call(env)
    }
}

impl Closure {
    pub fn managed<T: CloseOnEnv + 'static>(inner: T) -> ManagedClosure {
        Gc::new(Self {
            inner: Box::new(inner),
        })
    }
}

// Example of usage:

// No capture needed
impl CloseOnEnv for ManagedVal {
    fn call(&self, _env: ManagedEnv) -> ManagedVal {
        self.clone()
    }
}

// Capture needed, making a combinator that executes a sequence of closures passed as arguments
fn sequence<T>(analyses: T) -> ManagedClosure
where
    T: Iterator<Item = ManagedClosure>,
{
    // A local struct with capture semantics that is going to be returned, and
    // that has proper semantics
    #[derive(Trace, Finalize)]
    struct Analyzed {
        parts: Vec<ManagedClosure>,
    }

    impl CloseOnEnv for Analyzed {
        fn call(&self, env: ManagedEnv) -> ManagedVal {
            let mut folded: ManagedVal = ManagedVal::new_null();
            for analysed in self.parts.iter() {
                folded = analysed.call(env.clone())?;
            }
            Ok(folded)
        }
    }

    Closure::managed(Analyzed {
        parts: analyses.collect(),
    })
}

Thanks for your guidance and your time

jedel1043 commented 1 year ago

(Could open a new issue for this, but I think it's too related to the same underlying problem of inspecting captured variables).

Unfortunately, the workaround proposed falls apart for async blocks and async functions, because every await point captures the variables used after the await.

On another note, is there any RFC related to inspecting the captured variables of closures/futures?