Make Functions Send + Sync

lucperkins commented 4 months ago

This makes implementations of Function usable in multi-threaded contexts.

I'm happy to put this behind a feature flag if desired.

clarkmcc commented 4 months ago

Hm, what's an example of something that doesn't work without this PR? In this example, is the goal to share a Context between two threads?

lucperkins commented 4 months ago

@clarkmcc Yes. Specifically, in my case I want to use a Context with Axum, and if you want to bind something to your server as state, it needs to implement Send and Sync. I think it's a pretty straightforward use case.

clarkmcc commented 4 months ago

Why would you want to bind Context to Axum state? Context is not intended to be thread-safe. There's really two concepts in this CEL implementation:

Program - This is the CEL program that has been parsed and is ready for execution. It is thread-safe, allowing you to use the same program over and over and across the thread boundary. It doesn't require mutable shared access because nothing in a program is mutable.
Context - This is where functions and variables are stored. It is not thread-safe because the CEL spec doesn't define the behavior of changing a variable used by a program midway through executing that program.

In the context of an Axum server, let's say that you were implementing some sort of auth validation or something with CEL. The program could be shared across Axum handlers (this would be the CEL logic for validating), but on each Axum request, you'd create a new Context and add variables like the request path, the client IP, etc which could then be reference in the thread-safe program.

This is sort of the mental model that I have for the difference between a Program (thread-safe) and a Context (not thread-safe).

Can you elaborate more on how your use-case differs from the design here?

lucperkins commented 4 months ago

Basically I want to be able to make authorization decisions for HTTP calls. Policy conditions, expressed as CEL, are supplied by admins and stored somewhere (database, in-memory, etc.). When users want to do something, a Program is created on the fly and evaluated using a Context. That Context would consist of a corpus of built-in functions and variables but would be enriched with specific values for specific evaluations. So let's say admin Big Boss has created the CEL policy username != "bad-actor-420" for some action. Then, one of the admin's employees wants to perform that action. The server determines that the employee works for Big Boss and fetches Big Boss's CEL condition. The Context is enriched with a value for that user's username and then the Big Boss's CEL policy is evaluated, with the result of the evaluation varying based on the user (so good-guy-steve is allowed to perform the action but bad-actor-420 is not). So in this case, the Program varies and, preferably, the Context stays mostly the same.

Now, I'm not wedded to this model and if you think there's a better way to do things given those goals, I'm all ears. The only alternative that really comes to mind is building a brand new context for every request, which I would prefer not to do.

clarkmcc commented 4 months ago

@lucperkins thanks for the explanation of the use-case. It makes perfect sense. In addition to making the functions Send + Sync, we'd also need to make the Context itself Send and Sync, correct? What would you propose there? Wrapping everything in a Mutex?

You know what could be interesting... the context can either be a root context, or a child context (see blog post about it) where the child context just has a reference to it's parent which could be another child context, or it could be the root context. This is how we support variable shadowing in CEL macros like .map. When a variable is not found in a child context, we traverse upwards until we get to the root.

We could look at implementing copy-on-write semantics in order to make Context Sync, where you can initialize your functions, your global variables that never change, etc. But then as soon as you add a variable, under the hood you get back a new child context that references the original parent context, allow you to share all the functions and globals, but be able to get a new context with just the additional variables you need for the request.

The only other question is whether we'd run into lifetime issues where the compiler won't be able to prove that the root context outlives the child contexts passed to the Axum handlers.

Anyways, that's sort of a brain-dump. Thoughts?

lucperkins commented 4 months ago

@clarkmcc First off, thanks a ton in turn for entertaining my suggestion and thinking it through so thoroughly. @cole-h has a maybe-solution to this where he has a struct that wraps Context:

struct PolicyDecider(Context<'static>);

impl PolicyDecider {
    fn new() -> Self { Self(Context::default()) }

    fn evaluate(&self, condition: &str) -> Result<Value, ExecutionError> {
        let mut inner_context = self.0.new_inner_scope();
        inner_context.add_variable_from_value("someVar", Value::from("someVal"));
        let program = Program::compile(condition)?;
        program.execute(inner_context)
    }
}

So basically using the mechanism that you're currently using for map and others. Seems to be working fine in testing and at least seems to avoid any potential mutability-related issues. Does this seem reasonable to you? Anything stand out? If it does seem reasonable, I'd be happy to add some docs around this because I suspect others will have similar goals.

clarkmcc commented 4 months ago

@lucperkins yes, this is exactly what I had in mind.

If this meets your needs, I think I'd also like to clean up the API a bit so that it's more obvious what functions to call to do exactly this. I'm also trying to think through whether we can just make this implicit behind the scenes -- i.e. context is always immutable and when you add a variable, it creates a child context on the spot and gives it back. I would need to figure out the best way to do that with support for batching variables into the context, and I would also need to figure out if doing this implicitly presents lifetime challenges for the user.

lucperkins commented 4 months ago

@clarkmcc I'd say this indeed meets our needs for now. More than happy to help out re: API design, so please don't hesitate to ping us. I'm closing this specific PR for now as I definitely agree about Send + Sync. I continue to be impressed by this project!

lucperkins commented 4 months ago

@clarkmcc I take it back. I conferred with Cole again, and actually we want to re-propose this specific change. Even when carefully handling mutability as in the pattern I laid out above and using Arc<PolicyDecider>, not having Function be Send + Sync breaks a lot of stuff in Axum (like error handling). We'll continue to explore other solutions here but for now, this would be a helpful unblocker for us.

clarkmcc commented 4 months ago

Would this PR be sufficient though? Context is still not Sync even with these changes I believe.

clarkmcc commented 4 months ago

And if I could throw out something else for consideration: there's probably a good argument for registering functions with a Program not with a Context. The functions that are required are going to be coupled with the Program that you're running, and there's not a good reason that I can think of that you'd want to take the same program and execute it multiple times with different functions in the context.

If we were to move functions to the Program, and then you initialized a new Context for each Axum request, putting into it only the required variables, would that meet your needs, or are there still benefits to sharing large portions of the Context outside of functions?

lucperkins commented 4 months ago

@clarkmcc In our case, it's okay for Context to not be Sync.

The issue with your suggestion, I think, is that we don't know what the Program will be until the request arrives. We ascertain who the user is and what they want to do, then we figure out which condition (CEL string) applies to them, then we update the context with values required for the evaluation, and then we evaluate. So we basically have a big chonky initial Context that we enrich with a few variables on demand and the Program is just a simple string that we pull from a database.

clarkmcc commented 4 months ago

@lucperkins no big deal, but if you have a spare moment, it would be great if you could add a minimum reproducible example of your use-case to the examples directory. Since we're pre 1.0, I'd like to capture as many use-cases as possible so that any breaking changes going forward still support the legitimate use-cases that exist in the wild.

If you don't have the time, not a big deal at all.

lucperkins commented 4 months ago

@clarkmcc More than happy to!

clarkmcc / cel-rust

Make Functions Send + Sync #55