This sounds like a similar solution to the problem proposed here https://github.com/clarkmcc/cel-rust/issues/43 and would possibly be solved by the proposed trait-based value design here https://github.com/clarkmcc/cel-rust/issues/76. Does that sound right, or are there additional pros/cons to this approach over the approaches discussed in those issues?
A `Lazy` variant would allow every level of `github.orgs['clarkmcc'].projects['cel-rust'].license` to be lazy. Another difference is that cel-rust wouldn't allow values to change during expression evaluation. The resolver approach allows someone to use `random`, which means that `hello == hello` could evaluate to false.

@chirino, what's your opinion on this?
The same linear-time evaluation remark you've noted in #47 applies here. But it's not something I'd adhere to at all costs, because it's not a good metric. In this context, constant time means evaluating an expression always takes 1000ms because of a non-negotiable upfront cost, while variable complexity would be 20ms-1000ms depending on which fields are being read; if an HTTP request has to happen, it's still got to happen, whether it's handled by a registered handler or by the code that uses cel-rust.
I've been wondering whether the laziness should really be applied at interpreter time. Obviously that's the only way to avoid populating unnecessary state, but in CEL that's relatively rare. So, wouldn't it be more sensible to use the parsed expression to resolve all `Ident`s being referenced and the members being navigated, and populate a `Map` instead? `member.syntax` lets you navigate a `Map` just as it does for a `String`, no differentiation there, is there? So it'd be transparent to the expression author...

If that assumption holds, and that's how I will implement it for now in any case, that lets one populate the evaluation's context with the minimal set of state required, e.g. `github.orgs['clarkmcc'].projects['cel-rust'].license` populates a `Map` bound to `github`, containing an entry keyed `orgs` with a `Map` containing ... Not sure I'd resolve the key lookups in that example, but `request.method` wouldn't force me to populate the whole `request` (with `headers`, ...), only the method, made accessible through the `request: Map<String, String>` in this case.
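For illustration, a minimal sketch of that idea against cel-interpreter's public API (the exact conversion impls, e.g. `HashMap` into `Value`, are an assumption and may differ between versions), populating only the member the expression navigates:

```rust
use std::collections::HashMap;
use cel_interpreter::{Context, Program, Value};

fn main() {
    let program = Program::compile("request.method == 'GET'").unwrap();

    // Populate only `request.method`, not the whole request (headers, body, ...).
    let minimal: HashMap<String, String> =
        HashMap::from([("method".to_string(), "GET".to_string())]);

    let mut context = Context::default();
    // Assumes a HashMap -> Value conversion exists; otherwise build the Value::Map by hand.
    context.add_variable_from_value("request", minimal);

    assert_eq!(program.execute(&context).unwrap(), Value::Bool(true));
}
```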
@alexsnaps are you proposing that there just be an additional step where we clone & convert types into CEL values prior to running the interpreter? This is a crude, non-working example, but I just want to make sure I understand what you're suggesting.
```rust
let mut context = Context::default();
let values = HashMap::from([("github", ...)]);
let program = Program::compile("github.orgs['clarkmcc'].projects['cel-rust'].license == true");
program.resolve_or_something(&mut context, values).unwrap(); // <-
let result = program.execute(&context);
```
Yeah, something from the crate could help, but I (literally just now) started doing it outside of it... Something analogous to `Expression::_references`, where it'd find all "member paths", so that something along the lines of `['GET', 'POST'].contains(context.request.method)` would resolve to an entry in the `Vec` of paths: `['context', 'request', 'method']`. From there I now create a `Context` with a root binding, resolving the `Ident` `context` (`ctx.add_variable_from_value`), of type `Map`, that contains the `String` key `request` resolving the `Attribute`, which itself has a value of type `Value::Map` that contains `method`. I resolve what this all means to my domain's model and inject the `Value::String` in the `Map`. Now the members are operating on a `Map` instead of a lazily populated `Struct`, but I think this wouldn't make a difference to the author of the expression.

Still WIP, but here is a passing test for what this would look like... or at least the first part of it, i.e. finding all the paths. I'm working on populating that `Value::Map` now with only the members required by the expression. Hope this clarifies. Dunno if that'd suffice for other use-cases though... would love to hear what this does not cover.
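A rough sketch of the path-collection part, over a deliberately simplified stand-in AST (cel-parser's real `Expression`/`Member` enums have more variants and different shapes, so this is illustrative only):

```rust
// Simplified stand-in AST, just enough to show how member paths are gathered.
enum Expr {
    Ident(String),                   // e.g. `context`
    Attribute(Box<Expr>, String),    // e.g. `<expr>.request`
    FunctionCall(String, Vec<Expr>), // e.g. `contains(...)`
    List(Vec<Expr>),
    StringLit(String),
}

/// Walks an expression and collects every `Ident`-rooted member path,
/// e.g. `context.request.method` -> ["context", "request", "method"].
fn collect_paths(expr: &Expr, out: &mut Vec<Vec<String>>) {
    match expr {
        Expr::Ident(name) => out.push(vec![name.clone()]),
        Expr::Attribute(base, field) => {
            collect_paths(base, out);
            if let Some(path) = out.last_mut() {
                path.push(field.clone());
            }
        }
        Expr::FunctionCall(_, args) | Expr::List(args) => {
            args.iter().for_each(|a| collect_paths(a, out));
        }
        Expr::StringLit(_) => {}
    }
}

fn main() {
    // ['GET', 'POST'].contains(context.request.method)
    let expr = Expr::FunctionCall(
        "contains".into(),
        vec![
            Expr::List(vec![Expr::StringLit("GET".into()), Expr::StringLit("POST".into())]),
            Expr::Attribute(
                Box::new(Expr::Attribute(
                    Box::new(Expr::Ident("context".into())),
                    "request".into(),
                )),
                "method".into(),
            ),
        ],
    );
    let mut paths = Vec::new();
    collect_paths(&expr, &mut paths);
    assert_eq!(paths, vec![vec!["context".to_string(), "request".into(), "method".into()]]);
}
```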
> `member.syntax` lets you navigate a `Map` just as it does for a `String`, no differentiation there, is there?

Depends how you look at it. Strictly speaking no, because a string has no members. But after parsing, if you're not using a literal, you'll have an `identifier` which needs to be resolved.
> Obviously that's the only way to avoid populating unnecessary state, but in CEL that's relatively rare.

I think the cost of 1 extra enum variant (maybe a larger jump/cache miss?) isn't really that severe for how much it simplifies the code I'm working on.
> So, wouldn't it be more sensible to use the parsed expression to resolve all `Ident`s being referenced and the members being navigated, and populate a `Map` instead? [...] that lets one populate the evaluation's context with the minimal set of state required, e.g. `github.orgs['clarkmcc'].projects['cel-rust'].license` populates a `Map` bound to `github`, containing an entry keyed `orgs` with a `Map` containing ...
What benefits does it offer over having lazy values? You still do the same thing, only now with 5 extra steps, 1 extra place where you need to transcribe data structs into code, 1 additional dependency, and more redundant compute/work.
Each place you'd try running a CEL `Program` would need to:

- parse and traverse the expression before building the `Context`,
- populate a `Context` which is constructed from the statement,
- evaluate the program against that `Context`.

I personally don't want to do that; it's just a lazy value, why would I need to handle parsing and expression traversal just to run `author.role()` and store the result in a `Map`? I chose CEL because I don't want to write an expression parser and interpreter, not because I want to manually handle lexing of tokenized strings.
Ideally, you'd construct a `Context` once before you evaluate any CEL expressions, and re-use (mostly) the same `Context` for everything, optionally adding root-level values where it makes sense.

It's also unsafe to assume `identifier`s have any meaning without a `Context`; assuming they do before constructing the `Context` will lead to issues because you now have 2 places that have to take care of `identifier` semantics. It offers no additional benefit.
> `member.syntax` lets you navigate a `Map` just as it does for a `String`, no differentiation there, is there?
>
> Depends how you look at it. Strictly speaking no, because a string has no members. But after parsing, if you're not using a literal, you'll have an `identifier` which needs to be resolved.

Haha! Obviously, my bad, I meant `Struct` not `String` 🤦
> Obviously that's the only way to avoid populating unnecessary state, but in CEL that's relatively rare.
>
> I think the cost of 1 extra enum variant (maybe a larger jump/cache miss?) isn't really that severe for how much it simplifies the code I'm working on.

Are you talking about the `Value` enum here? Having this `Value::Lazy`? What would that variant's `ValueType` be?
> Each place you'd try running a CEL `Program` would need to:

Can you elaborate here on what the use case is? I.e. what's changing the most, the CEL expressions themselves? Or the `Context` on which they operate? I'm making the assumption it's the latter, but now I'm hesitating based on what you're saying. Also, is your "application" (whatever evaluates the expression) aware of the semantics of the `Context`, or are these two decoupled?

> I chose CEL because I don't want to write an expression parser and interpreter, not because I want to manually handle lexing of tokenized strings.

Did you look at the "reference" implementation? How do you think that'd work there?

I'm wondering how, for instance, this could be made to work when injecting protobuf data in the context? Which relates to a few of the things that have been discussed here wrt Serde et al...

Anyways, a bit like how Serde deals with `Arc` fields on structs, I'm thinking dealing with `OnceCell` might be a thing, i.e. linking a Rust `Struct` to the context, where the `OnceCell` would get `init`ed ... unsure how though, given the API 🤔
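For reference, a tiny sketch of that `OnceCell` idea in isolation (plain std, not tied to cel-interpreter's API): a field that is only computed the first time it's read.

```rust
use std::cell::OnceCell;

struct PullRequest {
    id: u64,
    // Hydrated only if an expression actually reads it.
    author_role: OnceCell<String>,
}

impl PullRequest {
    fn author_role(&self) -> &str {
        self.author_role.get_or_init(|| {
            // Imagine an expensive lookup (HTTP call, DB query, ...) here.
            format!("role-for-{}", self.id)
        })
    }
}

fn main() {
    let pr = PullRequest { id: 42, author_role: OnceCell::new() };
    // First access computes and caches; subsequent accesses are free.
    assert_eq!(pr.author_role(), "role-for-42");
    assert_eq!(pr.author_role(), "role-for-42");
}
```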
> Are you talking about the `Value` enum here? Having this `Value::Lazy`? What would that variant's `ValueType` be?

It would be returned from an internally stored field that's inferred from the result type of the provided function:
```rust
pub struct LazyValue {
    eval: Box<dyn ErasedProducer>,
    value_type: ValueType,
    current: Option<Value>,
}

impl LazyValue {
    pub fn new<R>(ctor: impl Into<Producer<R>>);
    pub fn value_type(&self) -> ValueType;
    pub fn reset(&mut self);
}

// Incomplete type projection magic:
type Producer<R> = Box<dyn Fn(&Value) -> R>;
trait ErasedProducer;
impl ErasedProducer for Fn(&Value) -> impl ResultType;
trait ResultType; // projects `R` into the `ValueType` stored in `ErasedProducer`
impl ResultType for Into<Value>;
```
Calling `ctx.add_lazy_variable("name", impl Into<Producer<R>>)` directly constructs a `Value::Lazy(LazyValue)`.
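To make the erasure concrete, here is a self-contained, compilable sketch of the same idea using stand-in `Value`/`ValueType` enums rather than cel-interpreter's real types; the generic constructor records the producer's result type before boxing it away:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    String(String),
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum ValueType {
    Int,
    String,
}

impl From<i64> for Value {
    fn from(v: i64) -> Self { Value::Int(v) }
}
impl From<String> for Value {
    fn from(v: String) -> Self { Value::String(v) }
}

/// Projects a producer's Rust return type into a `ValueType` known before evaluation.
trait ResultType: Into<Value> {
    const TYPE: ValueType;
}
impl ResultType for i64 { const TYPE: ValueType = ValueType::Int; }
impl ResultType for String { const TYPE: ValueType = ValueType::String; }

struct LazyValue {
    eval: Box<dyn Fn() -> Value>,
    value_type: ValueType,
    current: Option<Value>,
}

impl LazyValue {
    /// Records `R`'s `ValueType`, then erases `R` behind a boxed closure.
    fn new<R: ResultType + 'static>(producer: impl Fn() -> R + 'static) -> Self {
        LazyValue {
            value_type: R::TYPE,
            eval: Box::new(move || -> Value { producer().into() }),
            current: None,
        }
    }

    fn value_type(&self) -> ValueType {
        self.value_type
    }

    /// Runs the producer on first use only, caching the result.
    fn force(&mut self) -> &Value {
        if self.current.is_none() {
            self.current = Some((self.eval)());
        }
        self.current.as_ref().unwrap()
    }
}

fn main() {
    let mut lazy = LazyValue::new(|| "expensive".to_string());
    assert_eq!(lazy.value_type(), ValueType::String); // type known before evaluation
    assert_eq!(lazy.force(), &Value::String("expensive".to_string()));
}
```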
> Did you look at the "reference" implementation? How do you think that'd work there?

I assume it would use something similar to the above, and then use the `reflect` module to determine the `ValueType` from the function. In some ways a Go implementation would be simpler than a Rust one, because `reflect` exists so compile-time type projection isn't necessary.
> I'm wondering how, for instance, this could be made to work when injecting protobuf data in the context?

`Value::Lazy` is an internal detail. Only the library and Rust code are aware of it. Protobuf data would be deserialized into other (existing) types and not `Lazy`, because lazy is intended only for manually assigned variables.

I'm not familiar with protobuf though, so I'm not sure whether it has a similar concept, but if it does, constructing a `Value::Lazy` is only impossible if the return type of the lazy function can't be inferred.
If I have a map with millions of keys, and I want to avoid converting all those keys to CEL values, would this lazy idea help? Or do you still end up inserting millions of lazy entries into the CEL context?
> Did you look at the "reference" implementation? How do you think that'd work there?
>
> I assume it would use something similar to the above, and then use the `reflect` module to determine the `ValueType` from the function. In some ways a Go implementation would be simpler than a Rust one, because `reflect` exists so compile-time type projection isn't necessary.

My question was unclear here, I was wondering how (if at all) one would address that with the golang implementation without changing it. The golang version is much more complete than the Rust one here...
> chirino: If I have a map with millions of keys, and I want to avoid converting all those keys to CEL values, would this lazy idea help? Or do you still end up inserting millions of lazy entries into the CEL context?

For the most part, no. It's only useful for cases like the one you show in #43, where you'd use a jump table for lookup in the resolver. If the data is fully dynamic (e.g. the value of each field is `field_name + "_field"` and any field identifier is valid), then this can't represent it. If the entry count is finite and most entries are larger than the size of the `LazyValue`, then it would help, but it's still not the best solution.

This mostly addresses the high evaluation cost that members might carry, not so much Map length/memory limitations.

I'd personally approach dynamic members via a trait (the trait from #76, example impl in #96) - it's not dissimilar to #58; the difference is that it uses a trait that can be inserted at any level of the tree structure and the parent of the `identifier` is type-checked.
If both concepts were implemented, I see myself using both.
> alexsnaps: My question was unclear here, I was wondering how (if at all) one would address that with the golang implementation without changing it. The golang version is much more complete than the Rust one here...

That's not a question then, but a statement: "I don't think this is a good suggestion because it deviates from the reference implementation."

I agree, it does. It also doesn't interfere with existing functionality/spec. The spec expects all data present in CEL to be precomputed (in order to serialize it with protobuf), which isn't ideal in some use cases. The example you provided includes an additional lexing pass just to collect that data in advance. If fetching expensive data is common enough, it makes more sense to provide an ergonomic way to do it easily, instead of expecting users to construct dynamic data structures based on the identifiers used in the expression (because that will be slower 90% of the time).

My requirements differ enough that I'm using a fork of this library, but I still see no reason why this is a bad idea. And "the reference implementation doesn't have it" isn't a good reason IMO (see Weston). It can be controlled with a feature flag in case you need no-alloc or have a kink for unsafely parsing all your CEL expressions to check for requirements.
> alexsnaps: My question was unclear here, I was wondering how (if at all) one would address that with the golang implementation without changing it. The golang version is much more complete than the Rust one here...
>
> That's not a question then, but a statement: "I don't think this is a good suggestion because it deviates from the reference implementation."

It is a question. I really don't know if that is feasible one way or another. I was not arguing that it shouldn't be done because the reference implementation doesn't do it. I'd argue that, out of the box, it'd be desirable for this crate to behave as the golang implementation does, if only to provide portability of CEL expressions (and their behavior, including in terms of data accesses) across the two implementations. But as you rightfully mentioned, things can be extended in non-"compliant" ways (through a feature flag, which has been discussed even in the golang implementation already) or made possible through the introduction of certain hooks and/or SPIs.

So from your answer though, I guess you do know this isn't feasible with the golang implementation either, is that right?

> My requirements differ enough

This is where it's hard to understand where you are coming from, given how little you share about the problem, rather solely focusing on your solution. Let me showcase: in my case, ~~another lexing pass~~ traversing the AST is required to gather the "property accesses", but only once. We have CEL expressions that "barely change" (at least compared to how many times they get evaluated, i.e. millions). So that additional traversal of the AST is a relatively small price to pay, as we store the result and are ready to populate the data when evaluation time comes. Furthermore, planning for that retrieval would actually be faster than lazily loading it, as we can now implement a reasonable fetch plan for the data and get it in bulk. All that is not to disqualify your use case, very much the opposite, as understanding the choices behind the tradeoffs might help, and I do realize we aren't all solving the same problems, but to highlight how the actual use case matters.
> It is a question. I really don't know if that is feasible one way or another. I was not arguing that it shouldn't be done because the reference implementation doesn't do it.

Sorry, it seemed like a loaded question.

> So from your answer though, I guess you do know this isn't feasible with the golang implementation either, is that right?

I'm not the best person to ask (as I've said before), as I've never touched protobufs and have very limited experience with golang. It's not currently supported, but it is feasible.
> This is where it's hard to understand where you are coming from, given how little you share about the problem, rather solely focusing on your solution.

Oh, pfff, me? Well, uhm... 👉👈

I'm, of course, working on a ✨groundbreaking✨ labeling action for GitHub. I randomly stumbled across this crate to parse expressions like:

```yaml
- if: commit.body.has_word("Wayland") && !commit.body.has_word("X11")
  then:
    - add_label("wayland")
    - add_to_milestone("Wayland")
- if: comment.matches("$assign: @(\w+)") && user.role == "owner"
  then:
    - assign(matches[1])
```
And now I'm pushing for changes on this project - don't you love FOSS?! /j
In my case I have a single file that's used by multiple contexts - if a `Program` fails because an `identifier` isn't defined, the rule is skipped. I could do something similar to what you're doing and sort different rule entries based on the context they require, but catching `ExecutionError::UndeclaredReference` is much simpler for now. Yes, it could cause unnecessary requests as-is; the forked version will check root-level identifiers first and maybe even allow grouping that way, but that's out of scope for now, I just needed #96 and created issues for what I felt was missing.

So I don't use protobuf, I simply need something that will evaluate expressions, and I don't feel like writing everything from scratch.

I could likely do batching as well if I switched to the GitHub GraphQL API, but that feels like more trouble than it's worth atm. Anyway, `user.role` is not provided in the initial workflow JSON, and could cause 2 additional requests to be sent. Maybe in some cases it won't even be used, so I don't want to gather data upfront and send 20 requests just to build the `Context`. I have a single `Context` construction function, and then add root-level variables to it based on the event that triggered the action (comment, PR, issue create, edit, etc.). `user` is effectively an ID wrapper.
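For context, the skip-on-undeclared-identifier part looks roughly like this (a sketch against cel-interpreter; the exact payload of `ExecutionError::UndeclaredReference` may differ between versions):

```rust
use cel_interpreter::{Context, ExecutionError, Program, Value};

/// Evaluates one rule condition; `Ok(None)` means "this rule doesn't apply in this context".
fn evaluate_rule(program: &Program, ctx: &Context) -> Result<Option<bool>, ExecutionError> {
    match program.execute(ctx) {
        Ok(Value::Bool(b)) => Ok(Some(b)),
        Ok(_) => Ok(Some(false)), // non-bool condition: treat as not matching
        // The rule references a variable this event/context doesn't provide: skip it.
        Err(ExecutionError::UndeclaredReference(_)) => Ok(None),
        Err(other) => Err(other),
    }
}
```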
Thanks for sharing the context of your problem!

In the meantime, I've looked at the golang implementation a little further, and I wonder if we're not missing building blocks that would actually let us decouple the process of hydrating the whole context upon creation from the crate. Still investigating this and might even start some exploratory work, but I think these are the blocks we miss:

- `DynType`, for data with an unknown layout before evaluation
- some abstract value representation (golang's `ref.Val`)

But I'm wondering, for now, whether adding a `Value::Struct` that would sit as a "proxy" between the interpreter and the actual data would do, so that accessing a field of that `Value::Struct` wouldn't require the whole struct to be hydrated, but would only resolve the accessed member, just as for `Value::Map`. Tho, per spec those `struct`s are to be converted to `Map`s, and with today's `Value::Map`, which holds a ... `HashMap<Key, Value>`, well, it would need to be fully populated.

A bit of rambling, but seeing how the reference implementation does it, i.e. having an open set of `Value`s... this would be a much bigger undertaking. Maybe have a `DynValue`?
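To make the "proxy" idea a bit more concrete, one hypothetical shape (names invented here, not cel-interpreter's API) would be a variant holding a trait object that resolves fields on demand:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for cel-interpreter's Value, extended with a hypothetical proxy variant.
#[derive(Clone)]
enum Value {
    String(Arc<String>),
    Map(Arc<HashMap<String, Value>>),
    // Field access goes through the trait object instead of a pre-hydrated map.
    Struct(Arc<dyn StructProxy>),
}

/// Hypothetical "abstract value representation": only the accessed member is resolved.
trait StructProxy {
    fn field(&self, name: &str) -> Option<Value>;
}

struct License {
    spdx: String,
}

impl StructProxy for License {
    fn field(&self, name: &str) -> Option<Value> {
        match name {
            // Hydrated only when the expression actually navigates to it.
            "spdx" => Some(Value::String(Arc::new(self.spdx.clone()))),
            _ => None,
        }
    }
}
```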
> - Some abstract value representation. [...] adding a `Value::Struct` that would sit as a "proxy" between the interpreter and the actual data, so that accessing a field of that `Value::Struct` wouldn't require the whole struct to be hydrated, but would only resolve the accessed member, just as for `Value::Map`. Tho, per spec those `struct`s are to be converted to `Map`s and with today's `Value::Map` which holds a ... `HashMap<Key, Value>`, well it would need to be fully populated.

`DynamicValue == "String" && DynamicValue > 5`.

> `DynType`, for data with an unknown layout before evaluation.

This sounds more or less like what I want:
```rust
pub struct DynValue {
    eval: Box<dyn Fn(&Context) -> Value>,
    current: RefCell<Option<Value>>,
}

impl DynValue {
    pub fn new<F>(ctor: F) -> Self
    where
        F: Fn(&Context) -> Value + 'static,
    {
        DynValue { eval: Box::new(ctor), current: RefCell::new(None) }
    }

    pub fn value_type(&self, ctx: &Context) -> ValueType {
        let mut current = self.current.borrow_mut();
        if let Some(value) = current.as_ref() {
            value.value_type()
        } else {
            // Evaluate once, cache the result, and report its type.
            let value = (self.eval)(ctx);
            let value_type = value.value_type();
            *current = Some(value);
            value_type
        }
    }

    #[cfg(feature = "non_spec_compliant")]
    pub fn reset(&mut self) {
        self.current.replace(None);
    }
}
```
"unknown layout before evaluation" is basically what I'm pushing for (I think). Layout is determined by type, so something without a layout can conceptually be:
*const ()
that's erased but stores a discriminant in the first few bytes, which is unsafe unless constructing it takes a reference with lifetime, which requires Value<'context
> and that's a lot of work because magic.rs
has to be modified to support it,Value
? Like Value(Value)
, but I don't see how that would be useful,Other than that, unsized types ([T]
) can't be stored, so I think it's more practical to consider the options that are dynamic when observed from runtime. Having the Value
be determined by a function seems like better design to me as it's then uniquely owned by evaluation engine and can't be Drop
ped while in use or cause other weirdness.
I'm repurposing this issue for `DynValue`, then.
Howdy everyone, sorry I was on vacation last week and am just catching up on this. Let me summarize and see if I'm following.
There are two general proposals:
Am I understanding correctly where the discussion has landed?
I need to go brush up on DynValue in the Go implementation and get a better handle on how that works, and what it means for #80.
I think option 2 can absolutely be made compliant with the spec, as this is how the golang impl gets its input `ref.Val`:

`Activation` is another mechanism, and `mapActivation` literally talks about lazily supplying the values at eval time:
> It would be a nice improvement if lazy values were supported (like `std::cell::LazyCell`). These variables would evaluate their value only when used, at which point they'd turn into some other `Value::Variant`. An example where this would be useful is having variables bound to an external API, where evaluating all of them even if they're not used is expensive.
>
> There's ways of achieving this functionality already (with functions), but it would be nice if users could refer to them with `member.syntax`.