Closed tpaulus closed 2 weeks ago
There might be a way to avoid the expensive step in building a CoreSchema, but either way, we should make this operation cheaper.
Copying comments from Slack:
I think the cost is coming from the step where we have to invert the action hierarchy: https://github.com/cedar-policy/cedar/blob/8ab5daeb89fcf80ee2aadf20b685aff7163d2b71/cedar-policy-validator/src/schema.rs#L818-L831 The only reason we need to store the action hierarchy in that orientation (edges going from parent->child rather than child->parent) is to support this query efficiently when typechecking `in` expressions in policy validation here: https://github.com/cedar-policy/cedar/blob/8ab5daeb89fcf80ee2aadf20b685aff7163d2b71/cedar-policy-validator/src/schema.rs#L784-L800 So the idea is to rework how we validate `in` so that, rather than querying for which actions are descendants of a particular action, it instead asks which actions are ancestors of a particular action.
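To make the two orientations concrete, here is a minimal sketch. It is not the Cedar implementation: `ActionId`, `invert`, and `is_descendant_of` are hypothetical names, and the child->ancestors map is assumed to already hold the transitive closure. It shows that inverting the hierarchy touches every edge, while an ancestor-style query can be answered straight from the child->parent orientation.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for an action UID (not Cedar's real type).
type ActionId = &'static str;

/// Invert child -> ancestors edges into ancestor -> descendants edges.
/// Every edge is visited once, which is the costly part for large hierarchies.
fn invert(
    child_to_ancestors: &HashMap<ActionId, HashSet<ActionId>>,
) -> HashMap<ActionId, HashSet<ActionId>> {
    let mut ancestor_to_descendants: HashMap<ActionId, HashSet<ActionId>> = HashMap::new();
    for (child, ancestors) in child_to_ancestors {
        // Ensure every action has an entry, even if it has no descendants.
        ancestor_to_descendants.entry(*child).or_default();
        for ancestor in ancestors {
            ancestor_to_descendants
                .entry(*ancestor)
                .or_default()
                .insert(*child);
        }
    }
    ancestor_to_descendants
}

/// Ancestor-style query answered directly from the child -> ancestors map,
/// so no inversion is needed. Assumes the map stores all transitive ancestors.
fn is_descendant_of(
    child_to_ancestors: &HashMap<ActionId, HashSet<ActionId>>,
    action: ActionId,
    ancestor: ActionId,
) -> bool {
    action == ancestor
        || child_to_ancestors
            .get(&action)
            .map_or(false, |ancestors| ancestors.contains(&ancestor))
}
```

With the second form, the validator would ask "is this action under that group?" per candidate action instead of asking "which actions are under this group?" once, which is why the change to the typechecking code would be invasive.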
Though, that's an invasive change to the typechecking code, and it still leaves some less expensive but still not free code for constructing the action entities in the CoreSchema constructor, so I wouldn't object to lifting some of that work to happen once in the schema constructor. If you want to take a stab at that route, I'd recommend trying to just lift the construction of the actions hash map. The rest of the CoreSchema constructor is just wrapping a reference to the ValidatorSchema object, so it would become basically free. That would also let us optimize the action_entities function at the same time.
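A rough sketch of what lifting the actions hash map could look like. The types below are simplified, hypothetical stand-ins that only borrow the names `ValidatorSchema` and `CoreSchema` from the discussion; the fields, constructor arguments, and the `action` method are assumptions, not the real Cedar API.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified stand-in for an action entity.
#[derive(Debug, Clone, PartialEq)]
struct ActionEntity {
    uid: String,
    ancestors: Vec<String>,
}

/// Hypothetical stand-in for the validator schema. The action map is built
/// once here instead of on every `CoreSchema` construction.
struct ValidatorSchema {
    actions: Arc<HashMap<String, Arc<ActionEntity>>>,
}

impl ValidatorSchema {
    fn new(raw_actions: Vec<(String, Vec<String>)>) -> Self {
        // The per-action work happens exactly once, at schema construction.
        let actions: HashMap<String, Arc<ActionEntity>> = raw_actions
            .into_iter()
            .map(|(uid, ancestors)| (uid.clone(), Arc::new(ActionEntity { uid, ancestors })))
            .collect();
        Self {
            actions: Arc::new(actions),
        }
    }
}

/// With the map precomputed, the wrapper only borrows the schema, so
/// constructing it does no per-action work.
struct CoreSchema<'a> {
    schema: &'a ValidatorSchema,
}

impl<'a> CoreSchema<'a> {
    fn new(schema: &'a ValidatorSchema) -> Self {
        Self { schema }
    }

    /// Look up an action; cloning the `Arc` bumps a refcount and does not
    /// copy the entity itself.
    fn action(&self, uid: &str) -> Option<Arc<ActionEntity>> {
        self.schema.actions.get(uid).cloned()
    }
}
```

In this sketch, every `CoreSchema::new` call is a single pointer copy, and repeated lookups hand out shared handles to the same precomputed action entities.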
It's better with the changes in #1290, but still not as performant as I'd like...
From 38ms to 7ms for our schema and small entity graph.
It's surprising to see a large block in that chart for clone on entity uids; that should be a fairly cheap operation.
It looks like the bulk of the time is now spent as a result of this line: https://github.com/cedar-policy/cedar/blob/main/cedar-policy-core/src/entities.rs#L216
However, I can't figure out a clean way to change the entity map to hold either borrowed entities or Arc-wrapped entities keyed by EID, since a number of downstream consumers (like the TC Calculator) expect owned entities.
Some more context from discussions: the performance issue is likely due to needing to clone a large (>1000) set of action entities where these actions are in many action groups.
https://github.com/cedar-policy/cedar/pull/1296 reduces the need for Arc::unwrap_or_clone().
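For context on why this call matters, a small sketch of how the standard library's `Arc::unwrap_or_clone` behaves. The `Entity` struct and the helper functions are hypothetical stand-ins, not Cedar types. The point is that a deep clone happens whenever another handle to the same value is still alive, which is presumably the common case when the schema keeps its own copy of the action entities.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an entity that is expensive to clone
/// (for example, an action that belongs to many action groups).
#[derive(Debug, Clone, PartialEq)]
struct Entity {
    uid: String,
    ancestors: Vec<String>,
}

/// Takes ownership of the entity. If this is the last `Arc` handle, the
/// value is moved out for free; otherwise the whole entity is deep-cloned.
fn into_owned(entity: Arc<Entity>) -> Entity {
    Arc::unwrap_or_clone(entity)
}

/// A consumer that only reads can share the `Arc` and never deep-clones.
fn ancestor_count(entity: &Arc<Entity>) -> usize {
    entity.ancestors.len()
}
```

Consumers that can be changed to accept `&Entity` or `Arc<Entity>` avoid the clone entirely; only those that truly need an owned `Entity` pay for it.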
Describe the improvement you'd like to request
When Entities are loaded (either from a string or from an iterable of Entities), a new CoreSchema is constructed, which profiling has revealed to be an expensive operation for a large schema.
Describe alternatives you've considered
It appears that the Schema is immutable once constructed, so it may be possible to instantiate the CoreSchema when the Schema is created, avoiding the CoreSchema instantiation cost each time entities are loaded or have a schema applied to them.
Additional context
Profiling Code:
Is this something that you'd be interested in working on?