dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.61k stars 3.14k forks source link

Optimize SQL expression comparison by caching hash codes #34149

Open ranma42 opened 1 month ago

ranma42 commented 1 month ago

In the query pipeline sometimes expressions are compared for (deep) equality. This currently is based on a recursive visit of the subtree, which can be very costly if done multiple times while visiting the expression (it is quite easy to construct expression trees that require O(n^2) operations when visited).

This could be improved by computing a hash that filters out most of the inequalities as suggested in https://github.com/dotnet/efcore/pull/34133#discussion_r1663964197

ranma42 commented 1 month ago

Side note: another case that could be interesting is the negation of an expression. Having it "ready to use" would make some predicate-related optimizations faster/cheaper.

roji commented 1 month ago

Blocked on making our entire SqlExpression tree immutable (#32927); SelectExpression is currently mutable, and since it can be contained inside most SqlExpressions (thanks @ranma42), we can't cache the hashcode.

roji commented 1 month ago

Blocked on making our entire SqlExpression tree immutable (https://github.com/dotnet/efcore/issues/32927); SelectExpression is currently mutable, and since it can be contained inside most SqlExpressions (https://github.com/dotnet/efcore/pull/34133#discussion_r1664259539), we can't cache the hashcode.

Though on second thought, SelectExpression should never be mutable when it's already composed upon...

roji commented 6 days ago

@ranma42 just to continue on my comment just above, I think this should be safe to implement even in the current bits - any case in which a mutable SelectExpression is contained within another expression (except ShapedQueryExpression) should be a violation of our current invariants: only the top-level select in the tree may be mutable.

I still absolutely want to make SelectExpression fully immutable (#32927), but that's quite a big task, and I don't think it needs to block this optimization here.

roji commented 6 days ago

Note also #19859, which is about not exhaustively calculating hash codes for the entire tree, but just some shallower subset; this would further improve our performance here.

ranma42 commented 6 days ago

@ranma42 just to continue on my comment just above, I think this should be safe to implement even in the current bits - any case in which a mutable SelectExpression is contained within another expression (except ShapedQueryExpression) should be a violation of our current invariants: only the top-level select in the tree may be mutable.

I still absolutely want to make SelectExpression fully immutable (#32927), but that's quite a big task, and I don't think it needs to block this optimization here.

If that is the case, I believe I can try and implement this. What would be the best way to evaluate the performance? (would a micro-benchmark make sense?)

roji commented 6 days ago

Sure, an ad-hoc BenchmarkDotNet benchmark could work - not sure we need to commit it etc. Though this is one of the cases where the benefits depend on the tree depth/complexity which you choose to benchmark, which is a bit arbitrary... I think it's OK to do this improvement in any case?