dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License

Cosmos: rework expressions around scalar/structural types #33999

Open: roji opened this issue 2 months ago

roji commented 2 months ago

In Cosmos, the same expression (syntax-wise) can return either a scalar or a structural type. For example, x.Foo can return a scalar (e.g. 1) or a structural type (a JSON object representing an entity type). This differs from relational, where a given expression type generally returns either a scalar (e.g. ColumnExpression) or a structural type (e.g. TableExpression, which represents a set of structural types). JSON column access is probably the main exception to this in relational.

Our general SQL expression tree design mirrors this: SqlExpression represents scalars (it has a TypeMapping), while non-SqlExpression nodes represent structural types. In Cosmos (and also in some places in relational), things are different: the same expression can typically return either a scalar or a structural type. For example, a JSON property access (x.Foo) can return a scalar or a structural type.

As a result, after #33998 the Cosmos query pipeline has an explosion of expression types that represent the same syntax but return different things. For example, ScalarAccessExpression represents x.Foo where Foo is a scalar, ObjectAccessExpression represents the same syntax where Foo is a structural type, and ObjectArrayAccessExpression represents the same syntax where Foo is an array of structural types.
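To make the duplication concrete, here is a minimal Python sketch (Python is used only for brevity; EF Core itself is C#). The three class names come from this issue, but their fields and the `generate_sql` helper are hypothetical simplifications, not EF Core's actual members. Each class stands for the same `x.Foo` syntax and differs only in the metadata it carries.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified stand-ins for the three Cosmos node types
# named in the issue. The fields are assumptions, not EF Core's actual members.


@dataclass(frozen=True)
class ScalarAccessExpression:
    source: str          # alias of the containing object, e.g. "x"
    property_name: str   # JSON property being accessed, e.g. "Foo"
    type_mapping: str    # scalar nodes carry a type mapping


@dataclass(frozen=True)
class ObjectAccessExpression:
    source: str
    property_name: str
    navigation: str      # structural nodes carry navigation metadata instead


@dataclass(frozen=True)
class ObjectArrayAccessExpression:
    source: str
    property_name: str
    navigation: str


def generate_sql(node) -> str:
    """Render a property access as x.Foo (simplified; not EF Core's actual output)."""
    # Each node type needs its own branch, yet all three emit identical syntax.
    if isinstance(node, ScalarAccessExpression):
        return f"{node.source}.{node.property_name}"
    if isinstance(node, ObjectAccessExpression):
        return f"{node.source}.{node.property_name}"
    if isinstance(node, ObjectArrayAccessExpression):
        return f"{node.source}.{node.property_name}"
    raise TypeError(f"unsupported node type: {type(node).__name__}")
```

Any visitor that walks such a tree has to repeat this per-type branching, even where the generated syntax is identical.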

This is a bad state of affairs. I considered unifying them by, for example, introducing a dummy type mapping for structural types (allowing SqlExpression to represent structural types as well), but the expression split also extends into shaper generation. So for now I continued along the current path of duplicating expression types. A more modern shaper generation architecture (and, I think, one more aligned with relational) wouldn't require this separation at the expression level, but would instead recognize structural types via StructuralTypeShaperExpression; I went a bit in this direction, but more work is needed.

Once our shaper no longer looks at the server/syntax expressions to determine structural type information (and uses StructuralTypeShaperExpression instead), we should be able to remove all structural type/navigation information from those syntax expressions and unify them. This would give a much clearer separation between server (query) and client (shaper).
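A minimal Python sketch of that end state, under stated assumptions: `PropertyAccess` and `StructuralShaper` are hypothetical names, and `StructuralShaper` only loosely imitates the role StructuralTypeShaperExpression would play. The server-side node carries syntax only, while structural type information lives solely in the client-side shaper node.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PropertyAccess:
    """Server-side syntax only (hypothetical): no scalar/structural or navigation info."""
    source: str
    property_name: str


@dataclass(frozen=True)
class StructuralShaper:
    """Client-side node (hypothetical), loosely modeled on StructuralTypeShaperExpression:
    materialize the JSON produced by `value` as an instance of `structural_type`."""
    value: PropertyAccess
    structural_type: str


def render_sql(node: PropertyAccess) -> str:
    # A single case covers scalars, objects and arrays of objects alike.
    return f"{node.source}.{node.property_name}"


def describe_materialization(projection: Union[PropertyAccess, StructuralShaper]) -> str:
    # The shaper consults only the shaper node for structural type information;
    # it never inspects the syntax node to decide what to materialize.
    if isinstance(projection, StructuralShaper):
        return f"materialize {projection.structural_type} from {render_sql(projection.value)}"
    return f"read scalar from {render_sql(projection)}"
```

The same `PropertyAccess` instance serves both the scalar and the structural case; only the wrapping shaper node differs.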

roji commented 2 months ago

Note: for 9.0, consider at least unifying all expression pairs that the shaper doesn't care about. We could have a CosmosExpression that holds either a SqlExpression (when it represents a scalar) or an ITypeBase (when it represents a structural type; though having the type isn't actually needed at the moment).
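One possible shape for this interim step, as a minimal Python sketch. All names are hypothetical stand-ins: `SqlScalar` plays the role of the SqlExpression (with its type mapping) in the scalar case, and `StructuralType` plays the role of ITypeBase. Since the type itself isn't needed at the moment, that variant could shrink to a bare marker.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SqlScalar:
    """Hypothetical stand-in for the scalar case (a SqlExpression with a type mapping)."""
    type_mapping: str


@dataclass(frozen=True)
class StructuralType:
    """Hypothetical stand-in for ITypeBase (entity or complex type metadata)."""
    name: str


@dataclass(frozen=True)
class CosmosExpression:
    """A single node for x.Foo, regardless of what it returns (hypothetical shape)."""
    source: str
    property_name: str
    payload: Union[SqlScalar, StructuralType]

    @property
    def is_scalar(self) -> bool:
        return isinstance(self.payload, SqlScalar)


def generate_sql(node: CosmosExpression) -> str:
    # One branch replaces the per-type branches needed when each
    # scalar/structural variant is a separate expression class.
    return f"{node.source}.{node.property_name}"
```

Code that does care about the distinction can check `is_scalar`; code that only deals with syntax, such as SQL generation, no longer needs to.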