Open habbes opened 2 years ago
Hi @KenitoInc I tried to reproduce the issue and this is the expression that's generated after SelectExpandBinder transforms the query:
DbSet<IncidentDbEntity>()
.Select(incident => new Incident{
Id = incident.Id,
Alerts = incident.Alerts
.Select(a => new Alert{
Id = a.Id,
Description = a.Descr
}
)
}
)
.Select($it => new SelectAllAndExpand<Incident>{
Model = TypedLinqParameterContainer<IEdmModel>.TypedProperty,
Instance = $it,
UseInstanceForProperties = True,
Container = new NamedProperty<IEnumerable<SelectAll<Alert>>>{
Name = "Alerts",
Value = $it.Alerts
.Select($it => new SelectAll<Alert>{
Model = TypedLinqParameterContainer<IEdmModel>.TypedProperty,
Instance = $it,
UseInstanceForProperties = True
}
)
}
}
)
The initial nested Select statement is the one returned by the GetIncidents()
method (my demo uses slightly differently-named class names). SelectExpandBinder is applying a select statement that "expands" the Alerts
property on top of the IQueryable
which already contains a Select statement on the same property. In my view, this expected behaviour as far the OData query binder is concerned. OData takes the IQueryable
as a black box and applies transformations on top of it, I don't know if it should be the responsibility of OData to inspect the IQueryable
to see if there's any redundancy.
What do you think we can do here @KenitoInc @xuzhg ?
Let me add some additional context.
When I return IQueryable
without applying OData expands, the query executed is:
SELECT [i].[Id], [a].[Id], [a].[Descr]
FROM [Incidents] AS [i]
LEFT JOIN [Alerts] AS [a] ON [i].[Id] = [a].[IncidentId]
ORDER BY [i].[Id], [a].[Id]
The left join is still there even there's no $expand
option. So from the outset it seems like the initial setup of the query results in unnecessary data.
This is the query after $expand
is applied by OData:
SELECT [i].[Id], [a].[Id], [a].[Descr], [a0].[Id], [a0].[Descr]
FROM [Incidents] AS [i]
LEFT JOIN [Alerts] AS [a] ON [i].[Id] = [a].[IncidentId]
LEFT JOIN [Alerts] AS [a0] ON [i].[Id] = [a0].[IncidentId]
ORDER BY [i].[Id], [a].[Id], [a0].[Id]
At this point I assumed that if you added a .Select()
statement on top of the return queryable of GetIncidents()
you would still get the duplicate join. But I was wrong, here's how the query expression looks like in debug view:
DbSet<IncidentDbEntity>()
.Select(incident => new Incident{
Id = incident.Id,
Alerts = incident.Alerts
.Select(a => new Alert{
Id = a.Id,
Description = a.Descr
}
)
}
)
.Select(i => new Incident{
Id = i.Id,
Alerts = i.Alerts
.Select(a => new Alert{
Id = a.Id,
Description = a.Description
}
)
}
)
Note the extra Select()
call. But the resulting query still only has one join, as EF Core figured to ignore the redundant Select:
SELECT [i].[Id], [a].[Id], [a].[Descr]
FROM [Incidents] AS [i]
LEFT JOIN [Alerts] AS [a] ON [i].[Id] = [a].[IncidentId]
ORDER BY [i].[Id], [a].[Id]
Maybe we can modify the generated expression on OData side such that extra join will be ignored.
Question:
This is the query after $expand is applied by OData:
Does this happen if you perform an .Include
from EF itself as well? How does EF handle it?
I work on a project that's similar in nature: we don't expose the EFCore classes directly, and instead the APIs expose their own models which have mapping expressions to the EFCore classes.
We didn't initially realize it, but we ran into the same two issues:
$select
or $expand
is provided, the SQL generated will be the entire projection (including navigation properties). $expand
is provided, it will result in duplicated joins.Our quick attempt at resolving this was to create a default $select
that could be applied if one was not provided: so in this example, if the request was /incidents
or /incidents?$expand=alerts
, it would get translated into {original}&$select=id
. Doing so results in correct SQL queries being generated for both issues.
The problem with this approach is it does not work if it's a type with dynamic properties: those will get filtered out. I decided to revisit the problem this weekend (which is how I stumbled upon this issue), and I can resolve issue 1 by applying another select expression generated from the EntityType details, which filters out the navigation properties:
var properties = entityType.DeclaredProperties.Where(p => p.PropertyKind is not EdmPropertyKind.Navigation)
.Select(p => model.GetAnnotationValue<ClrPropertyInfoAnnotation>(p).ClrPropertyInfo)
.ToList();
if (entityType.IsOpen)
{
properties.Add(model.GetDynamicPropertyDictionary(entityType));
}
var entityParam = Expression.Parameter(clrType, "");
var newItemExpression = Expression.MemberInit(
Expression.New(clrType),
properties.Select(f => Expression.Bind(f, Expression.Property(entityParam, f))));
var lambdaFuncType = (typeof(Func<,>)).MakeGenericType(clrType, clrType);
var mapper = Expression.Lambda(lambdaFuncType, newItemExpression, entityParam);
However, I'm not sure on the best way of handling issue 2 (just $expand
provided, producing duplicate joins).
I get the feeling I'm approaching this the wrong way, duct taping over the problem, opposed to addressing it earlier in the pipeline (query provider? I'm not sure how EFCore natively is handling this), so I'll be interested in finding out if any potential solutions arise from this issue.
If nothing else, this type of solution may be promising:
Extending off of my prior note (creating an expression T => new T()
that excludes navigation properties), I think a similar concept can be used to solve the $expand
scenario as well.
Once OData's done it's thing, we'll update the IQueryable
like so:
result.Value = odata.SelectExpandClause is null ? ApplyDefaultSelect(queryable) : ApplyExpressionRewrite(queryable);
In ApplyExpressionRewrite
, we'll create a new queryable:
original.Provider.CreateQuery(new NavigationExclusionExpressionVisitor<T>(GenerateNewFacadeExpression).Visit(original.Expression)!);
Where the Expression Visitor is specifically looking for a binding that looks like Instance = [my type]
:
protected override MemberBinding VisitMemberBinding(MemberBinding node) =>
node is MemberAssignment { Member: PropertyInfo pi } ma
&& typeof(ISelectExpandWrapper).IsAssignableFrom(pi.DeclaringType)
&& pi.Name == "Instance"
&& pi.PropertyType == typeof(T)
? Expression.Bind(node.Member, _memberInitFactory(ma.Expression))
: base.VisitMemberBinding(node);
This will rewrite an expression like this:
.Select($it => new SelectAllAndExpand`1() {
Model = value(Microsoft.AspNetCore.OData.Query.Container.LinqParameterContainer+TypedLinqParameterContainer`1[Microsoft.OData.Edm.IEdmModel]).TypedProperty,
Instance = $it,
UseInstanceForProperties = True,
Into:
.Select($it => new SelectAllAndExpand`1() {
Model = value(Microsoft.AspNetCore.OData.Query.Container.LinqParameterContainer+TypedLinqParameterContainer`1[Microsoft.OData.Edm.IEdmModel]).TypedProperty,
Instance = new Incident() {Id = $it.Id},
UseInstanceForProperties = True,
Which should result in the join no longer being duplicated, since the collection is no longer referenced in Instance = $it
@julealgon I did try to add an Include
, i.e. db.Incidents.Include("Alerts")
(not sure if that's what you mean), it did not behave any differently. Still the same queries were generated.
@jayoungers thanks for the insights. I will experiment with your proposals and get back. I didn't know having both $select and $expand results in the correct query.
@julealgon I did try to add an
Include
, i.e.db.Incidents.Include("Alerts")
(not sure if that's what you mean), it did not behave any differently. Still the same queries were generated.
I meant to ask what happens if you completely omit OData from the equation and do the query only in EF. It does look like this is OData-specific though from other comments here.
@julealgon - it's definitely an issue once OData is applied: the double join occurs because the child table is referenced in the base level Instance = $it
as well as in Container = new NamedProperty<IEnumerable<SelectAll<Alert>>>
@habbes - I believe I have everything working as I want it at this point; however, I had to create a QueryProvider
to make this happen: I'm manually using [EnableQuery]
, which will return a Queryable
that can be adjusted for standard querying endpoints, but for those that return SingleResult
it will have already executed the queryable.
In my use case I'm intercepting the result prior to EnableQueryAttribute
running, replacing the Queryable with my wrapper QueryProvider, and then adjusting the expression once it gets executed. One thing to note by using this approach is you'll need to manually set HandleNullPropagation
to false in order for the joins to work correctly: that field is set to false automatically based on the EFCore Provider's namespace; by using your own that functionality will stop working
additional info from EF perspective:
when OData is out of the picture (i.e. using Include
operation in EF), there will be no duplication:
ctx.Incidents.Include(x => x.Alerts).Select(incident => new Incident()
{
Id = incident.IncidentId,
Alerts = incident.Alerts.Select(a => new Alert()
{
Id = a.AlertId
})
}).ToList();
results in:
SELECT [r].[IncidentId], [a].[AlertId]
FROM [RawIncidents] AS [r]
LEFT JOIN [AlertsForPublicApi] AS [a] ON [r].[IncidentId] = [a].[IncidentId]
ORDER BY [r].[IncidentId]
In fact, EF completely removes the include statement during query processing, because the root entity (Incidents) is not part of the final projection - so there is nothing to perform include on.
The way OData performs expand is not by injecting Include statement to the query, but rather composing a custom projection (SelectAllAndExpand) on top of the original query.
@JohnYoungers is correct that the reason for duplication is that SelectAllAndExpand projects the original structure ($it) which contains the collection of Alerts and on top of that creates another collection of Alerts - the product of expand operation. Effectively the expand query is something like this:
ctx.Incidents.Select(incident => new
{
Id = incident.IncidentId,
Alerts = incident.Alerts.Select(a => new Alert
{
Id = a.AlertId
}),
Container = incident.Alerts.Select(a => new Alert
{
Id = a.AlertId
})
}).ToList();
which also produces sql with 2 joins:
SELECT [r].[IncidentId], [a].[AlertId], [a0].[AlertId]
FROM [RawIncidents] AS [r]
LEFT JOIN [AlertsForPublicApi] AS [a] ON [r].[IncidentId] = [a].[IncidentId]
LEFT JOIN [AlertsForPublicApi] AS [a0] ON [r].[IncidentId] = [a0].[IncidentId]
ORDER BY [r].[IncidentId], [a].[AlertId]
this is by design from EF side - we don't perform any deduplication on collection navigations that are present in the final projection.
This issue was reported by an internal customer.
They have separate classes to represent DB entities and API-visible entities:
We are experiencing major performance regression in the Expand query when migrating our code to latest ASP.Net core with latest OData 8.0.6.
I’ve minimized my code to create a simple repro:
Defined 2 entities, Incident and Alert:
Defined their DB entities:
Defined a query translation from DB entities to Public facing entities:
And we return this IQueriable in the controller with [EnableQuery] attribute, to let OData do the magic and add the query options to the SQL query.
The problem is when we add $expand=alerts to the query - it adds additional left join that is not working (more than a minute of query time) on large SQL tables (more than ~1M rows)
This is the generated query, and this is the additional query for the $expand=alerts
I guess that it is doing it to enforce the page size we defined on each entity. But why 2 Join queries? And why order the rows in the 2nd query and not just simple TOP as in the Incident level?