OData / AspNetCoreOData

ASP.NET Core OData: A server library built upon ODataLib and ASP.NET Core
Other
457 stars 158 forks source link

AspNetCoreOData generates duplicate JOIN statement if expanding an IQueryable that already has a join #497

Open habbes opened 2 years ago

habbes commented 2 years ago

This issue was reported by an internal customer.

They have separate classes to represent DB entities and API-visible entities:

We are experiencing major performance regression in the Expand query when migrating our code to latest ASP.Net core with latest OData 8.0.6.

I’ve minimized my code to create a simple repro:

Defined 2 entities, Incident and Alert:

public class Incident
    {
        [Key]
        public long Id { get; set; }

        public IEnumerable<Alert> Alerts { get; set; }
    }

public class Alert
    {
        [Key]
        public string Id { get; set; }
    }

Defined their DB entities:

[Table("RawIncidents")]
    public class IncidentDbEntity
    {
        [Key]
        [Column]
        public long IncidentId { get; set; }

        public List<AlertDbEntity> Alerts { get; set; }
    }

[Table("AlertsForPublicApi")]
    public class AlertDbEntity
    {
        [Key]
        [Column(TypeName = "varchar")]
        public string AlertId { get; set; }

        [Column]
        [ForeignKey(nameof(Incident))]
        public long? IncidentId { get; set; }

        public IncidentDbEntity Incident { get; set; }
    }

Defined a query translation from DB entities to Public facing entities:

public IQueryable<Incident> GetIncidents()
        {
            var graphIncidents = m_dbContext.RawIncidents.Select(incident => new Incident()
            {
                Id = incident.IncidentId,
                Alerts = incident.Alerts.Select(a => new Alert()
                {
                    Id = a.AlertId
                })
            });

            return graphIncidents;
        }

And we return this IQueriable in the controller with [EnableQuery] attribute, to let OData do the magic and add the query options to the SQL query.

The problem is when we add $expand=alerts to the query - it adds additional left join that is not working (more than a minute of query time) on large SQL tables (more than ~1M rows)

This is the generated query, and this is the additional query for the $expand=alerts

SELECT [t].[IncidentId], [t0].[Id], [t1].[AlertId]
FROM (
    SELECT TOP(@__TypedProperty_3) [r].[IncidentId]
    FROM [WcdTestCloudEusStg].[RawIncidents] AS [r]
) AS [t]
LEFT JOIN (
    SELECT [a].[AlertId] AS [Id], [a].[IncidentId]
    FROM [WcdTestCloudEusStg].[AlertsForPublicApi] AS [a]
) AS [t0] ON [t].[IncidentId] = [t0].[IncidentId]
LEFT JOIN (
    SELECT [t2].[AlertId], [t2].[IncidentId]
    FROM (
        SELECT [a0].[AlertId], [a0].[IncidentId], ROW_NUMBER() OVER(PARTITION BY [a0].[IncidentId] ORDER BY [a0].[AlertId]) AS [row]
        FROM [WcdTestCloudEusStg].[AlertsForPublicApi] AS [a0]
    ) AS [t2]
    WHERE [t2].[row] <= @__TypedProperty_1
) AS [t1] ON [t].[IncidentId] = [t1].[IncidentId]
ORDER BY [t].[IncidentId], [t0].[Id], [t1].[IncidentId], [t1].[AlertId]

I guess that it is doing it to enforce the page size we defined on each entity. But why 2 Join queries? And why order the rows in the 2nd query and not just simple TOP as in the Incident level?

habbes commented 2 years ago

Hi @KenitoInc I tried to reproduce the issue and this is the expression that's generated after SelectExpandBinder transforms the query:

DbSet<IncidentDbEntity>()
    .Select(incident => new Incident{ 
        Id = incident.Id, 
        Alerts = incident.Alerts
            .Select(a => new Alert{ 
                Id = a.Id, 
                Description = a.Descr 
            }
            ) 
    }
    )
    .Select($it => new SelectAllAndExpand<Incident>{ 
        Model = TypedLinqParameterContainer<IEdmModel>.TypedProperty, 
        Instance = $it, 
        UseInstanceForProperties = True, 
        Container = new NamedProperty<IEnumerable<SelectAll<Alert>>>{ 
            Name = "Alerts", 
            Value = $it.Alerts
                .Select($it => new SelectAll<Alert>{ 
                    Model = TypedLinqParameterContainer<IEdmModel>.TypedProperty, 
                    Instance = $it, 
                    UseInstanceForProperties = True 
                }
                ) 
        }

    }
    )

The initial nested Select statement is the one returned by the GetIncidents() method (my demo uses slightly differently-named class names). SelectExpandBinder is applying a select statement that "expands" the Alerts property on top of the IQueryable which already contains a Select statement on the same property. In my view, this expected behaviour as far the OData query binder is concerned. OData takes the IQueryable as a black box and applies transformations on top of it, I don't know if it should be the responsibility of OData to inspect the IQueryable to see if there's any redundancy.

What do you think we can do here @KenitoInc @xuzhg ?

habbes commented 2 years ago

Let me add some additional context. When I return IQueryable without applying OData expands, the query executed is:

SELECT [i].[Id], [a].[Id], [a].[Descr]
      FROM [Incidents] AS [i]
      LEFT JOIN [Alerts] AS [a] ON [i].[Id] = [a].[IncidentId]
      ORDER BY [i].[Id], [a].[Id]

The left join is still there even there's no $expand option. So from the outset it seems like the initial setup of the query results in unnecessary data.

This is the query after $expand is applied by OData:

SELECT [i].[Id], [a].[Id], [a].[Descr], [a0].[Id], [a0].[Descr]
      FROM [Incidents] AS [i]
      LEFT JOIN [Alerts] AS [a] ON [i].[Id] = [a].[IncidentId]
      LEFT JOIN [Alerts] AS [a0] ON [i].[Id] = [a0].[IncidentId]
      ORDER BY [i].[Id], [a].[Id], [a0].[Id]

At this point I assumed that if you added a .Select() statement on top of the return queryable of GetIncidents() you would still get the duplicate join. But I was wrong, here's how the query expression looks like in debug view:

DbSet<IncidentDbEntity>()
    .Select(incident => new Incident{ 
        Id = incident.Id, 
        Alerts = incident.Alerts
            .Select(a => new Alert{ 
                Id = a.Id, 
                Description = a.Descr 
            }
            ) 
    }
    )
    .Select(i => new Incident{ 
        Id = i.Id, 
        Alerts = i.Alerts
            .Select(a => new Alert{ 
                Id = a.Id, 
                Description = a.Description 
            }
            ) 
    }
    )

Note the extra Select() call. But the resulting query still only has one join, as EF Core figured to ignore the redundant Select:

SELECT [i].[Id], [a].[Id], [a].[Descr]
      FROM [Incidents] AS [i]
      LEFT JOIN [Alerts] AS [a] ON [i].[Id] = [a].[IncidentId]
      ORDER BY [i].[Id], [a].[Id]

Maybe we can modify the generated expression on OData side such that extra join will be ignored.

julealgon commented 2 years ago

Question:

This is the query after $expand is applied by OData:

Does this happen if you perform an .Include from EF itself as well? How does EF handle it?

JohnYoungers commented 2 years ago

I work on a project that's similar in nature: we don't expose the EFCore classes directly, and instead the APIs expose their own models which have mapping expressions to the EFCore classes.

We didn't initially realize it, but we ran into the same two issues:

  1. If no $select or $expand is provided, the SQL generated will be the entire projection (including navigation properties).
  2. If only an $expand is provided, it will result in duplicated joins.

Our quick attempt at resolving this was to create a default $select that could be applied if one was not provided: so in this example, if the request was /incidents or /incidents?$expand=alerts, it would get translated into {original}&$select=id. Doing so results in correct SQL queries being generated for both issues.

The problem with this approach is it does not work if it's a type with dynamic properties: those will get filtered out. I decided to revisit the problem this weekend (which is how I stumbled upon this issue), and I can resolve issue 1 by applying another select expression generated from the EntityType details, which filters out the navigation properties:

var properties = entityType.DeclaredProperties.Where(p => p.PropertyKind is not EdmPropertyKind.Navigation)
    .Select(p => model.GetAnnotationValue<ClrPropertyInfoAnnotation>(p).ClrPropertyInfo)
    .ToList();

if (entityType.IsOpen)
{
    properties.Add(model.GetDynamicPropertyDictionary(entityType));
}

var entityParam = Expression.Parameter(clrType, "");
var newItemExpression = Expression.MemberInit(
    Expression.New(clrType),
    properties.Select(f => Expression.Bind(f, Expression.Property(entityParam, f))));

var lambdaFuncType = (typeof(Func<,>)).MakeGenericType(clrType, clrType);
var mapper = Expression.Lambda(lambdaFuncType, newItemExpression, entityParam);

However, I'm not sure on the best way of handling issue 2 (just $expand provided, producing duplicate joins).

I get the feeling I'm approaching this the wrong way, duct taping over the problem, opposed to addressing it earlier in the pipeline (query provider? I'm not sure how EFCore natively is handling this), so I'll be interested in finding out if any potential solutions arise from this issue.

JohnYoungers commented 2 years ago

If nothing else, this type of solution may be promising:

Extending off of my prior note (creating an expression T => new T() that excludes navigation properties), I think a similar concept can be used to solve the $expand scenario as well.

Once OData's done it's thing, we'll update the IQueryable like so:

result.Value = odata.SelectExpandClause is null ? ApplyDefaultSelect(queryable) : ApplyExpressionRewrite(queryable);

In ApplyExpressionRewrite, we'll create a new queryable:

original.Provider.CreateQuery(new NavigationExclusionExpressionVisitor<T>(GenerateNewFacadeExpression).Visit(original.Expression)!);

Where the Expression Visitor is specifically looking for a binding that looks like Instance = [my type]:

protected override MemberBinding VisitMemberBinding(MemberBinding node) =>
    node is MemberAssignment { Member: PropertyInfo pi } ma
    && typeof(ISelectExpandWrapper).IsAssignableFrom(pi.DeclaringType)
    && pi.Name == "Instance"
    && pi.PropertyType == typeof(T)
        ? Expression.Bind(node.Member, _memberInitFactory(ma.Expression))
        : base.VisitMemberBinding(node);

This will rewrite an expression like this:

.Select($it => new SelectAllAndExpand`1() {
    Model = value(Microsoft.AspNetCore.OData.Query.Container.LinqParameterContainer+TypedLinqParameterContainer`1[Microsoft.OData.Edm.IEdmModel]).TypedProperty, 
    Instance = $it, 
    UseInstanceForProperties = True, 

Into:

.Select($it => new SelectAllAndExpand`1() {
    Model = value(Microsoft.AspNetCore.OData.Query.Container.LinqParameterContainer+TypedLinqParameterContainer`1[Microsoft.OData.Edm.IEdmModel]).TypedProperty, 
    Instance = new Incident() {Id = $it.Id}, 
    UseInstanceForProperties = True, 

Which should result in the join no longer being duplicated, since the collection is no longer referenced in Instance = $it

habbes commented 2 years ago

@julealgon I did try to add an Include, i.e. db.Incidents.Include("Alerts") (not sure if that's what you mean), it did not behave any differently. Still the same queries were generated.

@jayoungers thanks for the insights. I will experiment with your proposals and get back. I didn't know having both $select and $expand results in the correct query.

julealgon commented 2 years ago

@julealgon I did try to add an Include, i.e. db.Incidents.Include("Alerts") (not sure if that's what you mean), it did not behave any differently. Still the same queries were generated.

I meant to ask what happens if you completely omit OData from the equation and do the query only in EF. It does look like this is OData-specific though from other comments here.

JohnYoungers commented 2 years ago

@julealgon - it's definitely an issue once OData is applied: the double join occurs because the child table is referenced in the base level Instance = $it as well as in Container = new NamedProperty<IEnumerable<SelectAll<Alert>>>

@habbes - I believe I have everything working as I want it at this point; however, I had to create a QueryProvider to make this happen: I'm manually using [EnableQuery], which will return a Queryable that can be adjusted for standard querying endpoints, but for those that return SingleResult it will have already executed the queryable.

In my use case I'm intercepting the result prior to EnableQueryAttribute running, replacing the Queryable with my wrapper QueryProvider, and then adjusting the expression once it gets executed. One thing to note by using this approach is you'll need to manually set HandleNullPropagation to false in order for the joins to work correctly: that field is set to false automatically based on the EFCore Provider's namespace; by using your own that functionality will stop working

maumar commented 1 year ago

additional info from EF perspective:

when OData is out of the picture (i.e. using Include operation in EF), there will be no duplication:

ctx.Incidents.Include(x => x.Alerts).Select(incident => new Incident()
        {
            Id = incident.IncidentId,
            Alerts = incident.Alerts.Select(a => new Alert()
            {
                Id = a.AlertId
            })
        }).ToList();

results in:

SELECT [r].[IncidentId], [a].[AlertId]
FROM [RawIncidents] AS [r]
LEFT JOIN [AlertsForPublicApi] AS [a] ON [r].[IncidentId] = [a].[IncidentId]
ORDER BY [r].[IncidentId]

In fact, EF completely removes the include statement during query processing, because the root entity (Incidents) is not part of the final projection - so there is nothing to perform include on.

The way OData performs expand is not by injecting Include statement to the query, but rather composing a custom projection (SelectAllAndExpand) on top of the original query.

@JohnYoungers is correct that the reason for duplication is that SelectAllAndExpand projects the original structure ($it) which contains the collection of Alerts and on top of that creates another collection of Alerts - the product of expand operation. Effectively the expand query is something like this:

ctx.Incidents.Select(incident => new
        {
            Id = incident.IncidentId,
            Alerts = incident.Alerts.Select(a => new Alert
            {
                Id = a.AlertId
            }),
            Container = incident.Alerts.Select(a => new Alert
            {
                Id = a.AlertId
            })
        }).ToList();

which also produces sql with 2 joins:

SELECT [r].[IncidentId], [a].[AlertId], [a0].[AlertId]
FROM [RawIncidents] AS [r]
LEFT JOIN [AlertsForPublicApi] AS [a] ON [r].[IncidentId] = [a].[IncidentId]
LEFT JOIN [AlertsForPublicApi] AS [a0] ON [r].[IncidentId] = [a0].[IncidentId]
ORDER BY [r].[IncidentId], [a].[AlertId]

this is by design from EF side - we don't perform any deduplication on collection navigations that are present in the final projection.