Open andykelvinbrooks opened 4 months ago
@andykelvinbrooks I tried this out and this is the IQueryable that OData generates after applying the $top and $skip:
DbSet<Order>()
.OrderBy($it => $it.OrderId)
.Select($it => new SelectAllAndExpand<Order>{
Model = TypedLinqParameterContainer<IEdmModel>.TypedProperty,
Instance = $it,
UseInstanceForProperties = True,
Container = new NamedProperty<IEnumerable<SelectAll<OrderItem>>>{
Name = "OrderItems",
Value = $it.OrderItems
.Select($it => new SelectAll<OrderItem>{
Model = TypedLinqParameterContainer<IEdmModel>.TypedProperty,
Instance = $it,
UseInstanceForProperties = True
}
)
}
}
)
.Skip(TypedLinqParameterContainer<int>.TypedProperty)
.Take(TypedLinqParameterContainer<int>.TypedProperty)
And this is the query that EfCore generates:
DECLARE @__TypedProperty_2 int = 1;
DECLARE @__TypedProperty_3 int = 2;
SELECT [t].[OrderId], [t].[CustomerName], [o0].[OrderItemId], [o0].[OrderId], [o0].[ProductName], [o0].[Quantity]
FROM (
SELECT [o].[OrderId], [o].[CustomerName]
FROM [Orders] AS [o]
ORDER BY [o].[OrderId]
OFFSET @__TypedProperty_2 ROWS FETCH NEXT @__TypedProperty_3 ROWS ONLY
) AS [t]
LEFT JOIN [OrderItems] AS [o0] ON [t].[OrderId] = [o0].[OrderId]
ORDER BY [t].[OrderId]
This shows that pagination is applied at the database level. I'm not sure whether the reason you're getting this is because you're using an in-memory database.
Also url GET /odata/Orders?$expand=OrderItems&$top=10&$skip=5
this should apply $top and $skip to the Orders
entity.
You could try server side paging using EnableQuery(PageSize=2)
or something like that on your controller action methods.
Attached sample project which contains local DB with limited sample records.
When we get the data from http://localhost:5209/invoices?$expand=invoicelines
Hardcoded pagesize as 50 in example
Generated query is :-
Microsoft.EntityFrameworkCore.Database.Command: Information: Executed DbCommand (1ms) [Parameters=[@TypedProperty_3='51', @__TypedProperty_1='51'], CommandType='Text', CommandTimeout='600'] SELECT [t].[InvoiceId] ,[t].[CustomerName] ,[t].[InvoiceDate] ,[t].[TotalAmount] ,[t0].[InvoiceLineId] ,[t0].[InvoiceId] ,[t0].[ProductName] ,[t0].[Quantity] ,[t0].[UnitPrice] FROM ( SELECT TOP (@TypedProperty_3) [v].[InvoiceId] ,[v].[CustomerName] ,[v].[InvoiceDate] ,[v].[TotalAmount] FROM [VIEW_Core_Invoices_V4_MS] AS [v] ORDER BY [v].[InvoiceId] ) AS [t] LEFT JOIN ( SELECT [t1].[InvoiceLineId] ,[t1].[InvoiceId] ,[t1].[ProductName] ,[t1].[Quantity] ,[t1].[UnitPrice] FROM ( SELECT [v0].[InvoiceLineId] ,[v0].[InvoiceId] ,[v0].[ProductName] ,[v0].[Quantity] ,[v0].[UnitPrice] ,ROW_NUMBER() OVER ( PARTITION BY [v0].[InvoiceId] ORDER BY [v0].[InvoiceLineId] ) AS [row] FROM [VIEW_InvoiceLineItem_2_MS] AS [v0] ) AS [t1] WHERE [t1].[row] <= @__TypedProperty_1 ) AS [t0] ON [t].[InvoiceId] = [t0].[InvoiceId] ORDER BY [t].[InvoiceId] ,[t0].[InvoiceId] ,[t0].[InvoiceLineId]
Generated query which I marked as bold(Inner Query) is fetching all the records from child(InvoiceLines) . We have a huge performance in real time as there are millions of records InvoiceLines in real time.
@ElizabethOkerio Kindly let me know your thoughts
@Dhananjaya-Reddy Thanks for the repro. I'll take a look and get back to you.
Hi @ElizabethOkerio , Any updates on this?
@Dhananjaya-Reddy I looked at your repro and the shared query above. From the shared query above, it's not true that the inner query fetches all records from child (InvoiceLines). If you look at the query, this part
WHERE [t1].[row] <= @__TypedProperty_1
ensures that only 51 rows of the InvoiceLines
are retrieved. It is important to note that the PageSize
that you set on the EnableQuery
- [EnableQuery(PageSize = 50)]
is applied to top-level elements and all their nested elements. This means that the response is expected to return a page of 51 invoices and each invoice will have 51 InvoiceLines
. To test if this is how this works, you can try with a smaller Pagesize
and see how this works. If the Pagesize
is smaller than the elements in the data store then you'll get a NextLink
that you can use to retrieve the other elements. It is also clear that pagination happens at the server side.
The only issue I'm seeing is with the calculation of row numbers. This part to be exact:
ROW_NUMBER() OVER (
PARTITION BY [v0].[InvoiceId] ORDER BY [v0].[InvoiceLineId]
) AS [row]
FROM [VIEW_InvoiceLineItem_2_MS] AS [v0]
) AS [t1]
If this is being done for millions of rows even if not fetching all of them can be computationally intensive and might affect performance. I think proper indexing on the columns used in the PARTITION BY
and ORDER BY
clauses can significantly improve the performance of the ROW_NUMBER
function.
Hi @ElizabethOkerio , Thank you for your thorough analysis of the query and the repro I shared. Your insights are valuable, and I appreciate the time you've taken to examine the issue in detail. I'd like to address a few points and share some additional thoughts:
Your observations about the query limitation and PageSize behavior are spot-on. This helps clarify how the pagination is intended to work.
This is the core issue I'm worried about. As you pointed out, even though we're not fetching all records, the ROW_NUMBER() function is still processing all rows in the view before any filtering occurs. For a view with millions of rows, this could be a significant performance bottleneck.
The most critical optimization we need to implement is applying filters before the ROW_NUMBER() calculation. This would significantly reduce the number of rows processed, especially for large datasets. Is there a way to restructure the query or modify the OData implementation that don't require major changes to the overall OData implementation to achieve this?
Thank you again for your insightful analysis, and I look forward to your thoughts on this critical optimization.
@ElizabethOkerio any update on fixing this issue?
- ROW_NUMBER() Calculation - Main Concern ... For a view with millions of rows, this could be a significant performance bottleneck.
This just occurred in our system: we have a parent table with tens of millions of rows joined to a child "detail" table with also tens of millions of rows. The typical ratio is 1:5, so if we select TOP(100) then I would expect a total of 100+500 rows to be processed by SQL Server. Instead, the ROW_NUMBER() filter added to the child table is forcing the database engine to sort the entire table, processing ~50 million rows to return just 500!
This feature really needs to a flag to toggle it on or off, either globally or on a per-navigation basis.
Summary: When using the $expand query option in Microsoft.AspNetCore.OData, all data for the expanded entities is retrieved first, and then pagination is applied in memory. This leads to performance inefficiencies, especially with large datasets.
Steps to Reproduce:
Expected Behavior: Pagination should be applied at the database level when retrieving expanded entities to minimize data retrieval and processing overhead. Actual Behavior: All data for the expanded entities is retrieved first, and then pagination is applied in memory, leading to potential performance issues and high memory usage.
Additional Context: This issue becomes more significant with larger datasets where retrieving all data for expanded entities before applying pagination results in slow response times and high memory consumption. Optimizing this behavior to apply pagination at the database level would greatly enhance performance and efficiency.
Suggested Improvement: Investigate and implement a way to apply pagination ($top and $skip) to expanded entities directly in the database query, ensuring that only the required subset of data is retrieved.