dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.79k stars 3.19k forks source link

Distinct string.Join or string.Concat causing SQL Syntax error #29845

Open 2767mr opened 1 year ago

2767mr commented 1 year ago

While trying to work around #29200 using groupBy we discovered that the combination of .Distinct() and string.Join or string.Concat causes a SQL Syntax error. This issue seems similar to #29638.

Example Query

Full project: https://github.com/2767mr/EFAggregateExample

db.Posts
    .GroupBy(p => p.Blog)
    .Select(g => new
    {
        Url = g.Key.Url,
        PostTitles = string.Join(", ", g.Select(p => p.Title).Distinct())
    })

Generated SQL:

SELECT [b].[Url], COALESCE(STRING_AGG(DISTINCT ([p].[Title]), N', '), N'') AS [PostTitles]
      FROM [Posts] AS [p]
      INNER JOIN [Blogs] AS [b] ON [p].[BlogId] = [b].[BlogId]
      GROUP BY [b].[BlogId], [b].[Url]

Stacktrace:

``` An exception occurred while iterating over the results of a query for context type 'BloggingContext'. Microsoft.Data.SqlClient.SqlException (0x80131904): Incorrect syntax near ','. at Microsoft.Data.SqlClient.SqlCommand.<>c.b__208_0(Task`1 result) at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke() at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) --- End of stack trace from previous location --- at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread) --- End of stack trace from previous location --- at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync() ClientConnectionId:04983856-a507-4efe-a0ef-59bed7eff375 Error Number:102,State:1,Class:15 Microsoft.Data.SqlClient.SqlException (0x80131904): Incorrect syntax near ','. at Microsoft.Data.SqlClient.SqlCommand.<>c.b__208_0(Task`1 result) at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke() at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) --- End of stack trace from previous location --- at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread) --- End of stack trace from previous location --- at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync() ClientConnectionId:04983856-a507-4efe-a0ef-59bed7eff375 Error Number:102,State:1,Class:15 ```

Provider and version information

EF Core version: 7.0.1 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 6.0 Operating system: Windows 11 IDE: Visual Studio 2022

roji commented 1 year ago

Unfortunately, SQL Server's STRING_AGG doesn't seem to support DISTINCT (most other aggregate functions do):

DROP TABLE IF EXISTS [Blogs];
CREATE TABLE [Blogs] (
    [Id] int NOT NULL IDENTITY,
    [Category] int,
    [Name] nvarchar(max)
);
INSERT INTO Blogs (Category, Name) VALUES (1, 'foo'), (1, 'bar');

SELECT STRING_AGG(DISTINCT (b.Name), N', ')
FROM Blogs AS b
GROUP BY b.Category;

We should ideally fail translation rather than generate invalid SQL, but otherwise there's not much for us to do here...

2767mr commented 1 year ago

It is possible to do this with OUTER APPLY though I am not sure if that's a good idea:

DROP TABLE IF EXISTS [Blogs];
CREATE TABLE [Blogs] (
    [Id] int NOT NULL IDENTITY,
    [Category] int,
    [Name] nvarchar(max)
);
INSERT INTO Blogs (Category, Name) VALUES (1, 'foo'), (1, 'bar'), (2, 'baz');

SELECT STRING_AGG(tmp.Name, N', ')
FROM Blogs AS b
OUTER APPLY (
    SELECT DISTINCT b2.Name AS Name 
    FROM Blogs b2 
    WHERE b2.Id = b.Id
) tmp
GROUP BY b.Category;
roji commented 1 year ago

@2767mr yeah, that's definitely possible - but we generally don't do such complicated things in method translations.

Note that you can achieve a very similar thing via the following:

_ = ctx.Blogs
    .Select(b => new
    {
        b.Url,
        PostTitles = string.Join(", ", b.Posts.Select(p => p.Title).Distinct())
    })
    .ToList();

This starts with the Blogs instead of the Posts, thus avoiding the need for a GroupBy altogether. The SQL it generates indeed uses an OUTER APPLY:

SELECT [b].[Url], [b].[Id], [t].[Title]
FROM [Blogs] AS [b]
OUTER APPLY (
    SELECT DISTINCT [p].[Title]
    FROM [Posts] AS [p]
    WHERE [b].[Id] = [p].[BlogId]
) AS [t]
ORDER BY [b].[Id]

Importantly, EF doesn't translate to STRING_AGG in this context (aggregate function without groupby, #29200), so this is client-evaluated instead: the titles are pulled back from the database and joined via .NET string.Join. This isn't a problem in this particularly query (note that no extra data is being transferred as a result of the client eval), but does mean that you can't further compose LINQ operators on top of the query for execution in the database.

ajcvickers commented 1 year ago

Note from triage: we should throw a better exception here.