dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License
849 stars 285 forks source link

The MARS TDS header contained errors using ASP.NET Core and EF Core connecting to Azure SQL Server #85

Closed AfsanehR-zz closed 3 years ago

AfsanehR-zz commented 6 years ago

Copied from here:

I have an ASP.NET Core app [Microsoft.AspNetCore.App 2.1.4] with EF Core 2.1.4 [DbContext pooling enabled] and data stored on an Azure SQL database.

Occasionally [once in 1-2 days] I get unrecoverable error stating System.Data.SqlClient.SqlException (0x80131904): The incoming tabular data stream (TDS) protocol stream is incorrect. The MARS TDS header contained errors.

Once I restart the app, everything resumes working until it happens again.

Probably import to note is that I don't have MultipleActiveResultSets enabled in my connection string at all, which makes the issue even more strange.

Has anyone else noticed something similar? Are there any ways I can trace the problem?

Stack trace:

System.Data.SqlClient.SqlException (0x80131904): The incoming tabular data stream (TDS) protocol stream is incorrect. The MARS TDS header contained errors.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
   at System.Data.SqlClient.SqlDataReader.get_MetaData()
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
   at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
   at Microsoft.EntityFrameworkCore.Storage.Internal.RelationalCommand.Execute(IRelationalConnection connection, DbCommandMethod executeMethod, IReadOnlyDictionary`2 parameterValues)
   at Microsoft.EntityFrameworkCore.Storage.Internal.RelationalCommand.ExecuteReader(IRelationalConnection connection, IReadOnlyDictionary`2 parameterValues)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryingEnumerable`1.Enumerator.BufferlessMoveNext(DbContext _, Boolean buffer)
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementation[TState,TResult](Func`3 operation, Func`3 verifySucceeded, TState state)
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryingEnumerable`1.Enumerator.MoveNext()
   at Microsoft.EntityFrameworkCore.Query.Internal.LinqOperatorProvider.ExceptionInterceptor`1.EnumeratorExceptionInterceptor.MoveNext()
   at System.Collections.Generic.List`1.AddEnumerable(IEnumerable`1 enumerable)
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Inviton.Web.Portal.Controllers.EventListController.GetEvents(BL bl, Int32 venueId, Int32 promoterId) 
brunolau commented 6 years ago

Regarding the stackoverflow issue addiotional questions asked by Afsaneh:

Unfortunately can't post the csproj file contents...

brunolau commented 6 years ago

Here's the Dump of the SqlError object:

Class: 16

Errors: System.Data.SqlClient.SqlError: The incoming tabular data stream (TDS) protocol stream is incorrect. The MARS TDS header contained errors.

Error code: -2146232060

Line number: 1

Number: 4011

SQL Procedure: None

Source: Core .Net SqlClient Data Provider

State: 3

stap123 commented 5 years ago

I had a random spate of these errors appear in my ASP.NET application today. I use ASP.NET Core 2.1, EF Core 2.1 and Azure SQL Database. They occurred in a small time window and once I stopped and started the web app the errors seemed to stop.

I think I queried successfully from SSMS while the errors were happening but couldn't say for sure that it was at the same time. If it happens again I'll query in SSMS to confirm that I can.

Is there anything I can provide that will help with a solution to this? (I have application insights and database logging turned on for my application)

NOTES: MARS is explicitly set to off in my connection string (I use the one Azure Provides via the portal UI)

AfsanehR-zz commented 5 years ago

@stap123 @brunolau Would you be able to provide us a minimal repro sample application? Thanks.

stap123 commented 5 years ago

@AfsanehR I wouldn't even know where to start, I've so far only seen it once today and restarting my application made it stop happening. I will keep my eyes open and if it starts happening in a pattern of some kind I'll try and put one together but at the moment it seems random so I've got no starting point for a repro.

I realise that's useless to you as well though haha

I have got about 130x exceptions logged in my App Insights at the moment but I doubt the stack trace would be enough for you?

It also looks like the error occurred on ALL queries no matter where they originated from so it doesn't appear to be a query specific issue.

brunolau commented 5 years ago

I confirm the same - errors occurring at random queries, random time. I've tried to stress-test the application by running 30 threads that was making DB queries in parallel, but that didn't reproduce the issue either.

I've tried to use the SSMS query while the error was occurring and it worked, I've also tried to deploy to another azure release slot at the time the issue was occurring and it worked.

What I've temporarily ended up with was writing a middleware that once detects this particular error [based on the err message], automatically restarts the web app. As the issue occurs 1-2 times a day, it's acceptable, but much better would be having the issue solved.

What connects both mine and Adam's issue is the ASP.NET Core 2.1 + EF Core 2.1 + Azure SQL DB combo, but I cannot even say what part of the chain to blame.

AfsanehR-zz commented 5 years ago

ok thanks @brunolau @stap123 . We will investigate this with the configurations you mentioned, although I think it would be highly unlikely we hit the same issue as we need to know what queries are running. Will get back to you on this. Could you also please give us the output of dotnet --info?

brunolau commented 5 years ago

Here you go, obtained from the Azure console

.NET Core SDK (reflecting any global.json): Version: 2.2.100 Commit: 51868761f2

Runtime Environment: OS Name: Windows OS Version: 10.0.14393 OS Platform: Windows RID: win10-x86 Base Path: D:\Program Files (x86)\dotnet\sdk\2.2.100\

Host (useful for support): Version: 2.2.0 Commit: 1249f08fed

.NET Core SDKs installed: 1.1.10 [D:\Program Files (x86)\dotnet\sdk] 2.1.403 [D:\Program Files (x86)\dotnet\sdk] 2.1.500 [D:\Program Files (x86)\dotnet\sdk] 2.2.100 [D:\Program Files (x86)\dotnet\sdk]

.NET Core runtimes installed: Microsoft.AspNetCore.All 2.1.3 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.All] Microsoft.AspNetCore.All 2.1.5 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.All] Microsoft.AspNetCore.All 2.1.6 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.All] Microsoft.AspNetCore.All 2.2.0 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.All] Microsoft.AspNetCore.App 2.1.3 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 2.1.5 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 2.1.6 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 2.2.0 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.NETCore.App 1.0.12 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 1.1.9 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 2.0.9 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 2.1.3 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 2.1.5 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 2.1.6 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 2.2.0 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs: https://aka.ms/dotnet-download

stap123 commented 5 years ago

Here's mine 👍

.NET Core SDK (reflecting any global.json): Version: 2.2.100 Commit: 51868761f2

Runtime Environment: OS Name: Windows OS Version: 10.0.14393 OS Platform: Windows RID: win10-x86 Base Path: D:\Program Files (x86)\dotnet\sdk\2.2.100\

Host (useful for support): Version: 2.2.0 Commit: 1249f08fed

.NET Core SDKs installed: 1.1.10 [D:\Program Files (x86)\dotnet\sdk] 2.1.403 [D:\Program Files (x86)\dotnet\sdk] 2.1.500 [D:\Program Files (x86)\dotnet\sdk] 2.2.100 [D:\Program Files (x86)\dotnet\sdk]

.NET Core runtimes installed: Microsoft.AspNetCore.All 2.1.3 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.All] Microsoft.AspNetCore.All 2.1.5 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.All] Microsoft.AspNetCore.All 2.1.6 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.All] Microsoft.AspNetCore.All 2.2.0 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.All] Microsoft.AspNetCore.App 2.1.3 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 2.1.5 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 2.1.6 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 2.2.0 [D:\Program Files (x86)\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.NETCore.App 1.0.12 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 1.1.9 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 2.0.9 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 2.1.3 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 2.1.5 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 2.1.6 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 2.2.0 [D:\Program Files (x86)\dotnet\shared\Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs: https://aka.ms/dotnet-download

hackfaq commented 5 years ago

Hello, we have experiensed same problem:

System.Data.SqlClient.SqlException (0x80131904): The incoming tabular data stream (TDS) protocol stream is incorrect. The MARS TDS header contained errors. at System.Data.SqlClient.SqlCommand.<>c.b__122_0(Task1 result) at System.Threading.Tasks.ContinuationResultTaskFromResultTask2.InnerInvoke() at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) --- End of stack trace from previous location where exception was thrown --- at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot) --- End of stack trace from previous location where exception was thrown --- at Dapper.SqlMapper.QueryAsync[T](IDbConnection cnn, Type effectiveType, CommandDefinition command) in C:\projects\dapper\Dapper\SqlMapper.Async.cs:line 419

We use

How to reproduce: When we experienced problem: our database log file was full, and we got exception: the transaction log for database '' is full due to "LOG_BACKUP", after that exception all queries in .net core app was with errors.

When we increased log file, .net 4.7 apps starts to work, but .net core app continue to throw exception. Only restart of application helped

stap123 commented 5 years ago

I got this again this morning between 07:38:09 and 07:39:22.

Not sure if that's useful to help you track anything down or not. (I did nothing to any of my services to stop it happening but I then have successful requests being processed at 07:40:06)

brunolau commented 5 years ago

I managed to temporarily auto-mitigate the issue by creating custom Entity framework execution strategy that once encounters the MARS error, closes the connection, clears the pool and reopens. Since I've introduced this workaround, I've been running errorless for 3 days.

To sum it up. Once the error is encountered, closing & reopening the error connection + clearing the connection pool seems to re-establish working order

PolitovArtyom commented 5 years ago

We are getting the same problem. Errors starts occuring in same moment on different servers. Application pool restart helps,

Application Server Microsoft Windows Server 2012 R2 Standard 6.3.9600 Build 9600 dotnet host Version: 2.1.5 Commit: 290303f510 .NET Core runtimes installed: Microsoft.AspNetCore.All 2.1.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore Microsoft.AspNetCore.App 2.1.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore Microsoft.NETCore.App 2.1.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]

Database Microsoft SQL Server 2017 (RTM-CU11) (KB4462262) - 14.0.3038.14 (X64) Sep 14 2018 13:53:44 Copyright (C) 2017 Microsoft Corporation Enterprise Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 (Build 9600: )

Application Dapper 1.50.5 Microsoft.AspNetCore.App 2.1.5

sepehr1014 commented 5 years ago

We're having the same issue. Any solutions?

PolitovArtyom commented 5 years ago

@AfsanehR Do you need some additional data for investigation? I can try to get application and tcp dump while errors occuring

AfsanehR-zz commented 5 years ago

@PolitovArtyom Yes. That would be helpful. @sepehr1014 @PolitovArtyom do you have the MARS enabled in your connection? Also if possible a repro would be useful.

PolitovArtyom commented 5 years ago

MARS is not enabled on client and disabled on server

wboevink commented 5 years ago

We're also having the same issue. With NetCore, EFCore and an Azure SQL db

mshenoy83 commented 5 years ago

@brunolau could you maybe create a gist with the workaround.

brunolau commented 5 years ago

Sure, I will post it in a couple of days, I'm away from the PC for a couple of days and as soon as I get back, I'll do it

mshenoy83 commented 5 years ago

Just 1 more thing. I've been facing this error for a few days now. I have a durable function which calls my Sql Server VM. The version of Sql Server configured was Sql Server 2017 (Microsoft SQL Server 2017 (RTM-CU13) (KB4466404) - 14.0.3048.4 (X64) Nov 30 2018 12:57:58 Copyright (C) 2017 Microsoft Corporation Standard Edition (64-bit) on Windows Server 2016 Datacenter 10.0 (Build 14393: ) (Hypervisor)

On a hunch I fired up a new VM, this time with Sql Server 2016. And so far my durable function seems to be working fine. I really throttled it because I was worried maybe the number of connections was causing the problem. So the function is running very slowly now but still it has no errors, so far.

brunolau commented 5 years ago

@mshenoy83 Sorry it took a bit longer, here goes my dirty hack. In order to use it, please check the "Custom execution strategy" section of this documentation https://docs.microsoft.com/en-us/ef/core/miscellaneous/connection-resiliency

class CustomExecutionStrategy:

using Brulasoft.Inviton.Data.Internal.SQL;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Storage;
using System;
using System.Data.SqlClient;

namespace Brulasoft.Inviton.Data.Internal
{
    /// <summary>
    /// Custom SQL execution
    /// </summary>
    internal class CustomExecutionStrategy : SqlServerRetryingExecutionStrategy
    {
        public CustomExecutionStrategy(ExecutionStrategyDependencies dependencies) : base(dependencies)
        {
        }

        protected override bool ShouldRetryOn(Exception exception)
        {
            if (base.ShouldRetryOn(exception))
            {
                return true;
            }

            if (exception is SqlException sqlException)
            {
                if (SqlRecoveryAttemptor.ShouldTryRecover(exception, ExceptionsEncountered.Count) && Dependencies != null && Dependencies.CurrentDbContext != null)
                {
                    var context = Dependencies.CurrentDbContext.Context;
                    if (context != null)
                    {
                        var sqlConn = context.Database.GetDbConnection() as SqlConnection;
                        if (sqlConn != null)
                        {
                            SqlRecoveryAttemptor.TryRecover(sqlConn, ExceptionsEncountered.Count);
                            return true;
                        }
                    }
                }
            }

            return false;
        }
    }
}

class SqlRecoveryAttemptor

using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Data.SqlClient;
using System.Text;

namespace Brulasoft.Inviton.Data.Internal.SQL
{
    public class SqlRecoveryAttemptor
    {
        public const int RetryCount = 4;

        public static bool ShouldTryRecover(Exception ex, int retryAttemptsCount)
        {
            return retryAttemptsCount < RetryCount && IsMARSError(ex);
        }

        public static bool IsMARSError(Exception ex)
        {
            return ex.Message.Contains("MARS TDS header");
        }

        public static void TryRecover(SqlConnection sqlConn, int retryAttemptsCount)
        {
            if (retryAttemptsCount > 2)
            {
                System.Threading.Thread.Sleep(1500);
            }

            if (sqlConn.State != System.Data.ConnectionState.Closed)
            {
                TryClose(sqlConn);
                TryClearPool(sqlConn);
                TryOpen(sqlConn);
            }
            else
            {
                TryOpen(sqlConn);
                TryClose(sqlConn);
                TryClearPool(sqlConn);
                TryOpen(sqlConn);
                TryClose(sqlConn);
            }
        }

        private static void TryClearPool(SqlConnection conn)
        {
            try
            {
                SqlConnection.ClearPool(conn);
            }
            catch (Exception)
            {
            }
        }

        private static void TryClose(DbConnection conn)
        {
            try
            {
                conn.Close();
            }
            catch (Exception)
            {
            }
        }

        private static void TryOpen(DbConnection conn)
        {
            try
            {
                conn.Open();
            }
            catch (Exception)
            {
            }
        }
    }
}
rcvink commented 5 years ago

+1 Am experiencing the same problem, not sure how to fix it besides repeatedly restarting (based on comments above).

mscrivo commented 5 years ago

Same issue here since migrating to EF Core from EF6 in Dec 2018. We see this happen randomly on a single server (among many that are hitting the DB). We have gone 10-15 days without seeing it, then it just pops up and doesn't recover until the server is restarted.

Currently using EF Core 2.2.3 on Azure SQL DB

brunolau commented 5 years ago

I second to that - for us it started once we migrated from ASP.NET to ASP.NET Core

dkershner6 commented 5 years ago

Just to chime in, I just experienced this issue in Azure Functions v2 (.Net Core 2.1). I am using EFCore v2.2.3 to connect to Azure SQL. The call I am making to the DB that was spawning the error was as basic as it gets, essentially a SELECT * into a table with 2 rows and 8 columns.

Since my deployment is a little different from the above, maybe this will help: For this app, I have 3 different deployments that use the same shared library, that contains my EF Core Data Model.

Up until recently, this app lived on a Sql Server Web Deployment on a Ubuntu VM. I had never seen this particular error (but had another odd SSL error instead that exhibited similar behavior, with restart fixing).

I will also say that at that time I was using EF Core 2.1.4 as well, and only recently upgraded to 2.2.3.

Connection string is the same for all 3 deployments, the MARS section is: MultipleActiveResultSets=False;

It appears to have started during a period of high DTU use (about 40 min at 90%+). This is not an infrequent occurrence, however, and has actually happened far less over the past day.

Hopefully that helps.

divega commented 5 years ago

As recently announced in the .NET Blog, focus on new SqlClient features an improvements is moving to the new Microsoft.Data.SqlClient package. For this reason, we are moving this issue to the new repo at https://github.com/dotnet/SqlClient. We will still use https://github.com/dotnet/corefx to track issues on other providers like System.Data.Odbc and System.Data.OleDB, and general ADO.NET and .NET data access issues.

nurhat commented 5 years ago

We have same problem for the following run time env, currently solved the problem solution provided by @brunolau

Application Server Microsoft Windows Server 2016 .NET Core runtimes installed: Dotnet core runtime 2.1.5
Database Microsoft SQL Server 2017

Application Microsoft.EntityFrameworkCore 2.1.4 Microsoft.AspNetCore.App 2.1.5

divega commented 5 years ago

@AfsanehR, @David-Engel we are now hitting this exception with an equivalent stack trace with Microsoft.Data.SqlClient while running our EF Core tests: https://github.com/aspnet/EntityFrameworkCore/issues/16218.

It is hard to isolate a repro (e.g. my understanding is that it only occurs in a complete test run, but not if we run specific tests individually) but it is possible that what we have is enough to help make progress with the investigation.

Any chance that this can be prioritized for 1.0? It seems many customers have reported it already.

cc @smitpatel, @ajcvickers

David-Engel commented 5 years ago

@divega I agree that it would be good to figure this out for 1.0. We'll see if we can fit it in.

divega commented 5 years ago

@David-Engel thanks. Let us know if we can somehow help with the repro (even if it is not isolated).

avaneerd commented 5 years ago

We ran into this issue using Entity Framework Core 2.2.4. Also using Azure SQL.

Our code that produced this problem was roughly as followed:

var queryable = dbContext.Set<SomeEntity>()
  .Where(e => e.SomeBit = false);

if (command.Filter != null)
    queryable = queryable.Where(command.Filter);

var subQuery = queryable
  .GroupBy(e => e.ExternalReference, (key, elements) => new
  {
    ExternalReference = key,
    Created = elements.Max(e => e.Created)
  })
  .OrderByDescending(q => q.Created)
  .Skip(skip)
  .Take(query.Size);

queryable = dbContext.Set<SomeEntity>()
  .Join(subQuery,
    e => new { e.ExternalReference, e.Created },
    e => new { e.ExternalReference, e.Created },
    (e, _) => e)
  .OrderByDescending(q => q.Created);

var result = await queryable.ToListAsync();

The query that is produced by EF Core is:

SELECT [r].[Id], [r].[Created], [r].[Deleted], [r].[ExternalReference], [r].[SomeId]
FROM [SomeEntity] AS [r]
INNER JOIN (
    SELECT [r0].[ExternalReference], MAX([r0].[Created]) AS [Created]
    FROM [SomeEntity] AS [r0]
    WHERE ([r0].[SomeBit] = 0) AND ([r0].[SomeId] = @__someIdId_0)
    GROUP BY [r0].[ExternalReference]
    ORDER BY [Created]
    OFFSET @__p_1 ROWS FETCH NEXT @__p_2 ROWS ONLY
) AS [t] ON ([r].[ExternalReference] = [t].[ExternalReference]) AND ([r].[Created] = [t].[Created])

The idea here is that we take the newest item for every external reference and page these results.

This works without any problems UNTIL I hit the last page with incomplete page (fetch 20 get 19). Then I either get a query timeout or the "MARS TDS header contained errors" error.

The solution somewhat puzzled me, but it works...

Where we join our dbset with the subquery we get the DbSet again:

...
queryable = dbContext.Set<SomeEntity>()
  .Join(subQuery,
    e => new { e.ExternalReference, e.Created },
...

When we reuse the DbSet that is acquired on line 1 this works without any issues:

...
queryable = queryable
  .Join(subQuery,
    e => new { e.ExternalReference, e.Created },
...

In the end this is now our working code:

var baseQueryable = dbContext.Set<SomeEntity>();
var queryable = baseQueryable.Where(e => e.SomeBit = false);

if (command.Filter != null)
    queryable = queryable.Where(command.Filter);

var subQuery = queryable
  .GroupBy(e => e.ExternalReference, (key, elements) => new
  {
    ExternalReference = key,
    Created = elements.Max(e => e.Created)
  })
  .OrderByDescending(q => q.Created)
  .Skip(skip)
  .Take(query.Size);

queryable = baseQueryable
  .Join(subQuery,
    e => new { e.ExternalReference, e.Created },
    e => new { e.ExternalReference, e.Created },
    (e, _) => e)
  .OrderByDescending(q => q.Created);

var result = await queryable.ToListAsync();

What I find weird here is that the same query is produced and that it only causes problems on the last page of data.

Hope this helps in finding the problem.

c4rbon-c0py commented 5 years ago

We had the same problem as well. We recorded thousands of these errors with no real solution. To point out, we aren't on Azure SQL Server and we aren't using EF. As it turns out the issue to our problem was actually the web server.

We were able to correlate this to traffic because around around 10:30 - 11:00 am (on random days, omitting weekends because our traffic dramatically drops off during the weekend) we'd get these errors. As it turns our, we reviewed our GA reports and compared them to our error handling data and low and behold they matched fairly well (not perfectly though).

Our production web server was running about 8GB of RAM but this wasn't enough to compensate for the amount of traffic we were getting.

We doubled the memory and boom, all the issues went away no more MARS errors.

Again, this may not have any relation to anyone whose posted so far but I found this thread via a Google search so it might help someone whose going through this right now.

Hope it helps someone,

Cheers.

See new post for clarification.

EdonGar commented 5 years ago

This has become a major issue in my application. Using 2.2.6 of EntityFrameworkCore. When this error occurs, it is catastrophic and requires a full application domain reset and/or process killing.

Hosting in IIS on a Windows Server and backend is dedicated Sql Server 2016 running on separate VM.

This error occurs during lower traffic times as @c4rbon-c0py has mentioned.

This error can not be reproduce reliably, it is intermittent and apparently random, does not occur in one place, and affects all command paths from what I can see.

Turning on/off MARS on connection string does not change the occurrence of this exception.

I have NO IDEA WHAT TO DO ABOUT THIS. Please help.

CALL STACK:

System.Data.SqlClient.SqlException (0x80131904): 
The incoming tabular data stream (TDS) protocol stream is incorrect. The MARS TDS header contained errors.     
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)     
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)     
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)     
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)     
at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()     at System.Data.SqlClient.SqlDataReader.get_MetaData()     
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)     
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)     
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite, String method)     
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)     
at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)     
at System.Data.Common.DbCommand.ExecuteReader()     at Microsoft.EntityFrameworkCore.Storage.Internal.RelationalCommand.Execute(IRelationalConnection connection, DbCommandMethod executeMethod, IReadOnlyDictionary`2 parameterValues)     
at Microsoft.EntityFrameworkCore.Storage.Internal.RelationalCommand.ExecuteReader(IRelationalConnection connection, IReadOnlyDictionary`2 parameterValues)     
at Microsoft.EntityFrameworkCore.Query.Internal.QueryingEnumerable`1.Enumerator.BufferlessMoveNext(DbContext _, Boolean buffer)     
at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded)     
at Microsoft.EntityFrameworkCore.Query.Internal.QueryingEnumerable`1.Enumerator.MoveNext()     
at System.Linq.Enumerable.TryGetFirst[TSource](IEnumerable`1 source, Boolean& found)     
at lambda_method(Closure )     
at Microsoft.EntityFrameworkCore.Query.Internal.LinqOperatorProvider.ResultEnumerable`1.GetEnumerator()     
at Microsoft.EntityFrameworkCore.Query.Internal.LinqOperatorProvider._TrackEntities[TOut,TIn](IEnumerable`1 results, QueryContext queryContext, IList`1 entityTrackingInfos, IList`1 entityAccessors)+MoveNext()     
at Microsoft.EntityFrameworkCore.Query.Internal.LinqOperatorProvider.ExceptionInterceptor`1.EnumeratorExceptionInterceptor.MoveNext()     
at System.Linq.Enumerable.TryGetFirst[TSource](IEnumerable`1 source, Boolean& found)     at System.Linq.Enumerable.First[TSource](IEnumerable`1 source)     
at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.<>c__DisplayClass15_1`1.<CompileQueryCore>b__0(QueryContext qc)     
at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.Execute[TResult](Expression query)     
at Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryProvider.Execute[TResult](Expression expression)     
at System.Linq.Queryable.FirstOrDefault[TSource](IQueryable`1 source, Expression`1 predicate)     
at [Application Call Stack - Varies and isn't consistent as to the source]  
ClientConnectionId:323d9c41-0c45-4ab1-9dc9-b2c201c7546e  Error Number:4011,State:3,Class:16
David-Engel commented 5 years ago

@avaneerd Can you package your sample code into a complete solution and share it? Not being terribly familiar with EF, it would take me some time to assemble that into something I could debug.

CC @divega

avaneerd commented 5 years ago

Here you go: https://github.com/avaneerd/mars-problem-repro Unfortunately I cannot seem to reproduce the problem in isolation... This was tested against an Azure SQL S2 database.

This matches our real-world scenario and I'm not sure why I can't reproduce it. I hope it contributes to finding the problem at least :-/

coolhome commented 5 years ago

I have experienced this issue twice in production over the last 6-8 months.

Our workaround is to perform an App Pool Recycle (restart does not work).

It seems like this occurs a day after server patching.

EDIT: Database VM: SQL Server 2014 SP3 Windows Server VM: ASP.NET Core 2.2.6 hosted within IIS out of process

c4rbon-c0py commented 5 years ago

I wanted to follow up with some additional information and to correct some bad/false advice I had originally given on my previous post.

Turns out that yes, throwing more resources at our server did help "hide" the problem but shortly after we added those resources, the MARS errors returned (given, with less frequency).

After some continued research on other threads I found the solution to our problem in our "async/await" code. Unfortunately I don't have the links to share but a few of them led me to this belief.

Now, I can't say for 100% fact that this fixed our problem since our web application is seasonal and at the moment our traffic is fairly low but I refactored all of our APIs and DAL to remove any reference to async/await code.

Just before our seasonally traffic dropped off I made these changes and it seemed to have help but again, I'm not truly 100% sure.

Since my original post I made this refactor about a week or 2 after and from that point until today we've had 0 MARS errors tracked.

So again, my apologies for throwing any off the trail on this.

Good luck,

Cheers!

divega commented 5 years ago

We are hitting this on our functional tests pretty consistently now. @David-Engel, @cheenamalhotra maybe you can run the EF Core functional tests as a way to diagnose this.

cc @Wraith2

Wraith2 commented 5 years ago

Any particular test or is it pseudo random?

ErikEJ commented 5 years ago

How do we run tests against an Azure SQL DB?

Wraith2 commented 5 years ago

Just use a connection string that points at an azure database and enable encryption. there's nothing special about azure at a protocol level (there is some extra connection resilience stuff for it but that's at socket level).

ErikEJ commented 5 years ago

Yes, I know that :smile: - I meant running the EF Core tests against Azure SQL DB

smitpatel commented 5 years ago

@Wraith2 - It is one particular test which always fails when running whole assembly for tests.

Wraith2 commented 5 years ago

can you point me at it?

smitpatel commented 5 years ago

https://github.com/aspnet/EntityFrameworkCore Branch: master (where the test is not skipped yet) Building steps https://github.com/aspnet/EntityFrameworkCore/wiki/getting-and-building-the-code Test name: Average_on_nav_subquery_in_projection Stacktrace & details: https://github.com/aspnet/EntityFrameworkCore/issues/17775 Running all of SqlServer.FunctionalTests causes error. Running just the test does not cause it. I have seen error in VS test runner and command line (build.cmd -test in repo root or dotnet test) both. It is pretty consistent that I have seen it 4 times in a row in a PR validation build.

Wraith2 commented 5 years ago

I can't get the repository building, lots of errors about duplicate attributes. I'm also in the same position as others here that don't really work with EF, I'm going to need to strip away the abstraction layer and get it into an sql client only reproduction. EF can't be doing anything at the protocol level to cause the issue so it seems safe to say if you can get me an sql only repro I can loop until it fails I'll investigate as best I can.

Do your test run and fail on a specific platform? The managed network implementation is entirely different to the windows one and a failure in one may not occur on the other. If it's windows only it might be an issue inside sni.dll that we can't fix here. if it's shared or unix only we can address it.

I also suspect that it's a failure of outgoing messages, we're sending malformed data at the server which is interpreting it as mars because that's what it looks like. The specific error message we're seeing is coming back from the server in the run loop so we're likely being sent the error and passing it back to the caller not having an error occur in the client network layer (I'd expect TdsParser or the state object to be on the stack in that case)

smitpatel commented 5 years ago

It happens on windows for sure. We are not sure about other platforms as we don't have testing for SqlServer on other platforms.

I can imagine wanting to strip out EF from picture since it won't be issue in EF. But at the same time, the EFCore runs more than 15,000 tests against SqlServer with variety of scenario. It would be hard to create such simulation without EF easily. So honestly I cannot say how to proceed with that. We would be willing to help out working with EF needed. cc: @divega

Wraith2 commented 5 years ago

Knowing it's on windows cuts out a lot of potential problem areas. Since it's one test that's failing intermittently can you try looping that one test and see if it fails reliably with just that test?

divega commented 5 years ago

@Wraith2 Re

I can't get the repository building, lots of errors about duplicate attributes. I'm also in the same position as others here that don't really work with EF, I'm going to need to strip away the abstraction layer and get it into an sql client only reproduction.

Maybe we can help you with that. Building EF Core can't be that hard. We do it every day :smile: