dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.73k stars 3.18k forks source link

Issue with RollbackToSavepointAsync not working after a DbCommand failure due to Geography column (NetTopologySuite) #24818

Closed jondmcelroy closed 2 years ago

jondmcelroy commented 3 years ago

File a bug

We are wrapping calls to the DbContext SaveChangesAsync with our own transaction, savepoint and rollback. We noticed that when a command failure happened the rollback would not work due to the system thinking that transaction was no longer available.

internal class ReadWriteUnitOfWork : UnitOfWork, IReadWriteUnitOfWork
    {
        private const string SavepointName = nameof(ReadWriteUnitOfWork) + "Savepoint";
        private readonly IEntityCacheFacade _entityCache;
        private IDbContextTransaction? _transaction;
        private bool _disposed;

        public ReadWriteUnitOfWork(
            UnitOfWorkContext unitOfWorkContext,
            DbContext dbContext,
            IEntityCacheFacade entityCache)
            : base(unitOfWorkContext, dbContext)
        {
            _entityCache = entityCache;
        }

        public async Task SaveChangesAsync()
        {
            var dbContext = DbContext;
            _transaction ??= await dbContext.Database.BeginTransactionAsync();

            await _transaction.CreateSavepointAsync(SavepointName);
            try
            {
                await dbContext.SaveChangesAsync(false);
                await _entityCache.UpdateEntriesAsync(dbContext.ChangeTracker.Entries());
            }
            catch
            {
                await _transaction.RollbackToSavepointAsync(SavepointName);
                throw;
            }
            finally
            {
                await _transaction.ReleaseSavepointAsync(SavepointName);
            }

            // Accept changes after successfully updating cache.
            dbContext.ChangeTracker.AcceptAllChanges();
        }

   ....
}

Application insights show the timeline when we are attempting to save changes on a known sql command failure: image You can see that both the Save Transactions (the one we are doing and the one that efcore is doing) work properly. Then the sql db command fails. Following that both of the transaction rollbacks fail with this error: The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.

When I attempt to dive into it, I noticed that when Creating the Savepoint the _dbTransaction in the RelationalTransction has both a set Connection and a InternalTransaction object.

image

After it calls SaveChangesAsync when I look at that same RelationTransaction, the fields on the _dbTransation have been set to null.

image

This happens to the __EFSavePoint as well as the one we have wrapped around it.

Has anyone else experienced this problem or have any ideas about what could cause it?

SQLServer Multiple Active Result Sets is not turned on.

Include provider and version information

EF Core version: 5.0.1 Database provider: Microsoft.EntityFrameworkCore.SqlServer - 5.0.1 Target framework: .NET 5.0 Operating system: WINDOWS 10 IDE: Rider 2020.3

roji commented 3 years ago

It's difficult to understand from the code above what exactly your issue is and how it can be reproduced - I've created a minimal sample with an user savepoint wrapping SaveChanges in addition to the internal EF one, and it seems to work fine. Can you please tweak my code sample below to show what's not working?

Attempted repro ```c# await using var ctx = new BlogContext(); await ctx.Database.EnsureDeletedAsync(); await ctx.Database.EnsureCreatedAsync(); var transaction = await ctx.Database.BeginTransactionAsync(); await transaction.CreateSavepointAsync("MySavepoint"); var blog = await ctx.Blogs.SingleAsync(); // Concurrent change to make the main change fail await using (var otherCtx = new BlogContext()) { var otherBlog = await otherCtx.Blogs.SingleAsync(); otherBlog.Name = "changed"; await otherCtx.SaveChangesAsync(); } blog.Name = "failed_changed"; try { await ctx.SaveChangesAsync(); } catch (DbUpdateConcurrencyException) { await transaction.RollbackToSavepointAsync("MySavepoint"); await transaction.RollbackAsync(); Console.WriteLine("Concurrent update exception occurred"); return; } throw new Exception("Concurrent update exception did not occur"); public class BlogContext : DbContext { public DbSet Blogs { get; set; } static ILoggerFactory ContextLoggerFactory => LoggerFactory.Create(b => b.AddConsole() .AddFilter("", LogLevel.Information) .AddFilter(DbLoggerCategory.Database.Transaction.Name, LogLevel.Debug)); protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder) => optionsBuilder .UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0") .EnableSensitiveDataLogging() .UseLoggerFactory(ContextLoggerFactory); protected override void OnModelCreating(ModelBuilder modelBuilder) => modelBuilder.Entity().HasData(new Blog { Id = 1, Name = "orig" }); } public class Blog { public int Id { get; set; } public string Name { get; set; } [Timestamp] public byte[] Rowversion { get; set; } } ```
jondmcelroy commented 3 years ago

@roji Thanks for looking at this. I was able to reproduce the issue when using a db table that contained a geography type.

Need to add <PackageReference Include="NetTopologySuite.Core" Version="1.15.3" /> to your test csproj.

The exception that is thrown on SaveChangesAsync is due to the invalid coordinates:

Microsoft.Data.SqlClient.SqlException (0x80131904): The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 2 ("@p0"): The supplied value is not a valid instance of data type geography. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision.

Something about this exception is causing this problem. I am not sure if this is an efcore bug or a nettopologysuite bug at this point.

using System;
using System.ComponentModel.DataAnnotations;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;
using NetTopologySuite.Geometries;

namespace TestEFCoreSavePoint
{
    class Program
    {
        static async Task Main(string[] args)
        {
            await using var ctx = new BlogContext();
            await ctx.Database.EnsureDeletedAsync();
            await ctx.Database.EnsureCreatedAsync();

            var transaction = await ctx.Database.BeginTransactionAsync();
            await transaction.CreateSavepointAsync("MySavepoint");

            var blog = await ctx.Blogs.SingleAsync();

            blog.GeographicCoordinates = new Point(new Coordinate(20, 100)) { SRID = BlogContext.Srid };

            try
            {
                await ctx.SaveChangesAsync();
            }
            catch (Exception)
            {
                await transaction.RollbackToSavepointAsync("MySavepoint");
                await transaction.RollbackAsync();
                Console.WriteLine("Concurrent update exception occurred");
                return;
            }

            throw new Exception("Concurrent update exception did not occur");
        }
    }

    public class BlogContext : DbContext
    {

        public const int Srid = 4326;
        public DbSet<Blog> Blogs { get; set; }

        static ILoggerFactory ContextLoggerFactory
            => LoggerFactory.Create(b => b.AddConsole()
                .AddFilter("", LogLevel.Information)
                .AddFilter(DbLoggerCategory.Database.Transaction.Name, LogLevel.Debug));

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
            => optionsBuilder
                .UseSqlServer(
                    @"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0",
                    x => x.UseNetTopologySuite()
                )
                .EnableSensitiveDataLogging()
                .UseLoggerFactory(ContextLoggerFactory);

        protected override void OnModelCreating(ModelBuilder modelBuilder)
            => modelBuilder.Entity<Blog>()
                .HasData(
                    new Blog { Id = 1, Name = "orig", GeographicCoordinates = new Point(new Coordinate(20, 90)) { SRID = Srid } }
                );
    }

    public class Blog
    {
        public int Id { get; set; }

        public string Name { get; set; }

        [Timestamp]
        public byte[] Rowversion { get; set; }

        public Point GeographicCoordinates { get; set; } = null!;
    }
}
roji commented 3 years ago

Thanks for the code repro, I can see the problem happening now.

First, the specific error you're seeing seems to be the same as #19416; switching your two coordinates around makes it go away (new Coordinate(100, 20)).

Now, it seems that this specific exception implicitly terminates the transaction in SQL Server... I've verified this with a raw SqlClient test, without EF being involved in any way (see the code below). You should probably wrap your RollbackToSavepointAsync call in its own try/catch - this is a good idea regardless of this scenario, since any exception coming out of that would bubble up and hide the original exception that caused the rollback to get triggered.

Raw SqlClient test Insert the following in the csproj: ```xml ``` Program code: ```c# await using var conn = new SqlConnection("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0"); await conn.OpenAsync(); await using var cmd = conn.CreateCommand(); cmd.CommandText = @"IF OBJECT_ID('dbo.Blogs', 'U') IS NOT NULL DROP TABLE [Blogs];"; await cmd.ExecuteNonQueryAsync(); cmd.CommandText = @"CREATE TABLE [Blogs] ([Id] int NOT NULL IDENTITY, [GeographicCoordinates] geography NULL)"; await cmd.ExecuteNonQueryAsync(); var tx = conn.BeginTransaction(); tx.Save("MySavepoint"); cmd.CommandText = "INSERT INTO [Blogs] ([GeographicCoordinates]) VALUES (@p)"; cmd.Transaction = tx; var value = new Point(new Coordinate(20, 100)) { SRID = 4326 }; var writer = new SqlServerBytesWriter { IsGeography = true }; cmd.Parameters.Add(new SqlParameter { ParameterName = "p", SqlDbType = SqlDbType.Udt, UdtTypeName = "geography", Value = new SqlBytes(writer.Write(value)) }); try { await cmd.ExecuteNonQueryAsync(); } catch (SqlException) { Console.WriteLine("Caught expected exception"); } try { tx.Rollback("MySavepoint"); } catch (SqlException) { Console.WriteLine("Failed to roll back savepoints"); } tx.Rollback(); ```
ajcvickers commented 3 years ago

Note from triage: we may not be able to do anything here, but we should check that there isn't anything we can do from EF code to make this better.

roji commented 3 years ago

The main difficulty is that I don't think ADO.NET / SqlClient exposes information on whether a transaction is still (or currently in progress). There may some information in the error code on SqlException, we can take a look.

roji commented 3 years ago

@cheenamalhotra @David-Engel @Wraith2 would it be possible to have your input on this?

In a nutshell, the repro code causes a legitimate exception with Number=8023, State=4, Class/Severity=16 - all is well and good. The problem is that it also causes the transaction to be terminated; an attempt to roll back to a savepoint from before the offendnig operation throws the exception "This SqlTransaction has completed; it is no longer usable".

I can't see anything in the docs or on the SqlClient API that exposes the fact that this exception terminates the exception - this information is useful in order to correctly react to it. For example, I can see in the docs that Severity 20 and above usually terminates the connection, but no similar information is given on the status of the transaction. In the same way, there doesn't seem to be a way to check if the SqlTransaction instance is still active.

Any ideas or input on this? Does it make sense to escalate the transaction termination to the SQL Server team, maybe this isn't reasonable behavior?

cheenamalhotra commented 3 years ago

@roji

Thanks for the repro 👍🏻 I was able to capture server traces and there seems to be an exception occurring when RPC is executed. This is what triggers the transaction termination.

image

It could be a driver issue, please transfer this to SqlClient repository and we will investigate further.

cheenamalhotra commented 3 years ago

This is the complete error:

The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 1 ("@p"): The supplied value is not a valid instance of data type geography. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision.

I'm still not 100% sure if this is SqlClient issue and wondering if this is due to geography type issue? Could it be related to value generated by : var value = new Point(new Coordinate(20, 100)) { SRID = 4326 }; is not correct?

roji commented 3 years ago

@cheenamalhotra I'm not convinced it's actually a SqlClient error here: as you wrote, this is related to the geography type being written as raw bytes, probably in an incorrect way (if you switch the parameters around things are OK). But even if that's the case, it's odd for the transaction to simply disappear without a trace (and for the connection to stay usable) - that's the larger question here from my perspective.

Let me know how you want to proceed, I can transfer this to SqlClient if you want.

cheenamalhotra commented 3 years ago

Well actually if you fetch "tx.SupportsSavepoints" it says "False".

roji commented 3 years ago

(got back to this after a while)

Well actually if you fetch "tx.SupportsSavepoints" it says "False".

@cheenamalhotra that's true, but isn't that because it always returns false? :) AFAIK SqlClient hasn't overridden that property, which was introduced in .NET 5.0 as part of https://github.com/dotnet/runtime/issues/33397. I couldn't find an issue tracking that, so I opened https://github.com/dotnet/SqlClient/issues/1256.

Stepping back... So far, the only scenario we know about where this happens is the above, which is the result of a programmer bug, rather than an actual exception that is expected to happen at runtime. I think it's OK to not do anything until we have a more relevant runtime repro - clearing milestone to bring to triage on the EF side.

ajcvickers commented 3 years ago

Note from triage: no-action on the EF side for this case, since the exception is caused by bad binary data. However, it does seem like this may be an issue in other places (and other providers) if the transaction is reset by an exception.