Open Abrissirba opened 1 week ago
Hi, @Abrissirba, I'm taking a look at this and will keep you updated on my findings. In the meantime, can you please let me know which version of Microsoft.Data.SqlClient you're using?
Thanks!
I used 5.2.2 of Microsoft.Data.SqlClient
Hi @Abrissirba, I wanted to share where I'm at to keep you updated. Your repro code helped me verify the issue (thank you!) and I've determined a root cause. I'm currently evaluating potential solutions and will share again when we're ready to go forward with a fix.
Thanks for the update and happy that my repro code helped.
We have an issue that have scared us and that has taken us some time to understand. I am not sure if this is an issue that belongs to SqlClient or System.Transaction or Azure SQL
The bug first appeared when moving our on-prem solution up to Azure. Going from windows to Azure Containers Apps running on linux and moving from SQL Server On-Prem to an Azure SQL database.
Our code is using TransactionScope to make sure everything is commited in one batch.
A regular flow is that at the begining of execution, a TransactionScope is created and record is inserted into a table (using ADO.NET) and then the flow continues to handle busniess logic. (Outbox with TransactionScope in NServicBus)
The business logic code use Entity Framework 6 to communicate with the DB.
At the end of execution the first connection will try to update the outbox record it created and complete the TransactionScope. Since we have two connecitons open simultanously the transaction will escalate to DTC. From my understanding Azure SQL will handle this DTC transaction in something called Elastic Transaction.
During our first tests we had a low SKU on the database and some parts of our db communication (heavy inserts) took very long time to execute. At most it could take up to 40 minutes.
During these long running transactions we got an error at then end when updating the outbox record. This would throw an error leading to the transaction being rolled back. So far so good.
What suprised us was that we could see data in the database that had been commited during this execution that should have been rolled back but instead had been commited. Scary and confusing!
After investigating the error, stating a broken connection, we found out that we used the proxy connection policy in azure. This connection policy seems to close connections that has been idle for 30 minutes. Redirect is the recomended policy by microsoft but proxy is the defualt in many scenarios, including when you use private endpoints which is the case for us. After changing to redirect, and increasing the SKU so that the inserts doesn't take so long to execute, we have not seen this behaviour any more.
However, we could not sleep as good as we would like knowing that this error might still occour, ending up corrupting the database.
So I started digging into the code to find a way to recreate this. This is what I have found so far
This only happens when we use an Azure SQL database. If we use a local db on windows, which will escalate using MSDTC, an error is thrown when the first connection/session dies.
I have been able to reproduce with the example below using the lastest version of both Microsoft.Data.SqlClient and System.Data.SqlClient against an Azure SQL db.
The code below can be used to recreate the issue.