OrchardCMS / Orchard

Orchard is a free, open source, community-focused Content Management System built on the ASP.NET MVC platform.
https://orchardproject.net
BSD 3-Clause "New" or "Revised" License
2.37k stars 1.12k forks source link

Since I moved to last 1.x ReliableSqlDbConnection begins to be less reliable #4566

Closed orchardbot closed 9 years ago

orchardbot commented 10 years ago

@csurieux created: https://orchard.codeplex.com/workitem/20737

My log is filling with this ? Background tasks are used for emails since months and this appears only recently ? Any idea why this Connection is not retried ? And if retried why no retry count in the logs ?

Trying to enumerate differences on settings since these logs started, I think I just activated the Workflow timer because in this version it has been pushed in its own feature. Before it was, included in workflow ... and working.

EDIT: there should be a config parameter to use SqlAzureClientDriverWithTimeoutRetries + SqlAzureClientDriverWithTimeoutRetries in place of SqlAzureClientDriver and another to adjust the retries policy 2014-05-31 03:54:14,748 [18] NHibernate.Transaction.AdoTransaction - Begin transaction failed System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) ---> System.ComponentModel.Win32Exception (0x80004005): An existing connection was forcibly closed by the remote host at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error) at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync() at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket() at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer() at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest) at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest) at System.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName, Boolean shouldReconnect) at System.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName) at NHibernate.SqlAzure.ReliableSqlDbConnection.BeginDbTransaction(IsolationLevel isolationLevel) at System.Data.Common.DbConnection.System.Data.IDbConnection.BeginTransaction(IsolationLevel isolationLevel) at NHibernate.Transaction.AdoTransaction.Begin(IsolationLevel isolationLevel) ClientConnectionId:ee585378-e41d-4517-ad13-d185a4da4fb1 2014-05-31 03:54:15,045 [18] Orchard.Tasks.BackgroundService - Error while processing background task NHibernate.TransactionException: Begin failed with SQL exception ---> System.Data.SqlClient.SqlException: A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) ---> System.ComponentModel.Win32Exception: An existing connection was forcibly closed by the remote host --- End of inner exception stack trace --- at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error) at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync() at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket() at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer() at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest) at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest) at System.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName, Boolean shouldReconnect) at System.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName) at NHibernate.SqlAzure.ReliableSqlDbConnection.BeginDbTransaction(IsolationLevel isolationLevel) at System.Data.Common.DbConnection.System.Data.IDbConnection.BeginTransaction(IsolationLevel isolationLevel) at NHibernate.Transaction.AdoTransaction.Begin(IsolationLevel isolationLevel) --- End of inner exception stack trace --- at NHibernate.Transaction.AdoTransaction.Begin(IsolationLevel isolationLevel) at Microsoft.Practices.TransientFaultHandling.RetryPolicy.<>cDisplayClass1.b0() at Microsoft.Practices.TransientFaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func) at NHibernate.Impl.SessionImpl.BeginTransaction(IsolationLevel isolationLevel) at Orchard.Data.SessionLocator.RequireNew(IsolationLevel level) at Orchard.Tasks.BackgroundService.Sweep()

orchardbot commented 10 years ago

@sebastienros commented:

Can you try to simply change BackgroundService to this:

    public class BackgroundService : IBackgroundService {
        private readonly IEnumerable<IBackgroundTask> _tasks;
        //private readonly ITransactionManager _transactionManager;

        public BackgroundService(
            IEnumerable<IBackgroundTask> tasks
            //, ITransactionManager transactionManager
            ) {
            _tasks = tasks;
            //_transactionManager = transactionManager;
            Logger = NullLogger.Instance;
        }

        public ILogger Logger { get; set; }

        public void Sweep() {
            foreach(var task in _tasks) {
                try {
                    //_transactionManager.RequireNew();
                    task.Sweep();
                }
                catch (Exception e) {
                    //_transactionManager.Cancel();
                    Logger.Error(e, "Error while processing background task");
                }

            }
        }
    }

This is what I did on weblogs.asp.net to remove these locks.

orchardbot commented 10 years ago

@csurieux commented:

I will try but it is a very strange way of behaving. We could thought it would have been better with a transaction for each task and dispose lost transac. I already tried multiple config: ReadUncommitted, no shim lock, no cancel but still commit/ dispose, flush after any table create or delete... The SQL side shows one more session than we should have and this session is totally inactive, Keeping only the locked records with various origins: Select or insert or update. I was investigating on a double begin transaction with only one commit....

orchardbot commented 10 years ago

@sebastienros commented:

I am just trying to isolate where the issue is. I am not saying this is what we want. If it doesn't repro after this change then we know where the issue is, repro it, and fix it.

orchardbot commented 10 years ago

@csurieux commented:

Ok. So I tryed and stiil the lock. (was nor sure but thinking I have already done this)

BackgroundService Seep loops through WarmupTask.Sweep TimerBackgroundTask.Sweep JobsQueueBackgroundTask.Sweep -> JobsQueueProcessor.ProcessQueue which blocks on lock IndexingBackgroundTask.Sweep ScheduledTaskExecutor.Seep -> which loops also_transactionManager.RequireNew() if any entry

The lock appears after this sequence:

Here I noticed from sql server that the lock is set by another session as BackgroundService the running one and this session, until I kill it stays without have no action (the current stored proc from detail doesn't change...)

I had not the opportunity to try finding in which store procedure is exactly created the lock (it could also be a meta data lock created in the prepare statement ?). it is a long process using the SQL Profiler and trace analysis I am no more used to.

When I kill the dead session holding the lock everything is Ok.

I would add that I use a similar scheduler in a non Orchard Azure Cloud Service with direct ADO.NET (thanks Dr NoNH), no transaction in code, but transactions in store procs (better control on locking inside each transac) , and I never encouter this lock pb, and with a very higher load. One last difference is that this scheduler, on the end of each job do not its record in DB but mark it as executed using a store proc (no lock).

If Harvest, the World Cup and MS let you some time you may lync me on my email (preferably on afternoon but before WWC). I will try to open my mic.

orchardbot commented 10 years ago

@csurieux commented:

!!!! Editor -> from many typos: One last difference is that this scheduler, on the end of each job do not DELETE its record in DB but mark it as executed using a store proc (no lock).

orchardbot commented 10 years ago

@csurieux commented:

I tried not deleting but setting priority to -1 and selecting only >=0, no change.

orchardbot commented 10 years ago

@csurieux commented:

Anyway this last pb is different from the one with SQL Azure, with SQL Azure the exception get thrown in the _transactionManager.RequireNew() , not in Sweep. As I was saying there are several problems with background tasks.

orchardbot commented 10 years ago

@csurieux commented:

I gave up nad pushed all jobQueue tasks out of orchard, keeping only the SQL 4.5.1 update and it looks better on the timeout exception front line -> 0

orchardbot commented 10 years ago

@csurieux commented:

Since I migrated to 4.5.1 using the new SqlConnection with retry count I noticed this new message. It may suggest that Orchard is not closing correctly connections. I will try to increase Connection Delay and Max Pool Size in connection string to avois it.

2014-07-03 02:09:07,694 [18] NHibernate.Transaction.AdoTransaction - Begin transaction failed
System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.Open()
   at NHibernate.Sql451.Sql451ConnectionProvider.GetConnection()
   at NHibernate.AdoNet.ConnectionManager.GetConnection()
   at NHibernate.Transaction.AdoTransaction.Begin(IsolationLevel isolationLevel)
orchardbot commented 9 years ago

@sebastienros commented:

Fixed in changeset f2a7bc59da9c2edd7c745f114395d76f00bd6ff5

orchardbot commented 9 years ago

@csurieux commented:

Does the SqlConnection called by NH will be from 4.5.1 ?

orchardbot commented 9 years ago

@sebastienros commented:

Following the article you mentioned, I simply removed any specific code as it is handled "magically" by the new SqlConnection class. So this will just work out of the box with the default parameters (10 seconds and 1 retry),

orchardbot commented 9 years ago

@csurieux commented:

As the orchard guru and dictator, not to forget mentioning you are from MS deep inside, this should be ok.

In my tests I was using an NHibernate dll which was compiled with and so always using the old version of ado.net, always present on the deplyed servers for compat, and the retry thing was not working. This is the reason why, to be sure to use the 4.5.1 version I moved the driver dedicated class into orchard.