OrchardCMS / Orchard

Orchard is a free, open source, community-focused Content Management System built on the ASP.NET MVC platform.
https://orchardproject.net
BSD 3-Clause "New" or "Revised" License
2.37k stars 1.12k forks source link

Since I moved to last 1.x ReliableSqlDbConnection begins to be less reliable #4566

Closed orchardbot closed 9 years ago

orchardbot commented 10 years ago

@csurieux created: https://orchard.codeplex.com/workitem/20737

My log is filling with this ? Background tasks are used for emails since months and this appears only recently ? Any idea why this Connection is not retried ? And if retried why no retry count in the logs ?

Trying to enumerate differences on settings since these logs started, I think I just activated the Workflow timer because in this version it has been pushed in its own feature. Before it was, included in workflow ... and working.

EDIT: there should be a config parameter to use SqlAzureClientDriverWithTimeoutRetries + SqlAzureClientDriverWithTimeoutRetries in place of SqlAzureClientDriver and another to adjust the retries policy 2014-05-31 03:54:14,748 [18] NHibernate.Transaction.AdoTransaction - Begin transaction failed System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) ---> System.ComponentModel.Win32Exception (0x80004005): An existing connection was forcibly closed by the remote host at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error) at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync() at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket() at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer() at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest) at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest) at System.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName, Boolean shouldReconnect) at System.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName) at NHibernate.SqlAzure.ReliableSqlDbConnection.BeginDbTransaction(IsolationLevel isolationLevel) at System.Data.Common.DbConnection.System.Data.IDbConnection.BeginTransaction(IsolationLevel isolationLevel) at NHibernate.Transaction.AdoTransaction.Begin(IsolationLevel isolationLevel) ClientConnectionId:ee585378-e41d-4517-ad13-d185a4da4fb1 2014-05-31 03:54:15,045 [18] Orchard.Tasks.BackgroundService - Error while processing background task NHibernate.TransactionException: Begin failed with SQL exception ---> System.Data.SqlClient.SqlException: A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) ---> System.ComponentModel.Win32Exception: An existing connection was forcibly closed by the remote host --- End of inner exception stack trace --- at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error) at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync() at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket() at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer() at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest) at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest) at System.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName, Boolean shouldReconnect) at System.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName) at NHibernate.SqlAzure.ReliableSqlDbConnection.BeginDbTransaction(IsolationLevel isolationLevel) at System.Data.Common.DbConnection.System.Data.IDbConnection.BeginTransaction(IsolationLevel isolationLevel) at NHibernate.Transaction.AdoTransaction.Begin(IsolationLevel isolationLevel) --- End of inner exception stack trace --- at NHibernate.Transaction.AdoTransaction.Begin(IsolationLevel isolationLevel) at Microsoft.Practices.TransientFaultHandling.RetryPolicy.<>cDisplayClass1.b0() at Microsoft.Practices.TransientFaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func) at NHibernate.Impl.SessionImpl.BeginTransaction(IsolationLevel isolationLevel) at Orchard.Data.SessionLocator.RequireNew(IsolationLevel level) at Orchard.Tasks.BackgroundService.Sweep()

orchardbot commented 10 years ago

@Piedone commented:

How many of these you see? For us this happens regularly, about one-two times a day for 7 constantly running tenants. I think this is, although uncomfortable, normal: Azure SQL isn't 100% reliable and if we don't have any retry logic for our failing queries then ReliableSqlDbConnection can only help that much.

orchardbot commented 10 years ago

@csurieux commented:

I don't think it is SQL Azure which is unreliable, I think the pb is on Orchard or NH because I have other services, not Orchard based, but with many transactions and never had the timeout exception. So I have patched orchard to use SqlAzureClientDriverWithTimeoutRetries and related config to see if there is any improvement. There is also the possibility of a name resolution pb which I encounter from time to time in azure...this is an azure pb. Concerning your SQL azure which version are you using? Web or Business ?

orchardbot commented 10 years ago

@Piedone commented:

Maybe your other site doesn't ping the DB that frequently so you just don't notice if there's an issue?

I experience this equally with Web and Business editions.

orchardbot commented 10 years ago

@csurieux commented:

It seems to ahve solved my pb, no more logs today. Waiting confirmation on next days.

        public override IPersistenceConfigurer GetPersistenceConfigurer(bool createDatabase) {

            var persistence = MsSqlConfiguration.MsSql2008;
            if (string.IsNullOrEmpty(_connectionString)) {
                throw new ArgumentException("The connection string is empty");
            }

            persistence = persistence.ConnectionString(_connectionString);

            // when using Sql Server Azure, use a specific driver, c.f. https://orchard.codeplex.com/workitem/19315
            if (IsAzureSql()) {
                // patch CS persistence = persistence.Driver<SqlAzureClientDriver>();
                persistence = persistence.Driver<SqlAzureClientDriverWithTimeoutRetries>(); 
            }

            // use MsSql2012Dialect if on Azure or if specified in the connection string
            if (IsAzureSql() || _connectionString.IndexOf(";Dialect=MsSql2012Dialect", StringComparison.OrdinalIgnoreCase) > 0) {
                persistence = persistence.Dialect<NHibernate.Dialect.MsSql2012Dialect>();
            }

            return persistence;
        }

        protected override void AlterConfiguration(Configuration config) {
            config.SetProperty(NHibernate.Cfg.Environment.PrepareSql, Boolean.TrueString);

            if (IsAzureSql()) {
                // patch CS config.SetProperty(NHibernate.Cfg.Environment.TransactionStrategy, typeof(ReliableAdoNetWithDistributedTransactionFactory).AssemblyQualifiedName);
                config.SetProperty(NHibernate.Cfg.Environment.TransactionStrategy, typeof(SqlAzureClientDriverWithTimeoutRetries).AssemblyQualifiedName);
            }
        }
orchardbot commented 10 years ago

@Piedone commented:

Interesting. We'll also try out and see how it works.

orchardbot commented 10 years ago

@Piedone commented:

This isn't working:

NHibernate.HibernateException: could not instantiate TransactionFactory: NHibernate.SqlAzure.SqlAzureClientDriverWithTimeoutRetries, NHibernate.SqlAzure, Version=1.0.0.37, Culture=neutral, PublicKeyToken=bb07ae0f952b17ee ---> System.InvalidCastException: Unable to cast object of type 'NHibernate.SqlAzure.SqlAzureClientDriverWithTimeoutRetries' to type 'NHibernate.Transaction.ITransactionFactory'.

The second modification should not be made, so only applying the first one is sufficient. At least it works, not sure yet if it really makes SQL connections more reliable, will turn out in the longer run.

orchardbot commented 10 years ago

@csurieux commented:

I don't know what you did but I have not this error...

The code I have proposed compiles and runs without errors.

Maybe you have not set it correctly.

Next, I am not in multi tenants and maybe your tenants have not been all updated. Haven't you told it was one time a day? How could you have already results.

On my side it's the second day without logs about this, it works for me.

orchardbot commented 10 years ago

@Piedone commented:

As you can see this is a runtime error and the issue is that SqlAzureClientDriverWithTimeoutRetries is not a TransactionStrategy. This error comes up when the site first tries to connect to an SQL Azure DB, no tenants can start. This error coming up has nothing to do with connections being dropped roughly once a day.

orchardbot commented 10 years ago

@jeffolmstead commented:

Well I am game for giving it a try as these "errors" fill my log file all too quickly. I will only try the first setting (the one in the GetPersistenceConfigurer method and leave out the AlterConfiguration method). Will let you know how it turns out.

One thing though, you mentioned a "related config". Is there a config file associated with the persistence connection (i.e. how many retries are attempted, etc.)? If so, mind sharing where this file is located?

orchardbot commented 10 years ago

@csurieux commented:

I don't know why you don't implement the AlterConfiguration, this is the recommanded implementation and it works perfect for me. No error as Zoltan reported. And no more error due to timeout. Concerning how many retries, no there is actually nothing in orchard fot this.

orchardbot commented 10 years ago

@Piedone commented:

Again I experienced the same error so adding the first config is not enough (and as mentioned, the second throws an error).

orchardbot commented 10 years ago

@jeffolmstead commented:

I have checked my logs this morning and I do still have some instances of transport-level error (two to be exact). Now, I say that tongue in cheek because it also appears to be less than normal. In addition, I monitor the number of sessions open on any one database and I am tracking less open sessions since the change last evening. Time will tell on that one though as it could be just normal cycling (my theory being that the faster sessions open and close the better the overall performance and less likelihood that Azure throttling has to force the sessions closed).

Now, in regards to why I didn't implement the AlterConfiguration, I could not push that onto a live environment without further testing. I am testing it out locally though and, so far, have not had any errors associated with it being there. My big concern is that:

  1. I don't know when "AlterConfiguration" get's called to even know if my change was utilized (i.e. since Zoltan is experiencing a runtime error it is not observable on compilation)
  2. The runtime error provided by Zoltán is very legitimate as SqlAzureClientDriverWithTimeoutRetries does NOT implement (at least not visible in reflection) ITransactionFactory so I am trying to find where this stems from (wondering if it has to do with multi-tenant which is a configuration you, Christian, mentioned you don't have implemented though Zoltán does).

Can you keep us informed on what you observe over the next coming days? I may try a deployment with the AlterConfiguration change in place to see what occurs.

orchardbot commented 10 years ago

@csurieux commented:

Very strange, I get no error ??? But I must admit that the code I bring above seems weird? I coming back to it I see no reason it should work ????

May be the error in actual code is in config.SetProperty(NHibernate.Cfg.Environment.TransactionStrategy, typeof(ReliableAdoNetWithDistributedTransactionFactory).AssemblyQualifiedName);

and my code works because it doesn't contains it.

Reading back https://github.com/MRCollective/NHibernate.SqlAzure the correct line for StrategyFactory should be config.SetProperty(NHibernate.Cfg.Environment.TransactionStrategy, typeof(ReliableAdoNetWithDistributedTransactionFactory).AssemblyQualifiedName);

and to answer Jeff we should be in position to implement the retry counts from the web.config file according http://msdn.microsoft.com/en-us/library/hh680900.aspx and something as their sample

<RetryPolicyConfiguration defaultRetryStrategy="Fixed Interval Retry Strategy"
    defaultSqlConnectionRetryStrategy="Backoff Retry Strategy"
    defaultSqlCommandRetryStrategy="Incremental Retry Strategy"
    defaultAzureStorageRetryStrategy="Fixed Interval Retry Strategy"
    defaultAzureServiceBusRetryStrategy="Fixed Interval Retry Strategy">
    <incremental name="Incremental Retry Strategy" retryIncrement="00:00:01"
        retryInterval="00:00:01" maxRetryCount="10" />
    <fixedInterval name="Fixed Interval Retry Strategy" retryInterval="00:00:01"
        maxRetryCount="10" />
    <exponentialBackoff name="Backoff Retry Strategy" minBackoff="00:00:01"
        maxBackoff="00:00:30" deltaBackoff="00:00:10" maxRetryCount="10" 
        fastFirstRetry="false"/>
</RetryPolicyConfiguration>

but have not tried.

orchardbot commented 10 years ago

@csurieux commented:

My ultimate question : 'Isn't the Transient Library which introduces pbs with sql Azure ???'

orchardbot commented 10 years ago

@Piedone commented:

@CSADNT: could you please post the full code for this class that is deployed for you and actually working?

orchardbot commented 10 years ago

@jeffolmstead commented:

Christian, I would love to be in control of the retry strategy, especially if it was tenant specific (as some of the configs are able to be using a config override). That would be an excellent improvement.

And I agree with Zoltán, it would be fantastic to see exactly what code you have in place which is working.

orchardbot commented 10 years ago

@csurieux commented:

Back to my code I am no more sure that it does always instanciate reliable connections because its configuration is not set for (I bring a bad method and NH should fail silently when setting up its config ... as it often does...I have not turned the full logs that are crazy). And I don't get the previous errors: in 3 days no timeout errors and the sites are running Ok ??? This is the reason why I am suspecting the Transient MS library not working under NH...

Last point is that I am not using multitenants .... :)

For my code, juts take the last 1.8, I copy it here, but I am now convinced thta this code is NOK...but solving the Pb (and maybe bringing other). I will add logs/trycatch in AlterConfiguration

using System;
using FluentNHibernate.Cfg.Db;
using NHibernate.Cfg;
using NHibernate.SqlAzure;

namespace Orchard.Data.Providers {
    public class SqlServerDataServicesProvider : AbstractDataServicesProvider {
        private readonly string _dataFolder;
        private readonly string _connectionString;

        public SqlServerDataServicesProvider(string dataFolder, string connectionString) {
            _dataFolder = dataFolder;
            _connectionString = connectionString;
        }

        public static string ProviderName {
            get { return "SqlServer"; }
        }

        public override IPersistenceConfigurer GetPersistenceConfigurer(bool createDatabase) {

            var persistence = MsSqlConfiguration.MsSql2008;
            if (string.IsNullOrEmpty(_connectionString)) {
                throw new ArgumentException("The connection string is empty");
            }

            persistence = persistence.ConnectionString(_connectionString);

            // when using Sql Server Azure, use a specific driver, c.f. https://orchard.codeplex.com/workitem/19315
            if (IsAzureSql()) {
                // patch CS persistence = persistence.Driver<SqlAzureClientDriver>();
                persistence = persistence.Driver<SqlAzureClientDriverWithTimeoutRetries>(); 
            }

            // use MsSql2012Dialect if on Azure or if specified in the connection string
            if (IsAzureSql() || _connectionString.IndexOf(";Dialect=MsSql2012Dialect", StringComparison.OrdinalIgnoreCase) > 0) {
                persistence = persistence.Dialect<NHibernate.Dialect.MsSql2012Dialect>();
            }

            return persistence;
        }

        protected override void AlterConfiguration(Configuration config) {
            config.SetProperty(NHibernate.Cfg.Environment.PrepareSql, Boolean.TrueString);

            if (IsAzureSql()) {
                // patch CS config.SetProperty(NHibernate.Cfg.Environment.TransactionStrategy, typeof(ReliableAdoNetWithDistributedTransactionFactory).AssemblyQualifiedName);
                config.SetProperty(NHibernate.Cfg.Environment.TransactionStrategy, typeof(SqlAzureClientDriverWithTimeoutRetries).AssemblyQualifiedName);
            }
        }

        private bool IsAzureSql() {
            return _connectionString.ToLowerInvariant().Contains("database.windows.net");
        }
    }
}
orchardbot commented 10 years ago

@csurieux commented:

I don't understand why NH calls TransactionStrategy a constant containing 'transaction.factory_class', it seems inadapted.

orchardbot commented 10 years ago

@csurieux commented:

REading NH and NH.SQLAzure codes, I think our problems come from ReliableAdoTransaction which inherits from AdoTransaction but the following method is badly implemented:

        public static void ExecuteWithRetry(ReliableSqlDbConnection connection, System.Action action)
        {
            connection.ReliableConnection.CommandRetryPolicy.ExecuteAction(() =>
                {
                    if (connection.State != ConnectionState.Open)
                        connection.Open();

                    action();
                }
            );
        }

especially the test on (connection.State != ConnectionState.Open)

orchardbot commented 10 years ago

@csurieux commented:

My conclusion is that Orchard should not automatically jump to the NHibernate.SqlAzure assembly when detecting SQL Azure Connection string, we should have the option not taking it or taking another library.

I know I was requiring to use Transient code for SQL Azure 1 year ago.... :)

orchardbot commented 10 years ago

@csurieux commented:

Moreover it seems that orchard includes the old version of library 1.0.037, not adapted form .net 4.5

orchardbot commented 10 years ago

@sebastienros commented:

My conclusion is that if we can provide a solution which works then we should ;)

orchardbot commented 10 years ago

@csurieux commented:

And now MS also offers Oracle under Azure .... :)

orchardbot commented 10 years ago

@jeffolmstead commented:

All I want is a database that doesn't drop connections but is not so aggressive where I cannot terminate a tenant: Orchard Tenants Gone Wild - How to Terminate and Keep Server Up Alas, I don't have the solution (though I hope the solution isn't Oracle :) ). It is likely I had less issues when I wasn't on Azure SQL but then there were other issues too. Someday there will be a perfect CMS - Orchard of course.

orchardbot commented 10 years ago

@csurieux commented:

Have you tried rippig the code which put NH.SQLAzure inplace of normal SQLDrriver ? I think it is the problem.

orchardbot commented 10 years ago

@csurieux commented:

But I also think going cultivate my garden or drive my car.

orchardbot commented 10 years ago

@jeffolmstead commented:

I haven't tried that one yet, though I will be. I try all of these things as it is my big pain point these days (always will be one I suppose). Funny, I was thinking ten minutes ago that I can't wait to get outside today and pull weeds! Orchard really pushes a person to gardening. Enjoy your time cultivating.

orchardbot commented 10 years ago

@csurieux commented:

I have put more logs on CreateTransaction and it appears that on each request Orchard creates a new transaction, installs and opens a connection, then immediately commit this transaction because there is a new CreateTransaction immediately after ???? As I have sold my car and garden I will walk in the streets thinking to this.

orchardbot commented 10 years ago

@csurieux commented:

Ok, it seems to work, no connection timeout for last 12 hours on 2 servers, new code used for tracing

using System;
using FluentNHibernate.Cfg.Db;
using NHibernate.Cfg;
using NHibernate.SqlAzure;

namespace Orchard.Data.Providers
{

    #region DBG CS
    using NHibernate.Engine;
    using NHibernate;
    using NHibernate.Engine.Transaction;
    using Orchard.Logging;
    using NHibernate.Transaction;
    public class ReliableAdoNetWithDistributedTransactionFactory2 : ReliableAdoNetWithDistributedTransactionFactory, ITransactionFactory
    {
        private IInternalLogger Logger = LoggerProvider.LoggerFor(typeof(ReliableAdoNetWithDistributedTransactionFactory2));
        public ReliableAdoNetWithDistributedTransactionFactory2()
        {
        }

        public new ITransaction CreateTransaction(ISessionImplementor session)
        {
            Logger.Debug("CreateTransaction");
            ITransaction newTransac = null;
            try
            {
                newTransac          = base.CreateTransaction(session);
            }
            catch (Exception ex)
            {
                Logger.Error("CreateTransaction failed", ex);
                throw;
            }
            return newTransac;
        }

        /// <summary>
        /// Executes some work in isolation.
        /// </summary>
        /// <param name="session">The NHibernate session</param>
        /// <param name="work">The work to execute</param>
        /// <param name="transacted">Whether or not to wrap the work in a transaction</param>
        public new void ExecuteWorkInIsolation(ISessionImplementor session, IIsolatedWork work, bool transacted)
        {
            Logger.Debug("ExecuteWorkInIsolation BEG");
            base.ExecuteWorkInIsolation(session, work, transacted);
            Logger.Debug("ExecuteWorkInIsolation BEG");
        }
    }

    #endregion

    public class SqlServerDataServicesProvider : AbstractDataServicesProvider {
        private readonly string _dataFolder;
        private readonly string _connectionString;

        public SqlServerDataServicesProvider(string dataFolder, string connectionString) {
            _dataFolder = dataFolder;
            _connectionString = connectionString;
        }

        public static string ProviderName {
            get { return "SqlServer"; }
        }

        public override IPersistenceConfigurer GetPersistenceConfigurer(bool createDatabase) {

            var persistence = MsSqlConfiguration.MsSql2008;
            if (string.IsNullOrEmpty(_connectionString)) {
                throw new ArgumentException("The connection string is empty");
            }

            persistence = persistence.ConnectionString(_connectionString);

            // when using Sql Server Azure, use a specific driver, c.f. https://orchard.codeplex.com/workitem/19315
            if (IsAzureSql()) {
                // patch CS persistence = persistence.Driver<SqlAzureClientDriver>();
                persistence = persistence.Driver<SqlAzureClientDriverWithTimeoutRetries>(); 
            }

            // use MsSql2012Dialect if on Azure or if specified in the connection string
            if (IsAzureSql() || _connectionString.IndexOf(";Dialect=MsSql2012Dialect", StringComparison.OrdinalIgnoreCase) > 0) {
                persistence = persistence.Dialect<NHibernate.Dialect.MsSql2012Dialect>();
            }

            return persistence;
        }

        protected override void AlterConfiguration(Configuration config) {
            config.SetProperty(NHibernate.Cfg.Environment.PrepareSql, Boolean.TrueString);

            if (IsAzureSql()) {                
                // patch CS config.SetProperty(NHibernate.Cfg.Environment.TransactionStrategy, typeof(ReliableAdoNetWithDistributedTransactionFactory).AssemblyQualifiedName);
                try
                {
                    config.SetProperty(NHibernate.Cfg.Environment.TransactionStrategy, typeof(ReliableAdoNetWithDistributedTransactionFactory2).AssemblyQualifiedName);
                }
                catch(Exception ex)
                {
                    Logger.Error(ex,"AlterConfiguration Error");
                }
                Logger.Debug("AlterConfiguration IsAzureSql END");
            }
        }

        private bool IsAzureSql() {
            return _connectionString.ToLowerInvariant().Contains("database.windows.net");
        }
    }
}
orchardbot commented 10 years ago

@jeffolmstead commented:

I am giving your code a go - will let you know what I observe.

orchardbot commented 10 years ago

@jeffolmstead commented:

So I gave it ago and... still have the issues but my logs get more robust! Here is an issue reported I never saw before:

System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.) ---> System.ComponentModel.Win32Exception (0x80004005): The semaphore timeout period has expired

I (too) often see timeout expired but first I recall ever seeing "semaphore timeout period has expired".

Be interested to see how your testing is holding up.

orchardbot commented 10 years ago

@Piedone commented:

I experienced the same semaphore error, happening with roughly the same frequency as the one before.

orchardbot commented 10 years ago

@csurieux commented:

Ok, me too, not as frequent as you but get it. I think the ReliableSql is not working.

A better approach seem to be in this dowload which explains that the new framework 4.51 SqlClient driver introduce in itself the retry logic for SqlAzure. Implementing new keywords ConnectRetryCount and • ConnectRetryInterval : https://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0CCUQFjAA&url=http%3A%2F%2Fdownload.microsoft.com%2Fdownload%2FD%2F2%2F0%2FD20E1C5F-72EA-4505-9F26-FEF9550EFD44%2FIdle%2520Connection%2520Resiliency.docx&ei=C3aRU_3oGsnW0QWlzoDQBw&usg=AFQjCNFcCs36lhY8N0Y8MCaNvpKYKQ2XnQ&sig2=rp7azFCSz_GNjennbW8RMQ&bvm=bv.68445247,d.d2k

orchardbot commented 10 years ago

@csurieux commented:

Seems that NHibernate is still using framework 4.0 ....

orchardbot commented 10 years ago

@csurieux commented:

Description the way this new features solves the problem http://tudorturcu.wordpress.com/2013/11/11/ado-net-connection-resiliency/

It explains why I never get this on my servers using direct ADO.NET with sqlclient 4.51....

orchardbot commented 10 years ago

@csurieux commented:

Seems that we could use the keywords ConnectRetryCount and ConnectRetryInterval in ConnectionString itself if NH was using the 4.51 SqlClient.

orchardbot commented 10 years ago

@csurieux commented:

One solution could be to create a dedicated NHIbernate driver using 4.51, and plug it in NH just as is doing NHibernate.SqlAzure

orchardbot commented 10 years ago

@csurieux commented:

If Orchard could build targeting 4.51 and not 4.5 a simple class as this on could implement the new SQLClient default retry strategy far better than the FaultTransient from some avanade gurus .... using System.Data; using NHibernate.AdoNet; using NHibernate.Driver; using System.Data.SqlClient;

namespace NHibernate.Sql451 { ///

/// Class to use in NH to support the new retry Strategy
/// </summary>

public class Sql451ClientDriver : Sql2008ClientDriver, IEmbeddedBatcherFactoryProvider
{
    /// <summary>

    /// Replace the NH version working with SlqConnection 4.0 which doesn't support retry
    /// </summary>

    /// <returns></returns>
    public override IDbConnection CreateConnection()
    {
        return new SqlConnection();
    }
    System.Type IEmbeddedBatcherFactoryProvider.BatcherFactoryClass
    {
        get { return typeof(SqlClientBatchingBatcherFactory); }
    }
}

}

orchardbot commented 10 years ago

@csurieux commented:

Maybe it would be better to migrate from NH to EntityFramework :)

orchardbot commented 10 years ago

@jeffolmstead commented:

I can't believe you just mentioned the "E" word! Crazy.

orchardbot commented 10 years ago

@sebastienros commented:

What we can do is use reflection to check if the class is available, and use it in that case. This way we don't require 4.5.1 but it will behave correctly if we do.

Can you try to patch something like this ?

orchardbot commented 10 years ago

@csurieux commented:

Sorry I will first test a 4.51 in production to see if it fixes the problem as I think. I have an urgent pb to solve when Azure Store calls in the middle of the (NH) night :) Have you some reason to avoid going on 4.51 ? On my test everything seems ok but I have not applied the Orchard tests, only code playing with NH queries and incoming/outgoing Web API calls. Please tell if there is a real show stopper in order I don't wast time on Orchard side.

orchardbot commented 10 years ago

@csurieux commented:

@Jeff don't you think so ?

orchardbot commented 10 years ago

@csurieux commented:

I will test in test production, fo sure :), MS providing test envs for AS

orchardbot commented 10 years ago

@csurieux commented:

Very disappointed... It doesn't clear totally the problem. After many tests and config variation I still get this semaphore exception. But these tests clearly show that there is a network pb: Orchard+NH seems to be overflowing the bandwidth with multiple transaction commits even when the transaction is empty: There should be some flag saying Is there something in the current transaction which could avoid many network calls. Problem for S what to do with an open but empty transaction? Does the transaction manager has a call to simply release it without calling the server? I don't think so. May be the idea would be to do the begin transaction only when the first item is enlisted: use a lazy begin transaction so when we need to get rid of it, if it is not instancied we don't have to call sever.... What about this ? Any idea on how to implement with this old NH fellow? Or directly in orchard?

orchardbot commented 10 years ago

@csurieux commented:

Maybe transaction manager already works with a lazy begin but orchard or NH are always enlisting something ? Would not be surprised if this is the problem, this could only be traced with sql profiler? Or is there another simple way in orchard?

orchardbot commented 10 years ago

@csurieux commented:

I finally have something working and made a pull request here https://orchard.codeplex.com/SourceControl/network/forks/CSADNT/Sql451Connection/contribution/6962

Migrated Orcgard to target 4.5.1 in order to use the new SqlConnection and its retry logic better adapted to Sql Azure than the FaultTransient library in NHibernate.SqlAzure (this last is removed). Also added a better logic (FlushTransaction) in BackgroundService in order to avoid keeping an open transaction too long.

code like this

        public void Sweep() {
            foreach(var task in _tasks) {
                try {
                    try
                    {
                        _transactionManager.RequireNew();
                    }
                    catch(Exception ex)
                    {
                        Logger.Error(ex, "Error requiring a new transaction, trying to renew the session after cancelling transaction.");
                        _transactionManager.Cancel();
                        Logger.Error("Session cancelled.");
                        _transactionManager.FlushTransaction();
                        Logger.Error("Ok after FlushTransaction.");
                        break;
                    }
                    task.Sweep();
                    _transactionManager.FlushTransaction(); // don't keep an open transaction 
                }
                catch (Exception e) {                    
                    Logger.Error(e, "Error while processing background task");
                    _transactionManager.Cancel();
                    _transactionManager.FlushTransaction(); // let's get rid of this polluted transaction
                }                
            }
        }
orchardbot commented 10 years ago

@jeffolmstead commented:

This will be amazing! I am watching this like a hawk as I am very frustrated that SQL Azure seems to be the weakest link in Orchard (even though likely not SQL Azure fault). Thanks for your ridiculous hard work on this, hope I can solve a problem for you sometime. I will try to pull your changes in and give them a go. Please keep me informed how this works out for you and I will do the same.

orchardbot commented 10 years ago

@csurieux commented:

Yea thanks (not ridiculous but not funny.... :) ) So let me know, at that time I have no more errors, and I reproduced exactly the conditions where it was crashing: generating a war of Web API calls from MS Azure Store beast (crazy engine built in India with shiva in mind)...

orchardbot commented 10 years ago

@csurieux commented:

Well... I don't know what to think about all this: I still get the problem in very high sollicitation.

And even worse I get it on a normal sql server 2012 in dev environment !!!!

I don't know if it is the same problem but debuging code and looking in SQL Server Actvity monitor in parallel, it appear that the ProcessQueue method of JobsQueueProcessor class fails on a lock when sending this request

while ((messages = _jobsQueueManager.Value.GetJobs(0, 10).ToArray()).Any())

the lock is a LCK_M_S , a share lock set by previuosly an different sql session which has not been committed. When I kill the session in sql no more problems.

I am thinking that Orchard is not closing every session, seems that some permanent objet in autofac is keeping one open, and it keeps the lock until the default duration for a session in SQL Server, which could be long.