OrchardCMS / Orchard

Orchard is a free, open source, community-focused Content Management System built on the ASP.NET MVC platform.
https://orchardproject.net
BSD 3-Clause "New" or "Revised" License

Site error when running Multi Tenancy #7875

Open peterkeating opened 7 years ago

peterkeating commented 7 years ago

We have a difficult-to-reproduce issue that we've noticed sporadically on a number of multi-tenant websites hosted in Azure (VM and Web App). Based on some searches, a few others have experienced it (#2409, #6305), but because it's difficult to replicate, it has largely been put down to a problem with a module or configuration. So we've been investigating.

Steps to reproduce:

Initially the Orchard site will respond with the error below:

2017-10-05 08:48:29,191 [19] NHibernate.Transaction.AdoTransaction - (null) - Commit failed [(null)]
System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) ---> System.ComponentModel.Win32Exception (0x80004005): An existing connection was forcibly closed by the remote host
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParserStateObject.SNIWritePacket(SNIHandle handle, SNIPacket packet, UInt32& sniError, Boolean canAccumulate, Boolean callerHasConnectionLock)
   at System.Data.SqlClient.TdsParserStateObject.WriteSni(Boolean canAccumulate)
   at System.Data.SqlClient.TdsParserStateObject.WritePacket(Byte flushMode, Boolean canAccumulate)
   at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalTransaction.Commit

After this, the site is in a constant error state (shown below). Restarting the site fixes it until the site is left idle again.

[Screenshot of the error page: chrome_2017-10-05_15-39-17]

The issue occurs on both the main site and the tenant. If you trigger it on the tenant, you can suspend/restart the tenant via the main site and that will solve it. If it happens on the main site, you need to restart the whole site (which will also fix the tenant).

sebastienros commented 7 years ago

How many instances do you have on Azure?

peterkeating commented 7 years ago

@sebastienros It's set up with a single instance.

sebastienros commented 7 years ago

Do you have "Always On" enabled?

peterkeating commented 7 years ago

@sebastienros No, "Always On" is set to "Off".

sebastienros commented 7 years ago

Try that then, that could explain it.

peterkeating commented 7 years ago

Will do, will let you know how we get on :)

DannyT commented 7 years ago

@sebastienros Picking this up as @peterkeating is off for the weekend. Unfortunately, even with Always On set to On, the error still happens. Just to clarify, we're not certain this is specifically an Azure issue, as we don't currently have any multi-tenant sites that aren't on Azure.

I'll try to get someone to recreate it on a locally running version, but if you have any other suggestions they would be appreciated.

One other observation is that the action that triggers the error state does actually complete successfully. So if you enable a module, that module will be enabled when you restart the site. Likewise, if you run an import the data will have successfully imported. So whatever the issue is, it's happening after that action.

DannyT commented 7 years ago

Possibly worth noting that, as far as I can tell, it's only those three actions (enable module/run recipe/import) that cause the problem. Creating/editing content still works, so if I leave it for x hours and then create a content item it's fine, but if I then enable a module, it breaks.

sebastienros commented 7 years ago

All three of these actions potentially restart the tenant, which means it will reload all the services. It looks like something is holding on to an old ShellContext and using a disposed tenant. The fact that it happens only if the site is left alone for hours suggests that the site got stopped (which is why I asked about Always On).

peterkeating commented 7 years ago

The error has occurred in my local development environment too now. The initial SQL connection error occurs in this method:

https://github.com/OrchardCMS/Orchard/blob/6720b71cf3474a9a7b8a8cc9a99d58b1e733acfa/src/Orchard/Data/SessionLocator.cs#L89

We've had to use the workaround described in #6305 to prevent the site from being in a permanent error state, and that is working.

sebastienros commented 7 years ago

I am fine with this workaround, please create a PR

DannyT commented 7 years ago

@sebastienros this is an improvement and we'll submit a PR but just to clarify, the initial exception still occurs with this. However, when the error happens the service gets restarted so it's not in a permanent error state (i.e. works after a page reload), so you don't need to manually restart the site to get back to a functioning state.

chusothe41 commented 6 years ago

Same problem here. We are running instances of Orchard with 185 tenants and it works flawlessly on 1.9.1. Every time I try to update to 1.10.2 and open around 30 tenants at once, it crashes with the same problem described by the OP (running on SQL Server).

[InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.]
   System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection) +1129
   System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions) +143
   System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions) +22
   System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry) +139
   System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry) +367
   System.Data.SqlClient.SqlConnection.Open() +130
   NHibernate.Connection.DriverConnectionProvider.GetConnection() +97
   NHibernate.AdoNet.ConnectionManager.GetConnection() +43
   NHibernate.Impl.SessionImpl.get_Connection() +15
   NHibernate.Transaction.AdoTransaction.Begin(IsolationLevel isolationLevel) +405

[TransactionException: Begin failed with SQL exception]
   NHibernate.Transaction.AdoTransaction.Begin(IsolationLevel isolationLevel) +510
   NHibernate.Impl.SessionImpl.BeginTransaction(IsolationLevel isolationLevel) +327
   Orchard.Data.TransactionManager.EnsureSession(IsolationLevel level) +155
   Orchard.Data.TransactionManager.GetSession() +17
   Orchard.ContentManagement.DefaultContentManager.GetManyImplementation(QueryHints hints, Action`2 predicate) +44
   Orchard.ContentManagement.DefaultContentManager.Get(Int32 id, VersionOptions options, QueryHints hints) +897
   Orchard.ContentManagement.DefaultContentManager.Get(Int32 id, VersionOptions options) +20
   Orchard.ContentManagement.ContentGetExtensions.Get(IContentManager manager, Int32 id, VersionOptions options) +19
   Orchard.Core.Settings.Services.SiteService.GetSiteSettings() +96
   Orchard.Settings.CurrentSiteWorkContext.Get(String name) +74
   Orchard.Environment.<>c__DisplayClass9_01.b0(Lazy`1 wcsp) +61
   System.Linq.WhereSelectArrayIterator`2.MoveNext() +69
   System.Linq.Enumerable.FirstOrDefault(IEnumerable`1 source, Func`2 predicate) +179
   Orchard.Environment.WorkContextImplementation.FindResolverForState(String name) +394
   System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory) +64
   Orchard.Environment.WorkContextImplementation.GetState(String name) +115
   Orchard.WorkContext.get_CurrentSite() +33
   Orchard.SecureSocketsLayer.Services.<b__6_1>d.MoveNext() +61
   System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) +137
   System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) +65
   Orchard.Mvc.Routes.d5.MoveNext() +468
   System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) +137
   System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) +65
   System.Web.TaskAsyncHelper.EndTask(IAsyncResult ar) +64
   System.Web.HttpTaskAsyncHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result) +12
   System.Web.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +577
   System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +157

sebastienros commented 6 years ago

@chusothe41 Have you tried the workaround that is mentioned?