Open peterkeating opened 7 years ago
How many instances do you have on Azure?
@sebastienros It's set up with a single instance.
Do you have "Always On" enabled?
@sebastienros No, "Always On" is set to "Off".
Try that then, it could explain the problem.
Will do, will let you know how we get on :)
@sebastienros Picking this up as @peterkeating is off for the weekend. Unfortunately, even with Always On set to On, the error still happens. Just to clarify, we're not certain this is specifically an Azure issue, as we don't currently have any multi-tenant sites that are hosted outside Azure.
Will try and see if I can get someone to recreate on a locally running version but if you have any other suggestions it would be appreciated.
One other observation is that the action that triggers the error state does actually complete successfully. So if you enable a module, that module will be enabled when you restart the site. Likewise, if you run an import the data will have successfully imported. So whatever the issue is, it's happening after that action.
Possibly worth noting that, as far as I can tell, it's only those three actions (enable module/run recipe/import) that cause the problem. Creating/editing content still works, so if I leave it for x hours and then create a content item it's fine, but if I then enable a module, it breaks.
All three of these actions potentially restart the tenant, which means it will reload all the services. It looks like something is holding on to an old ShellContext and using a disposed tenant. The fact that it happens only if the site is left alone for hours suggests that the site got stopped (which is why I asked about Always On).
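To make the suspected failure mode concrete, here is a minimal Python sketch of a service that captures a context once and keeps using it after a restart has disposed it. It is not Orchard code: `ShellContext` here is a simplified stand-in, and `Host`, `StaleService` and `FreshService` are hypothetical names used only for illustration.

```python
class ShellContext:
    """Simplified stand-in for a tenant's service container (not Orchard's class)."""

    def __init__(self, generation):
        self.generation = generation
        self.disposed = False

    def get_session(self):
        # A disposed context can no longer hand out sessions.
        if self.disposed:
            raise RuntimeError(f"context {self.generation} has been disposed")
        return f"session-from-context-{self.generation}"

    def dispose(self):
        self.disposed = True


class Host:
    """Hypothetical host that owns the current context and can restart the tenant."""

    def __init__(self):
        self._generation = 1
        self.current = ShellContext(self._generation)

    def restart_tenant(self):
        # Build a new context, then dispose the old one.
        old = self.current
        self._generation += 1
        self.current = ShellContext(self._generation)
        old.dispose()


class StaleService:
    """Captures the context once, so it breaks after a restart."""

    def __init__(self, host):
        self._context = host.current

    def load(self):
        return self._context.get_session()


class FreshService:
    """Resolves the current context on every call, so it survives a restart."""

    def __init__(self, host):
        self._host = host

    def load(self):
        return self._host.current.get_session()


def demo():
    host = Host()
    stale, fresh = StaleService(host), FreshService(host)
    before = (stale.load(), fresh.load())
    host.restart_tenant()
    try:
        stale_after = stale.load()
    except RuntimeError as exc:
        stale_after = f"error: {exc}"
    return before, stale_after, fresh.load()
```

In this sketch both services work before the restart; afterwards only the one that looks up the current context on each call keeps working, which matches the "fine until a module is enabled" pattern described above.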
The error has occurred in my local development environment too now. The initial SQL connection error occurs in this method:
We've had to use the workaround described in #6305 to prevent the site from being stuck in a permanent error state, and that is working.
I am fine with this workaround, please create a PR
@sebastienros this is an improvement and we'll submit a PR but just to clarify, the initial exception still occurs with this. However, when the error happens the service gets restarted so it's not in a permanent error state (i.e. works after a page reload), so you don't need to manually restart the site to get back to a functioning state.
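The behaviour described here (the first request still fails, but the failure triggers a restart so the next page load works) can be sketched roughly as follows. This is only an illustration of the pattern, not the actual change from #6305; `Tenant` and `serve` are hypothetical names.

```python
class Tenant:
    """Hypothetical tenant that can get stuck in an error state."""

    def __init__(self):
        self.broken = False
        self.restarts = 0

    def handle(self, path):
        if self.broken:
            raise RuntimeError("tenant is in an error state")
        return f"200 OK {path}"

    def restart(self):
        # Restarting rebuilds the tenant's services and clears the error state.
        self.broken = False
        self.restarts += 1


def serve(tenant, path):
    """Serve one request; on failure, restart the tenant so the next request can succeed."""
    try:
        return tenant.handle(path)
    except RuntimeError as exc:
        # The current request still fails, but the tenant is not left broken.
        tenant.restart()
        return f"500 {exc}"
```

With this shape the user sees one error page, and a reload succeeds without anyone restarting the site by hand.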
Same problem here. We are running instances of Orchard with 185 tenants and it works flawlessly on 1.9.1. Every time I try to update to 1.10.2 and open around 30 tenants at once, it crashes with the same problem reported by the OP (running on SQL Server).
[InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.]
   System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection) +1129
   System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions) +143
   System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions) +22
   System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry) +139
   System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry) +367
   System.Data.SqlClient.SqlConnection.Open() +130
   NHibernate.Connection.DriverConnectionProvider.GetConnection() +97
   NHibernate.AdoNet.ConnectionManager.GetConnection() +43
   NHibernate.Impl.SessionImpl.get_Connection() +15
   NHibernate.Transaction.AdoTransaction.Begin(IsolationLevel isolationLevel) +405

[TransactionException: Begin failed with SQL exception]
   NHibernate.Transaction.AdoTransaction.Begin(IsolationLevel isolationLevel) +510
   NHibernate.Impl.SessionImpl.BeginTransaction(IsolationLevel isolationLevel) +327
   Orchard.Data.TransactionManager.EnsureSession(IsolationLevel level) +155
   Orchard.Data.TransactionManager.GetSession() +17
   Orchard.ContentManagement.DefaultContentManager.GetManyImplementation(QueryHints hints, Action`2 predicate) +44
   Orchard.ContentManagement.DefaultContentManager.Get(Int32 id, VersionOptions options, QueryHints hints) +897
   Orchard.ContentManagement.DefaultContentManager.Get(Int32 id, VersionOptions options) +20
   Orchard.ContentManagement.ContentGetExtensions.Get(IContentManager manager, Int32 id, VersionOptions options) +19
   Orchard.Core.Settings.Services.SiteService.GetSiteSettings() +96
   Orchard.Settings.CurrentSiteWorkContext.Get(String name) +74
   Orchard.Environment.<>c__DisplayClass9_0`1.b0(Lazy`1 wcsp) +61
   System.Linq.WhereSelectArrayIterator`2.MoveNext() +69
   System.Linq.Enumerable.FirstOrDefault(IEnumerable`1 source, Func`2 predicate) +179
   Orchard.Environment.WorkContextImplementation.FindResolverForState(String name) +394
   System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory) +64
   Orchard.Environment.WorkContextImplementation.GetState(String name) +115
   Orchard.WorkContext.get_CurrentSite() +33
   Orchard.SecureSocketsLayer.Services.<b__6_1>d.MoveNext() +61
   System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) +137
   System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) +65
   Orchard.Mvc.Routes. 5.MoveNext() +468
   System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) +137
   System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) +65
   System.Web.TaskAsyncHelper.EndTask(IAsyncResult ar) +64
   System.Web.HttpTaskAsyncHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result) +12
   System.Web.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +577
   System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +157
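For readers unfamiliar with the "max pool size was reached" message, the following generic Python sketch shows how a bounded pool times out once every connection is checked out and none is returned. It does not model ADO.NET or NHibernate internals, and nothing in this thread establishes that connections are actually being leaked; `ConnectionPool`, `PoolTimeout` and `open_tenants` are hypothetical names.

```python
import queue


class PoolTimeout(Exception):
    """Raised when no pooled connection becomes free within the timeout."""


class ConnectionPool:
    """Hypothetical fixed-size pool; connections are plain strings here."""

    def __init__(self, max_size, timeout):
        self._timeout = timeout
        self._free = queue.Queue()
        for i in range(max_size):
            self._free.put(f"conn-{i}")

    def acquire(self):
        try:
            # Block until a connection is free or the timeout elapses.
            return self._free.get(timeout=self._timeout)
        except queue.Empty:
            raise PoolTimeout("timeout expired before a pooled connection became available") from None

    def release(self, conn):
        self._free.put(conn)


def open_tenants(pool, count, release_each):
    """Open `count` tenants one after another; return how many got a connection."""
    opened = 0
    for _ in range(count):
        try:
            conn = pool.acquire()
        except PoolTimeout:
            break
        opened += 1
        if release_each:
            # Returning the connection lets later tenants reuse it.
            pool.release(conn)
    return opened
```

With a pool of 3 and connections never released, only the first 3 of 30 tenants get a connection before the timeout; when each connection is released after use, all 30 succeed.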
@chusothe41 have you tried the workaround that is mentioned?
We have a difficult-to-reproduce issue that we've noticed sporadically on a number of multi-tenant websites hosted within Azure (VM & Web App). Based on some searches, there are a few others who've experienced the issue (#2409, #6305), but because it's difficult to replicate, it has largely been put down to a problem with a module or configuration. So we've been investigating.
Steps to reproduce:
Initially the Orchard site will respond with the error below:
After this, the site is in a constant error state (shown below), but restarting the site will fix it until you leave it idle again.
This issue will occur on both the main site and a tenant. If you cause the problem on the tenant, you can suspend/restart the tenant via the main site and that will solve it. If it happens on the main site, you need to restart the whole site (which will also fix the tenant).