Prevent migrations from executing on multiple nodes in a web farm

sfmskywalker commented 9 years ago

In web farm setups, data migrations should execute on just one node. This could be implemented using a simple lease implementation leveraging MessageBus and Redis, where the lease would be a key in CacheManager and invalidated using a distributed signal (MessageBus + Redis) and maybe a timeout as a fallback (IClock.When).

pszmyd commented 9 years ago

Don't we already have a similar mechanism (TaskLease) that could be utilized here? Or rewritten/updated to use Redis instead of the database? Anyways - I strongly agree that such a read-to-use "distributed lock" component would be very useful - not only for migrations though.

sfmskywalker commented 9 years ago

My thought as well initially, but @sebastienros correctly pointed out, data migrations services live in Orchard.Framework while ITaskLeaseService lives in its own module. Also the implementation for this could be very simple where the data migration manager simply leverages ICacheManager to create a "lock" and releases it when its done. All we'd have to do next is enable distributed signals. This way it's also reusable for things other than migrations as you suggest.

DaRosenberg commented 9 years ago

I don't see how ICacheManager fits into the picture. You can't use that to create a lock unless it's distributed, the features for which also live in separate modules. And I don't see the point - TaskLease simply uses a database record, couldn't the migration manager do the same?

sfmskywalker commented 9 years ago

I suppose the Migration Manager could use database records too, but it can't use ITaskLeaseService for that since that interface does not live in Orchard.Framework while the data migration manager does. ICacheManager does live in Orchard.Framework so it could potentially be used (it's OK that the distributed features live in separate modules). But @DanielStolt makes a good point - how does creating a key in local memory allow you to use it as a lock that other nodes can monitor, since the key is local? Seems to me the key must be stored in a shared cache. If we would go the Redis cache route, I imagine the server nodes would see if the cache contains a certain key. If not, it creates it and therefore has the lease. Other nodes checking for that key would know that a lease was given to someone else so they would not execute the migrations. To @DanielStolt 's point: why use Redis over a simple database record? Or perhaps the framework itself should not provide the implementation, just the contract. Which seems to be the point of ITaskLeaseService, so perhaps we should bring that thing into the framework.

DaRosenberg commented 9 years ago

Yeah, exactly - I didn't mean to use ITaskLeaseService I just meant to do what it does, i.e. use a database record for cross-farm critical section. IMO it's by far the simplest and most robust solution if all you want to do is lock. The database provides very robust ACID properties, a distributed cache solution does not.

pszmyd commented 9 years ago

Me neither - it was just an example. I'd rather go with some simple "lock provider service" that would sit in Orchard.Framework, with some simple implementation based on database. I totally agree that it's the simplest and the most robust solution. And maybe TaskLeaseService could be then refactored to use the new mechanism.

The only thing I'm concerned about is using database as a lock mechanism when Migration Manager runs DDL stuff.

sfmskywalker commented 9 years ago

DDL?

sfmskywalker commented 9 years ago

So maybe we could copy just the ITaskLeaseService interface to Orchard.Framework, which itself would provide an empty implementation. The actual implementation would still come from Orchard.TaskLease. This has the benefit that installations that don't require task lease won't have the TaskLeaseRecord table created and better matches Orchard's overall architectural design. I'll create a branch with some suggested changes as well as a data migration manager leveraging task lease which we can review.

Piedone commented 9 years ago

TaskLease and its implementation is not useful in scenarios where a lease/lock shouldn't span multiple transactions (and even then its limited in use to scenarios where tasks won't try to start within the duration of the transaction acquiring the lease), see: https://github.com/OrchardCMS/Orchard/issues/4313 and https://github.com/OrchardCMS/Orchard/issues/3468.

Orchard.Caching (because it can supply a distributed implemenation) can be used for such server farm-wide locking (see a decorator for TaskLease): http://helpfullibraries.codeplex.com/SourceControl/latest#Libraries/Tasks/TaskLeaseServiceDecorator.cs As can Media files used for the same purpose: http://helpfullibraries.codeplex.com/SourceControl/latest#Libraries/Tasks/Locking/FileLock.cs

Since neither of these are (rightfully) in the Core I'd suggest adding a distributed/shared locking feature into the Framework with an implementation that's good enough for a web garden scenario (e.g. by using lock files in App_Data) and e.g. Orchard.Redis (or Orchard.Caching on a higher level) and Orchard.MessageBus could supply implementations usable in a web farm setup. The default implementation could even use IStorageProvider in the same way the linked services in Helpful Libraries does since that's a Framework interface and Media is the first thing you should think about when running on multiple servers (though, depending on the solution, it's not always usable for locking, but in this case you could use one of the other implementations).

sebastienros commented 9 years ago

The best solution here is to use a table to lock the migrations.

Piedone commented 9 years ago

And use a separate transaction for handling that lock? Or ReadUncommitted isolation level?

pszmyd commented 9 years ago

To minimize the impact of row/table locking and other issues that may come due to running DDL and DML queries in a single transaction, I'd go for a separate transaction in ReadUncommitted isolation level, yes (if SQL Server would've supported it, I'd go Chaos level - always makes me laugh when I see it;).

DaRosenberg commented 9 years ago

I think you'll have to use ReadCommitted or higher for this to have the desired effect. ReadUncommitted won't suffice, because you're looking to mitigate the potential race condition that two or more nodes try to lock at exactly the same time. If you use a dedicated transaction for only the synchronization and a dedicated table for this and keep clear of any foreign key relationships etc, I don't think it will cause interfering locking issues with other stuff.

DaRosenberg commented 9 years ago

Here's an interesting Redis-based implementation we might steal: https://github.com/schyntax/cs-schtick.redis

OrchardCMS / Orchard

Prevent migrations from executing on multiple nodes in a web farm #5621