OrchardCMS / OrchardCore

Orchard Core is an open-source modular and multi-tenant application framework built with ASP.NET Core, and a content management system (CMS) built on top of that framework.
https://orchardcore.net
BSD 3-Clause "New" or "Revised" License
7.27k stars 2.35k forks source link

Prevent BackgoundTask run on multiple instances #11463

Closed Oceania2018 closed 2 years ago

Oceania2018 commented 2 years ago

The same background task will run on multiple instances, how to avoid this. Although I have set the timeout and expiration time, this does not prevent the background tasks from being triggered periodically in every single instance, it just prevents them from running at the same time.

According to @jtkech 's explanation, it also confirmed my conjecture.

After enabling Redis locking, you also need to define locking timeout and expiration parameters, by code with the BackgroundTaskAttribute, or from the admin UI if you enable the BackgroundTask module. Then the lock will only prevent the task to be run concurrently, you have to find a way to check if the job is already done or just need to be completed. We don't yet implement oob a job that should be run by only one node in a given period, hmm here, in addition to the Redis lock, could be a "Done" flag in the Redis cache with a given expiration.

Is it possible to include another open source library to support distributed cron job like https://www.hangfire.io/, it can make sure only one instance will execute the job. Another quick fix I'm thinking is we could add an Instance name in the Admin UI, so before schedule task runs, we can check if the current instance == settings.Instance.

jtkech commented 2 years ago

Yes there are many scenarios, for now we only provide atomicity, for example writing to a database is atomic because of the nature of the transaction, but you may first need to read the database to know what you have to do, so using a distributed lock provide atomicity for the whole. Notice that you have to set the locking parameters otherwise the default values is like you don't have any locking.

Then a given node needs to know what it has to do, for example when executing our Indexing Task (by managing an index cursor) and our PublishLater Task (by checking dates and setting them to null when the job is done) a given node knows what it has to do, so here we only need to provide atomicity as described above.

Is it a custom IBackgroundTask that you defined? If so you could manage a simple ownership by setting a dedicated entry in the distributed cache, e.g. the process Id of the first node that acquire the lock to execute your task.

So, when a node execute the Task (meaning it acquired the lock) you would just have to check the ownership, if null just set it and do the job, if not null but not equal to your process Id don't do anything, and so on. Finally, don't forget to specify an expiration when setting the ownership, this way the ownership may change from one node to another one, but mainly for when a node having the ownership is stopped.

yusuf-kayikci commented 2 years ago

Hi again I try to do solution which you mention above. Could you check it if you have time ? @jtkech

 `public class BannerDataTransferTask : IBackgroundTask
  {        

    private static string GetDistributedOwnershipKey(IRedisService redisService) => 
       $"{redisService.InstancePrefix}{nameof(BannerDataTransferTask)}-Ownership";

    private static string GeInstanceHostName() => Dns.GetHostName();

    private static bool IsOwnerInstance(IRedisService redisService)
    {
        var key = GetDistributedOwnershipKey(redisService);

        var ownerInstanceIdentifier = redisService.Database.StringGet(key);

        var instanceIdentifier = GetInstanceHostName();

        if (!string.IsNullOrEmpty(ownerInstanceIdentifier))
        {
            return ownerInstanceIdentifier == instanceIdentifier;
        }

       //  I set 5 minute for expiration as you said --Finally, don't forget to specify an expiration when setting the ownership
        return redisService.Database.StringSet(key, instanceIdentifier, TimeSpan.FromMinutes(5));
    }

    public async Task DoWorkAsync(IServiceProvider serviceProvider, CancellationToken cancellationToken)
    {
        var redisService = serviceProvider.GetRequiredService<IRedisService>();

        if (redisService != null)
        {
            if (!IsOwnerInstance(redisService))
            {
               // if instance is not owner don't do anything
                return;
            }
        }

        // if is owner instance continue to do task.
        DoTask();

    }
}`
jtkech commented 2 years ago

@yusuf-kayikci

Good start but it can be simpler.

So I suggest that for your BackgroundTask you define the LockTimeout and LockExpiration both > 0, so that the atomicity is done for you. LockExpiration should be >> than the job time and < 1 min the time granularity. The LockTimeout (max time to acquire the lock) can be << than the job time, so that the 1st time an instance acquire the lock and then the ownership, other waiting instances will not wait for all the job time. But not a too low LockTimeout, so that when the ownership is already set, if multiple instances are waiting to acquire the lock including the instance that has the ownership, this last one has a good chance to acquire the lock (before the timeout) and do the job, this knowing that other instances will just see that they don't have the ownership and return quite quickly by releasing the lock.

Sorry if I'm not clear ;)

That said, if you don't care if 2 instances can do the job when the ownership is not yet set or expired, you may forget what I said about the distributed lock. But at least as said above, to set/get the ownership identifier you just have to resolve the oob aspnetcore IDistributedCache.

yusuf-kayikci commented 2 years ago

Firstly thanks much for your suggestions @jtkech

1 - I use IDistributedCache instead of IRedisService 2- we use the host name and the process id, but if your instances are not on the same server the host name is sufficient. therefore I changed instance identifier definition $"{Dns.GetHostName()}:{Process.GetCurrentProcess().Id}" like in source code 3 - BackgroundTask works in about 1 seconds so I set locktimeout=100ms and lockexpiration=5000ms from advanced settings on ui and you can see code that with these changes

 `public class BannerDataTransferTask : IBackgroundTask
  {        

    private static string GetDistributedOwnershipKey() => $"cms{nameof(BannerDataTransferTask)}-Ownership";

    private static string GeInstanceIdentifier() => $"{Dns.GetHostName()}:{Process.GetCurrentProcess().Id}";

    private static bool IsOwnerInstance(IDistributedCache distributedCache)
    {
        var key = GetDistributedOwnershipKey();

        var ownerInstanceIdentifier = distributedCache.GetString(key);

        var instanceIdentifier = GeInstanceIdentifier();

        if (string.IsNullOrEmpty(ownerInstanceIdentifier))
        {
            return ownerInstanceIdentifier == instanceIdentifier;
        }
        // I set 5 minute for expiration as you said --Finally, don't forget to specify an expiration when setting the ownership
        distributedCache.SetString(key, instanceIdentifier, new DistributedCacheEntryOptions
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(5)
        });

        return true;
    }

    public async Task DoWorkAsync(IServiceProvider serviceProvider, CancellationToken cancellationToken)
    {
        var distributedCache = serviceProvider.GetRequiredService<IDistributedCache>();

        if (distributedCache != null)
        {
            if (!IsOwnerInstance(distributedCache))
            {
                return;
            }
        }

        // if is owner instance continue to do task.
        DoTask();

    }
}`
jtkech commented 2 years ago

@yusuf-kayikci

Yes looks good, only minor things to update!

Suggested change

using System;
using System.Net;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.DependencyInjection;
using OrchardCore.BackgroundTasks;

namespace Your.Mamespace;

public class BannerDataTransferTask : IBackgroundTask
{
    // Can just be a const.
    private const string DistributedOwnershipKey = $"cms{nameof(BannerDataTransferTask)}-Ownership";

    // If you use the 'async' modifier, below you should use at least one 'await' to call a 'SomeAsync()'
    // method, if you don't use 'async' you should return 'Task.CompletedTask' or 'Task.FromResult(...)'.
    public async Task DoWorkAsync(IServiceProvider serviceProvider, CancellationToken cancellationToken)
    {
        // No need to do a null check on 'distributedCache', when there is no distributed cache, a default
        // one using in memory cache is resolved, anyway 'GetRequiredService()' would have already thrown.

        var distributedCache = serviceProvider.GetRequiredService<IDistributedCache>();
        if (!await IsOwnerInstanceAsync(distributedCache))
        {
            return;
        }

        await DoTaskAsync();
    }

    // Just for the example.
    private static Task DoTaskAsync() => Task.CompletedTask;

    private static string GeInstanceIdentifier() => $"{Dns.GetHostName()}:{Environment.ProcessId}";

    private static async Task<bool> IsOwnerInstanceAsync(IDistributedCache distributedCache)
    {
        var instanceIdentifier = GeInstanceIdentifier();

        var ownerInstanceIdentifier = await distributedCache.GetStringAsync(DistributedOwnershipKey);
        if (!String.IsNullOrEmpty(ownerInstanceIdentifier))
        {
            return ownerInstanceIdentifier == instanceIdentifier;
        }

        distributedCache.SetString(DistributedOwnershipKey, instanceIdentifier, new DistributedCacheEntryOptions
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(5)
        });

        return true;
    }
}
Piedone commented 1 year ago

Shouldn't the concurrent execution of bg tasks be prevented generally, for every bg task, from a feature depending on Redis Lock? Why did you close this, @sebastienros?

jtkech commented 1 year ago

For example with aspnetcore we can inject IDistributedCache that by default is an in memory cache but with the same interface: MemoryDistributedCache. So when you inject IDistributedCache it means that you are potentially distributed, but yes to be really distributed you need to register a real distributed cache.

We followed the same pattern for IDistributedLock which by default is only an ILocalLock.

Each background task already use a lock, in TryAcquireBackgroundTaskLockAsync() we can see that if the current IDistributedLock is just an ILocalLock we do nothing as we assume that we are in a single instance environment.

So backgroundTasks already are locked, but yes it relies on the current IDistributedLock registration which is the same pattern used for IDistributedCache. For example our DocumentManager and DynamicCache become automatically distributed when enabling e.g. Redis.Cache.

That said, when e.g. the Redis.Lock is enabled, to get a working Lock you need to provide Lock.Timeout and Lock.Expiration values > 0, otherwise it's like having no lock. So maybe what we are really missing is to provide pertinent values when we define bgTasks with the [BackGroundTask] attribute, this knowing that these values can be tweaked through the Admin if you enable the OC.BackgroundTask feature.

Piedone commented 1 year ago

I completely missed TryAcquireBackgroundTaskLockAsync(), thank you for explaining. And now I also see the Lock* properties on BackgroundTaskAttribute: It's really perfect, since if you want to prevent concurrency then you can set them, and if it doesn't matter (or downright your bg task needs to run on every node) then you just don't set them. So, there doesn't seem anything missing for me.

jtkech commented 1 year ago

Okay cool then