NCronJob-Dev / NCronJob

A Job Scheduler sitting on top of IHostedService in dotnet.
https://docs.ncronjob.dev/
MIT License
158 stars 11 forks source link

[MaybeBug/MaybeDoc] Unclear expectations about dependent jobs #108

Open nulltoken opened 3 weeks ago

nulltoken commented 3 weeks ago

Describe the bug Running the same job on different schedules, each schedule having different sub dependent jobs doesn't seem to work.

I've tried to expose this through a test that seems to put the "issue/expectation" under the light.

⚠️ I've been fighting a lot with the integration test infrastructure (ChannelWriter , ...). And I often lost. My understanding of it is very light. So that test may be badly written or failing for the wrong reasons.

I've tried to model something that may exist in real life: Regular internal reporting vs longer delayed external reporting.

As a side note, I've haven't been able to model a test simulating a daily vs a monthly execution and would be interested to know more about how to achieve this.

[Fact]
public async Task Hrmpf()
{
    string constantly = "* * * * *";
    string monthly = "0 0 1 * *";

    FakeTimer.SetUtcNow(DateTimeOffset.Parse("2024-10-25T16:55Z", CultureInfo.InvariantCulture));

    var storage = new DepStorage();
    ServiceCollection.AddSingleton(storage);
    ServiceCollection.AddNCronJob(n =>
    {
        n.AddJob<AnalysisJob>(b => b.WithCronExpression(constantly))
            .ExecuteWhen(success: s => s.RunJob<ReportToProductOwnerJob>());

        n.AddJob<AnalysisJob>(b => b.WithCronExpression(monthly))
            .ExecuteWhen(success: s => s.RunJob<ReportToStakeholdersJob>());
    });

    var provider = CreateServiceProvider();
    await provider.GetRequiredService<IHostedService>().StartAsync(CancellationToken);

    FakeTimer.Advance(TimeSpan.FromMinutes(2));

    await Task.WhenAll(GetCompletionJobs(2));
    string.Join(' ', storage.InvokedJobs).ShouldNotContain(nameof(ReportToStakeholdersJob));
}

private sealed class DepStorage
{
    public ConcurrentBag<string> InvokedJobs { get; } = [];
}

private sealed class AnalysisJob(ChannelWriter<object> writer, DepStorage storage) : IJob
{
    public async Task RunAsync(IJobExecutionContext context, CancellationToken token)
    {
        storage.InvokedJobs.Add(context.JobType!.Name);
        await writer.WriteAsync(true, token);
    }
}

private sealed class ReportToProductOwnerJob(ChannelWriter<object> writer, DepStorage storage) : IJob
{
    public async Task RunAsync(IJobExecutionContext context, CancellationToken token)
    {
        storage.InvokedJobs.Add(context.JobType!.Name);
        await writer.WriteAsync(true, token);
    }
}

private sealed class ReportToStakeholdersJob(ChannelWriter<object> writer, DepStorage storage) : IJob
{
    public async Task RunAsync(IJobExecutionContext context, CancellationToken token)
    {
        storage.InvokedJobs.Add(context.JobType!.Name);
        await writer.WriteAsync(true, token);
    }
}

Current behavior ReportToProductOwnerJob is invoked every minute. ReportToStakeholdersJob is invoked every minute.

Expected behavior I can think of 3 options (from "best" to "worse" from the user standpoint):

  1. Make it work

    • ReportToProductOwnerJob is invoked every minute.
    • ReportToStakeholdersJob is invoked every month.
  2. Prevent it from happening

    • Make AddNCronJob() throw a NotSupportedException bearing an explicit message when this kind of usage is detected
  3. Document that it's not currently supported

    • Add a "Known gotchas" section in the documentation
    • Propose an alternative design that would allow this kind of use case

Although "Make it work" would obviously be the preferred option, my current endeavor in #106 leads me to think that this might not be a one-liner fix.

Version information

linkdotnet commented 3 weeks ago

That is an interesting case and for sure a bug! Your first given option should be what is happening. That is what the user configured.

I understand the test-setup is rather complex. The issue at the moment is, that we don't have any means of looking inside the state (when in a test). Therefore, the channel writes something, and the tests listen. So that we know for sure, something was triggered.

linkdotnet commented 3 weeks ago

I guess I do have an idea for fix: Your example can even be more convoluted:

Services.AddNCronJob(s => s.AddJob<...>(p => p.WithCronExpression(...).WithCronExpression(...)).ExecuteWhen()

Here I want to execute the job for both cron expressions

linkdotnet commented 3 weeks ago

Plus:

Servuces.AddNCronJob(s => s.AddJob<..>().ExecuteWhen();

It should also work. So if the user doesn't define any parameters, it should work either way (think of Instant Jobs),

nulltoken commented 3 weeks ago

I guess I do have an idea for fix:

I was also toying with something on my side.

We currently link by Job type.

Now that all the interesting pieces are more or less in JobRegistry, we could link by JobDefinition instance (and use it as the key to the dictionary of DependentJobRegistryEntrys.

linkdotnet commented 3 weeks ago

I currently have a working version that does exactly that. But that leads to the following failing test:

[Fact]
public async Task WhenJobWasSuccessful_DependentAnonymousJobShouldRun()
{
    Func<ChannelWriter<object>, JobExecutionContext, Task> execution = async (writer, context) => await writer.WriteAsync($"Parent: {context.ParentOutput}");
    ServiceCollection.AddNCronJob(n => n.AddJob<PrincipalJob>()
        .ExecuteWhen(success: s => s.RunJob(execution)));
    var provider = CreateServiceProvider();
    await provider.GetRequiredService<IHostedService>().StartAsync(CancellationToken);

    provider.GetRequiredService<IInstantJobRegistry>().ForceRunInstantJob<PrincipalJob>(true);

    var result = await CommunicationChannel.Reader.ReadAsync(CancellationToken) as string;
    result.ShouldBe("Parent: Success");
}

Basically: A job that does get triggered directly. Still, at least that is my assumption, dependent jobs, should be executed. Of course we could also argue, that isn't part of the deal - but that would be a breaking change and I would like to keep the old behavior.

linkdotnet commented 3 weeks ago

By the way, a very simple test in regards to your original message might be:

[Fact]
public async Task ConfiguringDifferentDependentJobsForSchedulesShouldResultInIndependentRuns()
{
    ServiceCollection.AddNCronJob(n =>
    {
        n.AddJob<PrincipalJob>(s => s.WithCronExpression("1 0 1 * *").WithParameter(true))
            .ExecuteWhen(s => s.RunJob((ChannelWriter<object> writer) => writer.WriteAsync("1").AsTask()));
        n.AddJob<PrincipalJob>(s => s.WithCronExpression("1 0 2 * *").WithParameter(true))
            .ExecuteWhen(s => s.RunJob((ChannelWriter<object> writer) => writer.WriteAsync("2").AsTask()));
    });
    var provider = CreateServiceProvider();
    await provider.GetRequiredService<IHostedService>().StartAsync(CancellationToken);
    using var timeoutToken = new CancellationTokenSource(3000);
    using var linkedToken = CancellationTokenSource.CreateLinkedTokenSource(timeoutToken.Token, CancellationToken);

    FakeTimer.Advance(TimeSpan.FromMinutes(1));

    var content = await CommunicationChannel.Reader.ReadAsync(linkedToken.Token);
    content.ShouldBe("1");

    FakeTimer.Advance(TimeSpan.FromDays(1));
    content = await CommunicationChannel.Reader.ReadAsync(linkedToken.Token);
    content.ShouldBe("2");
}
linkdotnet commented 3 weeks ago

Down to one failing test

linkdotnet commented 3 weeks ago

This one -.-

[Fact]
public async Task CanBuildAChainOfDependentJobs()
{
    ServiceCollection.AddNCronJob(n =>
    {
        n.AddJob<PrincipalJob>().ExecuteWhen(success: s => s.RunJob<DependentJob>());
        n.AddJob<DependentJob>().ExecuteWhen(success: s => s.RunJob<DependentDependentJob>());
    });
    var provider = CreateServiceProvider();
    await provider.GetRequiredService<IHostedService>().StartAsync(CancellationToken);

    provider.GetRequiredService<IInstantJobRegistry>().ForceRunInstantJob<PrincipalJob>(true);

    using var timeoutToken = new CancellationTokenSource(2000);
    using var linkedToken = CancellationTokenSource.CreateLinkedTokenSource(timeoutToken.Token, CancellationToken);
    var result = await CommunicationChannel.Reader.ReadAsync(linkedToken.Token) as string;
    result.ShouldBe("Me:  Parent: Success");
    FakeTimer.Advance(TimeSpan.FromMinutes(1));
    result = await CommunicationChannel.Reader.ReadAsync(linkedToken.Token) as string;
    result.ShouldBe("Dependent job did run");
}

But I could move some stuff out of the JobRegistry thanks to your work

nulltoken commented 3 weeks ago

Basically: A job that does get triggered directly. Still, at least that is my assumption, dependent jobs, should be executed. Of course we could also argue, that isn't part of the deal - but that would be a breaking change and I would like to keep the old behavior.

⚠️ Thinking out loud

I'm starting to wonder whether Job might be a too generic concept with regards to this library.

n.AddJob<AnalysisJob>(b => b.WithCronExpression(constantly))
      .ExecuteWhen(success: s => s.RunJob<ReportToProductOwnerJob>());

n.AddJob<AnalysisJob>(b => b.WithCronExpression(monthly))
      .ExecuteWhen(success: s => s.RunJob<ReportToStakeholdersJob>());

In the example above, we've got one Job and two schedules. That generates two JobDefinitions (an internal concept that the user knows nothing about). Each of them linking to other Jobs...

2024-10-29 20_03_58-Woody And Buzz Lightyear Everywhere Meme Template — Kapwing — Mozilla Firefox

When triggering provider.GetRequiredService<IInstantJobRegistry>().ForceRunInstantJob<AnalysisJob>();, which JobDefinition should we actually activate?

Should we blindly trigger one of them, all of them? Should we prevent the user from triggering this when there's an ambiguity? Would we need a way (naming?) to let the user disambiguate those two convoys?

Untitled

:wink:

linkdotnet commented 3 weeks ago

Yes - I am running down the same rabbit hole! There are plenty of similar questions in my head (mainly all of them very exceptional / edge cases):

ServiceCollection.AddNCronJob(n =>
{
    n.AddJob<PrincipalJob>().ExecuteWhen(success: s => s.RunJob<DependentJob>("PARAM"));
    n.AddJob<DependentJob>(p => p.WithCronExpression("*/2 * * * *")).ExecuteWhen(success: s => s.RunJob<DependentDependentJob>());
    n.AddJob<DependentJob>(p => p.WithCronExpression("* * * * *")).ExecuteWhen(success: s => s.RunJob<AnotherJob>());
});

If I run PrincipalJob - what jobs are triggered here? Only DependentJob? And after that DependentDependentJob? Or not?

If I just add a n.AddJob<DependentJob>().ExecuteWhen(s => s.RunJob<TotallyDifferent>)()... then for sure this one, not? :D

The approach via typeof was simple and nice :D

linkdotnet commented 3 weeks ago

The simplest way to cover almost all, is to allow a standalone ExecuteWhen that takes a job name! Job names have to be unique - but they wouldn't cover instant jobs for example.

nulltoken commented 3 weeks ago

Considering most of those are edge cases and as of now everything lives in the JobRegistry, there could be a path where the lib would analyze the trees of potential executions and issue some warnings when ambiguities are identified.

With regards to instant jobs, as their definition live outside of AddNCronJob, there's little the analyzer could do about them.

However, a breaking change in the API may potentially restrict what a InstantJob execution could accept and (maybe) help resolve this.

This whole thing somehow reminds me of the constraints of a DI container...

When one register two services through their interfaces ("A job with different cron expressions or twice the same job with different ExecuteWhen"), how does a container react when it's supposed to instantiate a type that only accepts an interface as a constructor parameter? I believe this somehow to answer that question that dotnet came around with the keyed services (hence the "named convoy" thingie above).

(Please keep in mind that my knowledge of the lib and all of its use cases is very superficial. So please consider everything I write with a grain of salt)

linkdotnet commented 3 weeks ago

Good points overall - I am still struggling what is the most intuitive way for people. What are they really expecting? How would chaining work? ...

But maybe keyed services might help here. Thanks for raising that issue. It kept me thinking alot

nulltoken commented 3 weeks ago

FWIW, it's not a "production" issue on my side. Just one thing I've stumbled upon while working on #106.

Once JobRegistry was assembled, this typed based approach downside somehow raised its head and drove me in "whatifs" scenarios.

Knowing a bit more of the inner workings, would I need to achieve this kind of model, I'd know how to temporarily walk around this current dark corner.

So, no pressure :wink:

linkdotnet commented 3 weeks ago

I love that you make your thoughts for yourself and basically gift us your precious time! Really appreciated.

I really have to think about the issue and/or if I just document this current behavior at the very least.

nulltoken commented 2 weeks ago

Reopened as only partially addressed through #124

nulltoken commented 2 weeks ago

A follow up on https://github.com/NCronJob-Dev/NCronJob/issues/128#issuecomment-2453571205 which belongs more to this issue:

The context of dependent jobs should not be considered in the scope of AddNCronJob() only, but also with the RuntimeRegistry in mind. Some additional corner cases may pop up in that light.