dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License
10.04k stars 2.02k forks source link

Reminders V2: Improve Reminder Implementation Semantics #7573

Open ElanHasson opened 2 years ago

ElanHasson commented 2 years ago

This is a DRAFT

Summary

This proposal is a high-level design for Reminders V2 that will allow the utmost flexibility in Reminders.

Reminder Middleware

The concept of Reminder Middleware allows users to insert functionality into the Reminder processing pipeline. The behavior would mimic ASP.NET Core's Middleware pipeline.

Reminders would have a persistent, shared data context, that is local to the Reminder instance. This can be used to store arbitrary data related to the reminder's execution. This would allow middleware to maintain internal state and share data with each other, much like middleware can mutate headers and the body of the response.

This ReminderState would behave like a normal GrainState and would remove the current need to have a different schema for persisting reminders, and allow it to be extensible. This context could be implemented as a Dictionary<string, object?> on the base of the ReminderState object.

Standard Middleware

Some standard middleware that would come with the new Microsoft.Orleans.Reminders package (#7410):

Each reminder would be able to specify it's own, ordered set of reminder middleware:

Example 1: starting in 3 days, run every minute, max of 100 ticks, with additional options

this.CreateReminder(name: "MyAwesomeReminder")
  .UseWait(o=> o.NextTickAt = DateTime.UtcNow.AddDays(3)) // Wait 3 days until our first Tick
  .WithInterval(o=> o.NextTickAt = context.CurrentTick.Add(TimeSpan.FromMinutes(1)) // Run every minute
  .WithLimitTicks(o=> o.MaxTicks == 100)
  .UseSkipLateReminder(TimeSpan.FromMinutes(2)) // Do not deliver the reminder if later than 2 minutes (give a 2 minute grace-period), sets `context.ShouldSkip` = true if late
  .UseMissedReminderHandler(async (dueTime) => await MissedThis(dueTime))
 .UseReminderHistoryRecorder<MyCustomReminderHistoryGrain>(options {
    options.GrainId = $"{this.GetPrimaryKey()}-MyAwesomesReminder"; // This could be derived as well but there would be cases where we would want to manually control this
  });

Example 2: Cron Expression based reminder

await this.RegisterOrUpdateReminder("MyAwesomeReminder", builder => builder
  .UseWait(o=> o.NextTickAt= DateTime.UtcNow.AddMinutes(5))
  .UseCronExpression("0/5 2,3,8,9,17-23 ? 2-8,7,10 MON-FRI") 
  .UseSkipLateReminder(TimeSpan.FromMinutes(2))
  .UseMissedReminderHandler(async (dueTime) => await MissedThis(dueTime))
  .UseReminderHistoryRecorder<MyCustomReminderHistoryGrain>(options {
    options.GrainId = $"{this.GetPrimaryKey()}-MyAwesomesReminder"; // This could be derived as well but there would be cases where we would want to manually control this
  })
);

Bypassing Current Limits on Reminders

We are currently limited to ReminderRegistry.MaxSupportedTimeout, which is 0xfffffffe or approximately 49 days.

The way we bypass this limit is to intercept the reminder prior to it being delivered to the CallingGrainReference associated with the reminder and determine if it is time for the reminder to be triggered. If we've met dueTime, we deliver it, if it is still early, we update the reminder with the new dueTime up to a max of 0xfffffffe.

This logic would be registered as default Reminder Middleware and added to the pipeline by Reminders V2. If the reminder isn't ready to be fired, it would stop the pipeline before executing any other configured middleware.

For a user-land implementation, see https://github.com/web-scheduler/web-scheduler/blob/main/Source/WebScheduler.Grains/Scheduler/ScheduledTaskGrain.cs#L104-L196.

Persistence

Reminders V2 state data will be treated like any other Grain State and will not require any changes to grain storage providers.

Migrating Existing Timers

Two options exist for a path forward:

  1. Leave support for Reminders V1 so this doesn't become a breaking change and introduce Reminders V2 as an opt-in feature.
  2. Support Reminders V2 only, and migrate V1 to V2. This can be problematic and complex, see: Migrating v3.x to v4.x Grains.

Personally, I prefer Option 1 as this makes it removes a lot of engineering work and complexity. This puts the onus of migrating on users within their existing grain code. At most, this would take ~49 days for all reminders to be migrated without any downtime. Additionally, we'd publish a deprecation schedule for Reminders V1 or leave them until we're ready to remove them.

Reminders V2 will most likely have a separate processing pipeline all together due to the work in #7238 and #947, so it would make sense for it to be a fully separate, opt-in feature. This would also allow us to release Reminders V2 in preview mode alongside regular releases to get feedback and iterate.

suraciii commented 2 years ago

The way we bypass this limit is to intercept the reminder prior to it being delivered to the CallingGrainReference associated with the reminder and determine if it is time for the reminder to be triggered. If we've met dueTime, we deliver it, if it is still early, we update the reminder with the new dueTime up to a max of 0xfffffffe.

I think we won't need to do this if https://github.com/dotnet/orleans/issues/947 resolved, only recent (2h e.g.) reminders will be queried and scheduled with .NET timers in memory

ElanHasson commented 2 years ago

@suraciii

I think we won't need to do this if https://github.com/dotnet/orleans/issues/947 resolved

I think you may be correct here. I've been thinking about having the ability to support different ReminderService providers (read: not Reminder Storage providers as we have today) which will all have different pros and cons: This would allow users to select the algorithm and way reminders are managed within the silo, and even allow for a mix-and-match approach for different types.

I don't have anything more to post on this as I'm still iterating on the concept internally at the moment.

Perhaps this story is split into a reminders v1.5 and a v2, where v1.5 removes these limitations. It's not clear to me yet what the approach should be here.

7410 will give us lots of flexibility here.

koenbeuk commented 2 years ago

I like this proposal, though I think the API needs some work. As reminder registration happens asynchronously, the reminder registration builder needs to be awaited. In addition, we need to consider how we would Update reminders. Therefore, I would propose something in the shape of the following API:

await this.RegisterOrUpdateReminder("MyAwesomeReminder", builder => builder
  .UseWait(o=> o.NextTickAt= DateTime.UtcNow.AddMinutes(5))
  .UseCronExpression("0/5 2,3,8,9,17-23 ? 2-8,7,10 MON-FRI") 
  .UseSkipLateReminder(TimeSpan.FromMinutes(2))
  .UseMissedReminderHandler(async (dueTime) => await MissedThis(dueTime))
  .UseReminderHistoryRecorder<MyCustomReminderHistoryGrain>(options {
    options.GrainId = $"{this.GetPrimaryKey()}-MyAwesomesReminder"; // This could be derived as well but there would be cases where we would want to manually control this
  })
);

In this example, I simplified the registration of the middleware by consuming extension methods in a hypothetical IReminderBuilder. In addition, MissedReminderHandler now refers to a Lambda instead of a separate handler class and ReminderHistoryRecorder is referring to a different grain with a custom GrainId.

ElanHasson commented 2 years ago

@koenbeuk this is great feedback! I love it! Will update the proposal to incorporate it.

Thank you!

jsteinich commented 2 years ago

One pattern that we use in the current API is a durable retry (on both one shot and periodic). Adding retry logic might be expanding scope a bit much, but it's definitely a bit clunky in the current API.

Could start to venture down the path of Polly which has very extensive logic available, but something along the lines of the ClusterClient connection retry would probably suffice.

ElanHasson commented 2 years ago

@jsteinich, what would the API look like to you? I've used Polly only at the basic level, but am somewhat familiar with it.

jsteinich commented 2 years ago

A simple option might look like:

builder.UseFailedTickHandler(Func<Exception, ValueTask<bool>> retryFilter)

This would allow the user to capture errors, decide whether or not to retry, and delay the retry.

Making something more Polly-esque could probably just be built on that basic concept by providing built-in failure handlers and some extension methods to make them easier to access, i.e.:

builder.RetryFailedTicks(3) // retry failures 3 times using a default interval between retries

Edit: While nice, that API doesn't lend itself to durability very well. The reminder system would really need to know when to fire again and manage that rather than just delaying a task. The function would probably need to return when (or if) to try again. Would also need to have context available in that case. Perhaps:

builder.UseFailedTickHandler(Func<Exception, Context, TimeSpan?>)
ReubenBond commented 2 years ago

How would the builder interact with the reminder system? Are they setting properties on the state? In that case, there must be some middleware registered to handle those properties. Is that right?

I wonder how many kinds of options people would like to specify. If it's a small, closed set, then perhaps support could be hard-coded.

If reminder state is stored just like regular grain state (presumably using a regular grain state provider), how does the system know how to enumerate reminders in order to fire them as they become due?

ElanHasson commented 2 years ago

How would the builder interact with the reminder system? Are they setting properties on the state?

Yes, via the shared state.

In that case, there must be some middleware registered to handle those properties. Is that right?

Yes correct, or they'd be ignored.

I wonder how many kinds of options people would like to specify. If it's a small, closed set, then perhaps support could be hard-coded.

I think there should be a default set that potentially matches reminders v1 behaviors. Something analogous to .ConfigureWebHostDefault, but for reminders.

If reminder state is stored just like regular grain state (presumably using a regular grain state provider), how does the system know how to enumerate reminders in order to fire them as they become due?

I think we can reuse the existing storage system for reminders. I'm still analyzing and will get back to this.

ReubenBond commented 2 years ago

These are some ideas for "Reminders V2", much of which might be orthogonal to this. Most of these are internal architecture/design goals, but some are API/programming model related:

Defining reminder callbacks using method calls

NOTE: Most of the following is out-of-scope for reminders. I'll open a new issue for us to discuss. It's useful to consider how Reminders might be used for something like this, however.

Here's an example demonstrating some potential APIs.

class ShoppingCartGrain : IShoppingCartGrain, IShoppingCartWorkflows
{
  private IPersistentState<List<ItemDescription>> _state;
  public async Task AddItemToCart(ItemDescription item)
  {
    await ScheduleAsync<IShoppingCartWorkflows>( // The workflow interface is specified so that a callback can be made.
      self => self.ReturnUnclaimedItem(item),
      // Optional due time: this can be omitted 
      TimeSpan.FromMinutes(15),
      // I prefer we use named policies, but storing ad-hoc policy descriptions on the reminder instance is ok, too
      SchedulingOptions.Create(retryPolicy: x, backoffPolicy: y));

    _state.Add(ItemDescription item)
    await _state.WriteStateAsync();
  }
}

Here's an alternative: have a method which returns a WorkflowFactory for the current grain (or another grain) which can be used to schedule one or more workflows. The advantage of this approach is that it's hopefully clear that it's not a lambda that is going to be scheduled. In the lambda case above, the lambda would be invoked immediately to capture a single interface call, which might be hard to explain.

// Note that the grain implements two interfaces.
// The IShoppingCartWorkflows interface is essentially internal to the grain.
// It is used for scheduling reminder callbacks (which I'm calling workflows here)
class ShoppingCartGrain : IShoppingCartGrain, IShoppingCartWorkflows
{
  private IPersistentState<ShoppingCartState> _state;
  public async Task AddItemToCart(ItemDescription item)
  {
    // Create a workflow reference.
    // Calls to a workflow reference
    var workflowReference = GetWorkflowFactory<IShoppingCartWorkflows>(options);

    // Schedule an operation.
    // The call will return as soon as the call is scheduled, not once it's completed.
    await workflowReference
      .Defer(TimeSpan.FromMinutes(15)) // This builder approach seems consistent with the one Elan described in a previous comment
      .Invoke // This returns an IShoppingCartGrain on which c
      .ReturnUnclaimedItem(item.Id);
  }

  public async Task CompletePurchase(OrderDetails orderDetails)
  {
    // Durably schedule a call immediately to complete the purchase
    await GetWorkflowFactory<IShoppingCartWorkflows>(options)
      .Invoke
      .CompletePurchaseWorkflow(orderDetails);
  }

  public async Task CompletePurchaseWorkflow(OrderDetails orderDetails)
  {
    if (_state.Status is CartStatus.Abandoned)
    {
      // Maybe indicate an error here.
      return;
    }

    // Helper method showing how a rudimentary deduplication technique:
    // If the "orderId" value is present in the WorkflowContext, return it, otherwise create a new guid and store it in the context.
    // This can be used for any non-deterministic process (Random number generation like here, or I/O such as HTTP calls & grain calls)
    var orderId = await WorkflowContext.GetOrAddAsync("orderId", () => Guid.NewGuid());

    // A more complex example: if "order-created" isn't in the workflow context, issue the call to get the value and store it in the (durable) workflow context.
    if (!WorkflowContext.TryGetValue("order-created", out _))
    {
      await GrainFactory.GetGrain<IOrderGrain>(orderId).CreateOrder(orderDetails);
      await WorkflowContext.SetValueAsync("order-created", true);
    }

    // NOTE: The above if-block can also be expressed using the GetOrAddAsync method, like so:
    await WorkflowContext.GetOrAddAsync(
      "order-created",
      () => GrainFactory.GetGrain<IOrderGrain>(orderId).CreateOrder(orderDetails))
    // END

    // To avoid 'ReturnUnclaimedItem' from returning an item back to the inventory pool, mark this shopping cart as checked out.
    _state.Value.Status = CartStatus.CheckOutCompleted;
    await _state.WriteStateAsync();

    // No need to reschedule, etc, just complete the call.
  }

  public async Task ReturnUnclaimedItem(ItemId itemId)
  {
    var state = _state.Value;
    if (state.Status is CartStatus.CheckoutCompleted)
    {
      // Do not return the item to the pool if the cart has been checked out already.
      return;
    }

    // Prevent checkout from occurring, since the items are being returned to the inventory pool.
    state.Status = CartStatus.Abandoned;
    await _state.WriteStateAsync();

    var inventoryPoolGrain = GrainFactory.GetGrain<IInventoryPoolGrain>(itemId);
    await inventoryPoolGrain.ReturnItems(state.Items); // This would likely be more complex.
  }
}

Much of the above is just musings about what we could do, not necessarily a strong opinion on what we should do. We can pick some elements and include them and add some other elements (eg, the more fully-fledged workflows) later on.

I am very much open to suggestions and ideas

ReubenBond commented 2 years ago

I wrote a proposal to implement an extensible log-based storage provider for grain state: https://github.com/dotnet/orleans/issues/7691.

Reminders V2 was one of the use cases I had in mind. The particulars of each reminder would be stored on the grain itself and the reminder service would only know when the earliest reminder is due for a grain. For further extensibility, we can also include a property bag (eg, imagine a priority property becomes useful).

The reminders table contains one entry per grain, indicating the minimum of the due time of all reminders on the grain. The details of a grain's reminders will be stored in the grain's own state. A reminder will fire as soon as it becomes due by calling a new grain extension, ReminderGrainExtension, which will find and fire due reminders. The ReminderGrainExtension is responsible for removing unnecessary reminders from the reminder table by calling the reminder service. The extension is also responsible for calling IRemindable.ReceiveReminder on the grain, in order to support the existing reminder interface. This mechanism also allows us to enhance the functionality of reminders in the future, eg to support various retry mechanisms in a scalable manner. The reminder grain extension can be used to support external callers managing reminders for a grain without having to add additional grain methods.

image

Most of the logic for handling reminders would be moved into a grain extension. The grain extension will call into the reminder service to update the ~"next reminder due time" before persisting the reminder details locally.

This can help in many ways. For one, it allows reminders to be scheduled atomically alongside state updates. It also allows reminders to atomically update state and remove themselves. Essentially, it can be used to implement exactly-once reminders. It also alleviates the burden from the reminder service when it comes to things like retry tracking, tracking multiple reminders per grain, etc. However, it increases the burden on the reminder service for repeated reminders, since the semantics are now that it will fire any due reminders, not only reminders which become due while the reminder service is active (i.e, missed reminders are picked up and can be handled), and there are no periodic reminders as far as the reminder service is concerned: it is up to the grain extension to bump the due time in the reminder service to the next earliest due time for reminders.

This means that in some cases, the reminder service previously would handle very few write operations (eg, imagine a fixed number of high-frequency recurring reminders), but would now have to potentially perform one write for each reminder which becomes due. I think it's a worthwhile trade-off, but I could be convinced otherwise, if there are better alternatives.