Azure / azure-functions-durable-extension

Durable Task Framework extension for Azure Functions
MIT License

[Feature Request] Enable an optional parameter to DurableClient.CleanEntityStorageAsync() to limit how many entries to delete #1992

Closed by michc-msft 2 years ago

michc-msft commented 2 years ago

Problem

We're seeing lots of lingering entities whose state has been deleted but which still exist in a Running state. As a result, a call to ListInstancesAsync filtered on Running/Pending still returns these entities, which blocks our deployment pipeline because we use the Status Check With Slot option. This could even be considered a bug, but we've found that calling CleanEntityStorageAsync fixes it, ideally from an activity function at the end of our orchestration. It's probably good practice for us to clean up our lingering entities anyway.

However, on the first couple of runs, the activity function calling CleanEntityStorageAsync is almost guaranteed to hit the activity function timeout, since we have a huge backlog of entities that need to be cleaned up. Even after that backlog has been cleared, we can't be certain that we will never hit the timeout again.

Desired Solution

I see there's a strongly desired feature request for automated orchestration/entity cleanup (request #892). Unless that feature is close to completion, it would be helpful in the meantime if we could pass a parameter to CleanEntityStorageAsync that controls how many null-state entities get deleted per call, so that we are less likely to hit the timeout.
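To make the requested behavior concrete, here is a minimal sketch of what a capped cleanup could look like. It is not the Durable Functions API: the in-memory `store`, the `clean_entity_storage` function, the `max_to_remove` parameter, and the `CleanResult` type are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CleanResult:
    removed: int          # empty entities removed by this call
    more_remaining: bool  # True if the cap left some empty entities behind


def clean_entity_storage(
    store: Dict[str, Optional[dict]],
    max_to_remove: Optional[int] = None,
) -> CleanResult:
    """Hypothetical capped cleanup: delete entities whose state is None.

    `store` is an in-memory stand-in for entity storage, mapping an
    entity ID to its state (None means the state was deleted).
    """
    if max_to_remove is not None and max_to_remove < 0:
        raise ValueError("max_to_remove must be non-negative")

    # Collect the IDs first so the dict is not mutated while iterating.
    empty_ids = [entity_id for entity_id, state in store.items() if state is None]
    batch = empty_ids if max_to_remove is None else empty_ids[:max_to_remove]

    for entity_id in batch:
        del store[entity_id]

    return CleanResult(removed=len(batch), more_remaining=len(batch) < len(empty_ids))
```

With a cap like this, each activity invocation does a bounded amount of work, and the caller can simply invoke it again while `more_remaining` is true instead of relying on one long call that may exceed the timeout.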

Alternatives we've considered

sebastianburckhardt commented 2 years ago

Adding a max argument is a good temporary workaround. This should be quite easy I think.

Though in some sense this should not be something the application programmer needs to concern themselves with. A better solution would be to (1) never return deleted entities in queries, and (2) have the runtime call CleanEntityStorage periodically and automatically.

michc-msft commented 2 years ago

I agree. Ideally we'd never have to see the entities again after we call DeleteState. Is there a mechanism, or maybe one in the works, for cleaning up stray entities? I mean scenarios where an entity was spun up, but something went wrong and its state was never deleted (so it can't get cleaned up). I imagine we'll have some instances of this, and I'm not sure what the best way to clean them up would be. We use entities to signal a new orchestration to start once a certain condition has been met, so checking whether the entity is associated with a running orchestration might not be feasible in some edge cases.

sebastianburckhardt commented 2 years ago

Often, entities are meant to be long-lived: they represent important application data (e.g. an account balance), and thus we would never remove them until the user explicitly deletes them.

Orchestrations, on the other hand, tend to have a limited lifespan; and once they have completed there is often no reason to keep them around (with some exceptions, e.g. to do deduplication of triggering events).

AnatoliB commented 2 years ago

@sebastianburckhardt Do we need any follow-up on this, or do we close it as by design?

sebastianburckhardt commented 2 years ago

I would say we close this as by design. Rationale: entities are modeled to behave like "virtual actors". You cannot delete virtual actors, you can only delete their state, and state deletions are not final, but can be reversed.
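As a rough illustration of that rationale, the toy model below treats every entity ID as always addressable, with only its state being present or absent. The `VirtualActorStore` class and its methods are hypothetical and only sketch the idea; they are not how the extension is implemented.

```python
from typing import Any, Dict, Optional


class VirtualActorStore:
    """Toy model of virtual actors: an ID never 'stops existing',
    only its state can be absent."""

    def __init__(self) -> None:
        self._states: Dict[str, Any] = {}

    def read(self, entity_id: str) -> Optional[Any]:
        # Any ID can be addressed; an actor without state reads as None.
        return self._states.get(entity_id)

    def write(self, entity_id: str, state: Any) -> None:
        # Writing to an actor with no state implicitly (re)creates the state.
        self._states[entity_id] = state

    def delete_state(self, entity_id: str) -> None:
        # Removes the state only; the ID stays addressable afterwards.
        self._states.pop(entity_id, None)


store = VirtualActorStore()
store.write("counter@1", {"value": 5})
store.delete_state("counter@1")          # state is gone, the actor is not
store.write("counter@1", {"value": 0})   # a later write "reverses" the deletion
```

In this model there is no operation that deletes the actor itself, which is why a stateless entity can still show up until storage is cleaned.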