jviau opened this issue 6 months ago (status: Open)
The one caveat to this policy is that we've seen cases where changes to Newtonsoft.Json settings cause unintended deserialization failures. This can happen during the rollout of a new version of an app, whether because of changes made by the user (though hopefully we've rooted out all of those possibilities) or changes made by the DTFx maintainers. Either way, it might be prudent to give users time to roll back the change (e.g., 24 hours) before permanently deleting their data.
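To make the grace-period idea concrete, here is a minimal sketch in Python, used only for illustration since DTFx itself is C#. `PoisonMessageTracker`, `should_delete`, and `forget` are hypothetical names, not DTFx APIs. The sketch assumes the worker remembers when it first saw a given message fail deserialization and allows deletion only once that failure has persisted for the whole grace period. State is kept in memory, so it would not survive a restart or be shared across workers. A real design would need durable, shared state.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict


class PoisonMessageTracker:
    """Hypothetical helper: decides when a message that keeps failing
    deserialization may be permanently deleted."""

    def __init__(
        self,
        grace_period: timedelta = timedelta(hours=24),
        clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ) -> None:
        self.grace_period = grace_period
        self.clock = clock  # injectable so the logic can be tested deterministically
        self._first_failure: Dict[str, datetime] = {}  # in-memory only (assumption)

    def should_delete(self, message_id: str) -> bool:
        """Record a failure and return True once failures span the grace period."""
        now = self.clock()
        first = self._first_failure.setdefault(message_id, now)
        return now - first >= self.grace_period

    def forget(self, message_id: str) -> None:
        """Call when the message later deserializes fine (e.g. after a rollback)."""
        self._first_failure.pop(message_id, None)
```

The deadline is measured from the first observed failure rather than from the message's insertion time, so an old message that only starts failing after a bad deployment still gets the full window for a rollback.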
Yeah, this will need some design. Could it be an opt-in setting? Or a callback, where the user receives the exception and returns true/false to decide whether to purge the message?
Either way, the framework needs to take action here, because users cannot mitigate this themselves (they would be fighting the workers to dequeue and delete the message!).
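A rough sketch of the callback option, again in Python for illustration only. `handle_message`, `RawMessage`, and the `on_failure` callback are hypothetical. Nothing like them is confirmed to exist in DTFx. With no callback registered, the message is left alone (today's behavior), which keeps the feature opt-in. With a callback registered, the callback receives the exception and returns True to purge the message.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class RawMessage:
    """Hypothetical raw queue message: an id plus its serialized body."""
    id: str
    body: str


# Hypothetical callback shape: return True to purge the message, False to keep it.
PurgeCallback = Callable[[RawMessage, Exception], bool]


def handle_message(
    msg: RawMessage,
    deserialize: Callable[[str], Any],
    delete: Callable[[RawMessage], None],
    on_failure: Optional[PurgeCallback] = None,
) -> Optional[Any]:
    """Deserialize msg; on failure, let an opt-in callback decide whether to purge it."""
    try:
        return deserialize(msg.body)
    except Exception as exc:  # only the deserialize call is inside the try
        # No callback registered: keep current behavior and leave the message alone.
        if on_failure is not None and on_failure(msg, exc):
            delete(msg)
        return None
```

A user could, for example, register a callback that purges only after logging the exception, or only for specific exception types.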
We should evaluate updating these two code locations to delete corrupted messages (those that fail deserialization) from the queue. Deserialization failures are not expected to be transient, and no amount of retrying or waiting will fix these messages. This is especially true because only framework types (not user types) are being deserialized here.
Location 1: https://github.com/Azure/durabletask/blob/b4ec695dc5c51319b99c20557f9d47a1dd518729/src/DurableTask.AzureStorage/Messaging/ControlQueue.cs#L108-L110
Location 2: https://github.com/Azure/durabletask/blob/b4ec695dc5c51319b99c20557f9d47a1dd518729/src/DurableTask.AzureStorage/Messaging/WorkItemQueue.cs#L47-L49
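The proposed change at those two locations could look roughly like the following. This is a hypothetical Python sketch, not the actual `ControlQueue`/`WorkItemQueue` code. `InMemoryQueue` stands in for the Azure Storage queue, and `json.loads` stands in for the framework's Newtonsoft.Json-based deserialization. The key point is that a message that fails to deserialize is deleted instead of being left to reappear, because retrying cannot fix it.

```python
import json
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass
class QueueMessage:
    id: str
    body: str


class InMemoryQueue:
    """Hypothetical stand-in for an Azure Storage queue."""

    def __init__(self, bodies: Iterable[str]) -> None:
        self._items = deque(QueueMessage(str(i), b) for i, b in enumerate(bodies))
        self.deleted: List[str] = []  # ids of deleted messages, in order

    def dequeue(self) -> Optional[QueueMessage]:
        return self._items.popleft() if self._items else None

    def delete(self, msg: QueueMessage) -> None:
        self.deleted.append(msg.id)


def drain(
    queue: InMemoryQueue,
    process: Callable[[Any], None],
    deserialize: Callable[[str], Any] = json.loads,
) -> None:
    while (msg := queue.dequeue()) is not None:
        try:
            payload = deserialize(msg.body)
        except ValueError:
            # Deserialization failures are not transient: retrying cannot fix a
            # corrupted message, so delete it instead of letting it reappear.
            queue.delete(msg)
            continue
        process(payload)
        # Normal path: delete only after processing succeeds.
        queue.delete(msg)
```

In practice, this would likely be combined with the grace-period or callback ideas from the discussion above before anything is deleted permanently.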