Closed jsnet-ops closed 2 years ago
Possible duplicate of https://github.com/Azure/azure-functions-durable-extension/issues/2129.
@sojwil the slowing down is expected because the purge operation can take a large toll on the I/O limits of the Azure Storage account. You won't see any impact on CPU or memory, just Azure Storage I/O, which affects the processing speed of other parts of your app.
We expect the purging to be done per item. So if a timeout occurs, it should have removed something.
I agree with this. Looking at the code for purging, it appears that the logic is implemented in such a way that it tries to query all instances in the provided range before deleting them individually. This is a problem since large scans could consume a lot of resources on the storage account (and potentially local memory) and any timeouts will fail just like you said. This is something we can look into.
As a workaround, check to see if you're able to change your purge logic to run more frequently and with smaller time windows. Purging smaller amounts of data at a time is more likely to be successful and have less impact on your app.
Thank you for this quick response. Looking forward to the added feature.
I've made a fix in the DurableTask.AzureStorage dependency which should significantly improve the behavior. It will be included by default in the next extension release.
Excellent news. Thank you :)
I've made a fix in the DurableTask.AzureStorage dependency which should significantly improve the behavior. It will be included by default in the next extension release.
Hey @cgillum , is there something we should do to benefit from your fix/change? Because I have the exact same issues and our application workflows are becoming terribly slow, and every Purge call is timing out, so it feels like we're in a state that only keeps getting worse.
Description
We execute PurgeInstanceHistoryAsync daily to remove all completed, terminated, cancelled or failed items from a single day (24 hours). The amount of items to be removed is 100.000 per day average. After around 15 minutes of purging the system slows down and other functions show reponse times of more than 30 seconds. Looking at the mem and cpu usage, there should not be an issue. Since this amount of items is pretty high, the purging fails after 30 minutes due to timeout but nothing is removed.
Expected behavior
We expect the purging to be done per item. So if a timeout occurs, it should have removed something. Also, purging should not claim all the functionapp resources.
Actual behavior
Too much resources are claimed for other functions to run properly. No items are removed when a timeout occurs.
Relevant source code snippets
Known workarounds
Use API to purge per item
App Details
If deployed to Azure