Open paulbatum opened 6 years ago
Any progress with this issue?
It's currently not possible to gracefully shut down running function instances when something exceptional happens, such as the timeout being exceeded or the host being stopped or restarted. Even handling the cancellation token doesn't help, as mentioned above.
Waiting only 5 seconds can be too aggressive. Is there a reason not to wait 5 or 10 minutes (the max function duration) after the cancellation token is signaled, before hard-stopping the instance? That way all active functions could finish their work gracefully even if their cancellation token handling is not perfect.
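To make the suggestion above concrete, here is a minimal sketch of a "signal, wait for a grace period, then hard-stop" policy. It is written with Python's asyncio as a language-neutral model and is not the Functions host's actual implementation. The names `run_with_grace`, `well_behaved`, and `ignores_cancel` are hypothetical, and `task.cancel()` only stands in for the process restart a real host would perform.

```python
import asyncio


async def run_with_grace(func, timeout: float, grace: float) -> str:
    """Run func(cancel); signal cancellation at `timeout`, hard-stop `grace` seconds later."""
    cancel = asyncio.Event()  # stand-in for a cancellation token
    task = asyncio.create_task(func(cancel))

    # Phase 1: normal execution window.
    done, _ = await asyncio.wait({task}, timeout=timeout)
    if done:
        task.result()  # re-raise any exception from the function
        return "completed"

    # Phase 2: cooperative signal, then give the function a grace period.
    cancel.set()
    done, _ = await asyncio.wait({task}, timeout=grace)
    if done:
        task.result()
        return "stopped gracefully"

    # Phase 3: the function did not honor the signal; force it to stop.
    # (Assumption: a real host would restart the process here instead.)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return "hard stop"


async def well_behaved(cancel: asyncio.Event) -> None:
    # Checks the signal between small units of work.
    while not cancel.is_set():
        await asyncio.sleep(0.01)


async def ignores_cancel(cancel: asyncio.Event) -> None:
    # Never looks at the signal.
    await asyncio.sleep(10)
```

With a policy like this, only functions that ignore the signal for the whole grace period would trigger the disruptive hard stop; how long the grace period should be is exactly the open question raised above.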
Cross-referencing https://github.com/Azure/Azure-Functions/issues/866 since it may be related as well
@fabiocav
When a timeout occurs in a Function App, it appears from my testing (C#, V2) that any logging to Application Insights also goes away, i.e. it is not possible to trace what happened in the function before the timeout. The timeout exception itself also doesn't seem to be logged to Application Insights and thus cannot be monitored.
This is, tbh, hella dumb and renders the CancellationTokens rather useless. If the CancellationToken is signalled, it's already too late to save your process.
Any chance? @jeffhollan @eduardolaureano
Any chance you could be nice to the .NET guys by doing this before #2152 :)
I have a case closely related to this topic, and I haven't been able to find an appropriate answer for a long time. I have a function with the EventHubTrigger. Messages from the event hub are pulled in batches. According to the documentation, those messages are checkpointed when the function ends. This means that if the process stops for any reason (host stop, restart, ...), the function cannot checkpoint the received messages, and the next time it starts it will pick up the same messages that were already received but not checkpointed. According to everything mentioned here, I have no option to gracefully stop the function, that is, to somehow tell it to stop after the current execution ends and the checkpoint occurs, before it starts again and pulls a new set of messages, so I must handle possible duplicates in my code. Is this true, or is there a solution for a controlled shutdown?
When processing any Event Hubs workload, you need to write your code to allow for duplicates, because Event Hubs delivers events at least once and does not provide "at most once" guarantees. Even if you were able to handle the shutdown case correctly, there are other cases where your code might need to handle duplicates, for example, if a partition lease is lost.
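As a rough illustration of "allowing for duplicates", the sketch below skips events that were already handled by tracking the highest sequence number seen per partition. It is a simplified model, not Event Hubs SDK code: the `Event` and `IdempotentProcessor` types are hypothetical, and the in-memory dictionary is an assumption made for brevity. Since the whole problem is that the process can be restarted, real code would need to keep this state in a durable store or make the downstream write itself idempotent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    partition: str
    sequence_number: int  # assumed to increase within a partition
    body: str


@dataclass
class IdempotentProcessor:
    # partition -> highest sequence number already processed
    # (assumption: in-memory only; use durable storage in practice)
    last_seen: dict = field(default_factory=dict)
    processed: list = field(default_factory=list)

    def handle_batch(self, batch) -> None:
        for ev in batch:
            if ev.sequence_number <= self.last_seen.get(ev.partition, -1):
                continue  # already handled before the restart or lease loss
            self.processed.append(ev.body)  # the actual side effect
            self.last_seen[ev.partition] = ev.sequence_number


# A batch is redelivered after an ungraceful stop, overlapping the previous one.
proc = IdempotentProcessor()
proc.handle_batch([Event("0", 0, "a"), Event("0", 1, "b"), Event("0", 2, "c")])
proc.handle_batch([Event("0", 1, "b"), Event("0", 2, "c"), Event("0", 3, "d")])
```

After the second batch, only the new event is applied, so the overlap caused by the missed checkpoint has no visible effect.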
Hi, not sure if this is the right place to ask, but is there a function-app-level host shutdown event I can intercept so I can clean up static resources used by the whole app? For example, I need to call Serilog.Log.CloseAndFlush(). I can't find anything in the docs, only the Startup event where I register the logger. Thanks.
@gkindov I don't think so. I suggest asking the folks in the Azure Functions Discord. https://discord.gg/YEQPcCsY
I have an issue related to @kreaton's. We use an external performance and error monitoring tool that needs to close gracefully in order to log to a remote server. When the function is killed with prejudice, it never gets a chance to log the performance information that it has recorded so we don't have any traces to use to track down what is causing the runs to take so long.
Any chance that this issue was resolved by anyone? Is it possible to catch and handle the execution timeout gracefully instead of having the process killed or restarted? In my case an unlimited timeout or retries isn't an effective option, so any answer would be appreciated.
There should definitely be an event which fires before the timeout so processes can be shutdown gracefully.
Does anybody know something about that mysterious event that fires before the timeout? We really need this in our functions!
Has progress been made on this issue? How can timeouts be handled appropriately? Is there a C# event or delegate to use when a timeout occurs? Thank you very much for your help.
So the consumption plan for functions has a default execution timeout of 5 minutes. It's not great to allow your functions to hit this timeout, because when it happens, your entire process ends up getting restarted (because it's the only way to force the execution to stop; `task.Abort()` does not exist). This will be disruptive to other long-running functions in the same process.
The challenge is that, as a function author who is aware of this, there's not much you can do to address it. In the case of C#, you could update your function to take a `CancellationToken` and check the state of that token (or pass it into async APIs your function is calling). However, even if you do this, today the system will still terminate the process (because it does not check to see if your function actually honored the cancellation request).
So, this work item tracks the idea of making the timeout mechanism smarter. It would do the following:
In order for this approach to work for multiple languages, we need a way to support the equivalent of cancellation tokens for out-of-proc languages, which is tracked by https://github.com/Azure/azure-webjobs-sdk-script/issues/2152.
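For readers unfamiliar with the pattern, the function-author side of cooperative cancellation described above can be sketched as follows. This is a language-neutral model in Python, with `threading.Event` standing in for a `CancellationToken`. The `process_items` helper is hypothetical and not part of any Functions SDK.

```python
import threading
from typing import Callable, Iterable


def process_items(
    items: Iterable,
    cancel: threading.Event,
    handle: Callable[[object], None],
) -> int:
    """Handle items one at a time, stopping at the next item boundary
    once `cancel` is set. Returns how many items were fully handled."""
    handled = 0
    for item in items:
        if cancel.is_set():
            break  # honor the cancellation request between units of work
        handle(item)
        handled += 1
    return handled
```

The function only helps if its units of work are short relative to the time left after the signal, and, as the issue notes, it only pays off once the host actually checks whether the function returned in response to the request.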