Open GuusRaaphorst opened 1 year ago
Thanks @GuusRaaphorst. There have been some known memory issues around the message sorter behavior of Entities, so this may be one of them. See here for some context.
My suggestion would be to reduce the message sorter re-ordering window to limit how big the message sorting array can get, but for some reason I'm not seeing that as a configurable value in our host.json settings. I need to look into that; let me get back to you.
Thanks for your quick response @davidmrdavid !
I assume you are referring to EntityMessageReorderWindowInMinutes from here?
I'll give it a try!
I also see that Netherite should solve this problem, so we might need to look into that too.
I have been playing with settings, cleaning all history of durable functions, etc. Nothing.
In the end it turned out to be one stupid little change causing havoc in all our function apps that use Durable Functions. The services that we use all share some common code. One part of this is that on startup, some general services (e.g. logging) are registered.
Some lines of code were added there to improve our automatic OpenAPI documentation generation. This code was doing this: JsonConvert.DefaultSettings = () => jsonSettings; and adding some specific JsonConverters, related to dates, enums, etc.
Removing this solved the problems. I have no idea why this caused the problems that we saw and I also have no idea how I could have seen this in logging or anything.
But it all works again. Just wanted to let you know.
Thanks for the report, @GuusRaaphorst.
Regarding this line of code: JsonConvert.DefaultSettings = () => jsonSettings;
Oh, I've seen that error before. In older versions of the Durable Extension, it was possible for user code to accidentally override the settings that the DF extension itself uses for serialization by doing exactly that, which in turn breaks all sorts of low-level details. I thought I had fixed that.
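For anyone landing here with the same symptom, one way to avoid touching the process-wide default is to pass the settings explicitly wherever they are needed. The following is only a minimal sketch under assumptions: the OpenApiJson class, its Settings field, and the chosen date format and enum converter are hypothetical stand-ins for the "dates, enums, etc." converters mentioned above, not the reporter's actual code.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Converters;

// Hypothetical helper: keeps the OpenAPI-related serializer settings local
// instead of assigning them to JsonConvert.DefaultSettings, which is global
// to the process and can be picked up by other libraries.
public static class OpenApiJson
{
    // Assumed settings; the real converters from the report are not known.
    public static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        DateFormatString = "yyyy-MM-dd",
        Converters = new List<JsonConverter> { new StringEnumConverter() },
    };

    // Serialize with the local settings only; JsonConvert.DefaultSettings is left untouched.
    public static string Serialize(object value) =>
        JsonConvert.SerializeObject(value, Settings);
}
```

Calls that go through OpenApiJson.Serialize get the custom converters, while any other code in the process that calls JsonConvert without explicit settings keeps the library defaults.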
@GuusRaaphorst: do you have a minimal repro showing this error with the latest DF release? That would help us greatly.
Description
We have an Azure function app that at some point started to log out-of-memory exceptions when trying to start an orchestration.
The function app basically does the following:
var entityId = new EntityId(nameof(PosEntity), posName);
Note that we use a durable entity because of some peculiarities of the external system. It only allows 1 request at a time. With the entity, we make sure that it handles only one call at a time.
Expected behavior
I would not expect out-of-memory exceptions from the Durable Task Framework.
Actual behavior
At some point, but not at the same time, some of our environments (test and acc) started to show performance degradation and high memory usage causing alerts. The logging that we have in place showed the following log lines (the last 2 a lot more than the first 2):
Relevant source code snippets
I do not have code that I am allowed to share at the moment.
Known workarounds
I have tried a lot of things that did not seem to help.
Adding the following settings (before, we used the defaults) and playing with the values a little does seem to help. Currently our test environment uses the following values and seems to have been running OK for a couple of hours now.
App Details
Screenshots
At some point I was able to create a profiler trace and a memory dump of the function app (see images). The results of those point to the MessageSorter and the DurableTask.RequestMessage. That is why I am opening this ticket: to inform, but also hoping to get some guidance on what is going on here and whether I am doing something wrong.
If deployed to Azure
Orchestration execution was aborted: Session aborted because of OutOfMemoryException, traceFlags=1Y!D