elsa-workflows / elsa-core

A .NET workflows library
https://v3.elsaworkflows.io/
MIT License

[BUG] Default CRON causes extremely high memory usage until Out of Memory exception #5312

Open rboylesDev opened 6 months ago

rboylesDev commented 6 months ago

Description

We are using Elsa 3 with the default scheduling. Whenever we have a Timer- or CRON-triggered workflow, we notice a jump in memory usage when the workflow is triggered, and this memory is never garbage collected. We are running Elsa on Azure Container Apps with 1 GB of RAM. With a CRON that runs every 15 minutes, the app runs for about two hours until the container crashes with an out-of-memory exception and is automatically restarted. We have also seen this memory leak locally, but with significantly more RAM it only becomes a problem after running for days.

Steps to Reproduce

Create a simple workflow with a Timer or CRON trigger, even one that only writes to the console. Let it run and trigger a few times and observe memory allocations.
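A minimal repro along these lines might look like the sketch below; the activity and property names are best-effort assumptions based on the Elsa 3 docs and may differ slightly between 3.x releases:

```csharp
// Hypothetical repro workflow. Namespaces, the Cron activity, and the Input
// constructor are best-effort assumptions based on Elsa 3 and may differ
// slightly between 3.x releases.
using Elsa.Scheduling.Activities;
using Elsa.Workflows;
using Elsa.Workflows.Activities;

public class CronReproWorkflow : WorkflowBase
{
    protected override void Build(IWorkflowBuilder builder)
    {
        builder.Root = new Sequence
        {
            Activities =
            {
                // Acts as the workflow's trigger once the definition is published.
                new Cron { CronExpression = new("*/15 * * * *") },

                // The only actual work: write a line to the console.
                new WriteLine("Cron fired.")
            }
        };
    }
}
```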

Expected Behavior

While a jump in memory usage isn't unexpected, we would expect the GC to eventually run and release that memory.

Actual Behavior

The memory does not appear to be released.

Screenshots

Here is a sample of memory usage from our container app, showing memory rising until the app crashes and restarts.

[screenshot: container app memory usage over time]

Environment

Troubleshooting Attempts

Initially we thought it might be the amount of work the workflow is doing, but even a simple workflow that only writes a console message or dispatches another workflow continues to allocate memory.

glime-ai[bot] commented 6 months ago

Based on the context provided, it seems that you are experiencing a memory leak issue with Elsa 3 when using Timer- or CRON-triggered workflows. This is a known issue that has been reported by other users as well, as seen in GitHub issue #4884. That issue describes a memory leak related to CodeActivity, which may be similar to what you are experiencing with Timer or CRON activities.

To address this issue, you can try the following steps:

  1. Update Elsa to the latest version: Ensure that you are using the latest version of Elsa, as the issue might have been addressed in a newer release.

  2. Monitor Garbage Collection (GC): Use tools like dotMemory to monitor memory allocations and GC. This can help you identify if the memory is indeed not being collected.

  3. Implement IDisposable: If you have custom activities, ensure that they implement IDisposable and properly dispose of any unmanaged resources.

  4. Check for Subscriptions: Ensure that any event handlers or message subscriptions are properly unsubscribed or disposed of to prevent memory leaks (see the sketch after this list).

  5. Review Configuration: Review your Elsa configuration to ensure that you are not unintentionally retaining workflow instances in memory. For example, check if you are using a persistent workflow store and if the configuration of the store is correct.

  6. Report the Issue: If none of the above steps help, and you are using the latest version of Elsa, consider reporting the issue on the Elsa GitHub repository. Provide detailed information about your environment, the version of Elsa you are using, and the steps to reproduce the issue.

  7. Seek Community Help: You can also seek help from the Elsa community on platforms like Stack Overflow or the Elsa Slack channel. Other users or the developers might have encountered similar issues and could have solutions.

  8. Workaround: As a temporary workaround, you might consider implementing a mechanism to periodically restart your service to clear the memory until a permanent fix is available.
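For points 3 and 4, the advice is generic .NET rather than anything Elsa-specific; a minimal sketch of the pattern, using hypothetical `SomePublisher` and `WorkflowEventListener` types, looks like this:

```csharp
using System;

// Generic .NET illustration of points 3 and 4 above (not Elsa-specific):
// a long-lived event source keeps every subscriber reachable until the
// subscriber unsubscribes, so forgetting to do so leaks the subscriber.
// 'SomePublisher' and 'WorkflowEventListener' are hypothetical names.
public sealed class SomePublisher
{
    public event EventHandler? Completed;
    public void RaiseCompleted() => Completed?.Invoke(this, EventArgs.Empty);
}

public sealed class WorkflowEventListener : IDisposable
{
    private readonly SomePublisher _publisher;

    public WorkflowEventListener(SomePublisher publisher)
    {
        _publisher = publisher;
        _publisher.Completed += OnCompleted; // the publisher now references this instance
    }

    private void OnCompleted(object? sender, EventArgs e)
    {
        // react to the event
    }

    public void Dispose()
    {
        // Without this, a singleton publisher would keep the listener
        // (and everything it references) alive for the process lifetime.
        _publisher.Completed -= OnCompleted;
    }
}
```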

Please note that without specific details about your environment and the version of Elsa you are using, it's challenging to provide a precise solution. If you can provide more details, I may be able to offer more targeted advice.

This comment was generated by Glime.

rboylesDev commented 6 months ago

Oh, I forgot to mention, we are using version 3.1.2 of Elsa. So I do not believe it is the same issue as the Code Activity memory leak.

sfmskywalker commented 6 months ago

Confirmed:

[screenshot: memory profiler snapshot]

@rboylesDev thanks for reporting!

sfmskywalker commented 6 months ago

Actually, when I leave it running, it does seem to purge memory:

[screenshot: memory profiler snapshot after continued running]

So now I am not really sure there is a memory leak 🤔

sfmskywalker commented 6 months ago

Unless the "23,4 unreachable" is a hint.

sfmskywalker commented 6 months ago

@rboylesDev Just to make sure: is the application using non-memory stores for all of the modules, in particular Workflow Management and Workflow Runtime?

Unless you explicitly configure the persistence provider for these two, they will use the Memory provider by default, which will most certainly lead to an increase in memory consumption as new workflow instances and execution records are stored in an in-memory dictionary.
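For reference, a sketch of what such an explicit (non-memory) configuration typically looks like, assuming the Elsa.EntityFrameworkCore and Elsa.EntityFrameworkCore.SqlServer packages; exact feature and extension-method names may differ between 3.x releases:

```csharp
// Sketch of explicitly configuring non-memory stores, assuming the
// Elsa.EntityFrameworkCore and Elsa.EntityFrameworkCore.SqlServer packages.
// Exact feature/extension names may vary between 3.x releases.
var builder = WebApplication.CreateBuilder(args);
var connectionString = builder.Configuration.GetConnectionString("Elsa")!;

builder.Services.AddElsa(elsa =>
{
    // Workflow definitions and instances go to SQL Server instead of the in-memory store.
    elsa.UseWorkflowManagement(management =>
        management.UseEntityFrameworkCore(ef => ef.UseSqlServer(connectionString)));

    // Runtime state (execution records, bookmarks, etc.) likewise goes to SQL Server.
    elsa.UseWorkflowRuntime(runtime =>
        runtime.UseEntityFrameworkCore(ef => ef.UseSqlServer(connectionString)));

    // Enables the Timer/Cron activities used by scheduled workflows.
    elsa.UseScheduling();
});

var app = builder.Build();
app.Run();
```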

rboylesDev commented 6 months ago

We are using Elsa with EF Core and SQL Server. I believe these are configured correctly as it is a very simple code setup.

[screenshot: Elsa EF Core / SQL Server configuration code]

rboylesDev commented 5 months ago

Minor update on our end. We decided to try the Quartz scheduler instead of the built-in scheduler. This had the same result: roughly 200 MB allocated per scheduled workflow run and apparently never released. What is interesting is that running the same workflow manually does not show the same jump in allocations.

diegodalben commented 2 months ago

Hi! I'm facing the same issue, but in my case, I'm using Hangfire for scheduling management. I'm using version 3.1.2.

sfmskywalker commented 2 months ago

Any chance you can try 3.2.0? If you still see memory increasing until it runs out, it would be good to see some screenshots and ideally a (simplified) copy of your project or a reproduction.

TimNguyenVN commented 2 weeks ago

Hi @sfmskywalker, I might be mistaken, but have you checked the RemoveScheduledTask function? From the logic, it looks like when a trigger is removed, the ScheduledTask is canceled, but it doesn't seem to be removed from the _scheduledTasks dictionary. Instead, it only gets removed from the _scheduledTaskKeys dictionary. This could lead to the _scheduledTasks dictionary growing over time as tasks accumulate without being cleared.

By the way, could you explain the purpose of _scheduledTaskKeys and keys here?

[screenshot: the RemoveScheduledTask implementation]
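To illustrate the pattern being described (a simplified, hypothetical sketch, not the actual Elsa source):

```csharp
using System.Collections.Generic;

// Simplified, hypothetical sketch of the pattern described above — not the
// actual Elsa source. The field names mirror the comment; the shapes of the
// dictionaries and the IScheduledTask interface are assumptions.
public interface IScheduledTask
{
    void Cancel();
}

public class SchedulerSketch
{
    private readonly Dictionary<string, IScheduledTask> _scheduledTasks = new();
    private readonly Dictionary<string, string> _scheduledTaskKeys = new();

    public void RemoveScheduledTask(string taskName)
    {
        if (_scheduledTasks.TryGetValue(taskName, out var task))
            task.Cancel();                  // the task is cancelled...

        _scheduledTaskKeys.Remove(taskName); // ...and removed from the key map,

        // ...but nothing evicts it from _scheduledTasks, so cancelled entries
        // accumulate for the lifetime of the (typically singleton) scheduler.
        // The fix would be to also remove the entry itself:
        // _scheduledTasks.Remove(taskName);
    }
}
```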

sfmskywalker commented 1 week ago

@TimNguyenVN You're 100% correct - the _scheduledTasks dictionary doesn't get updated in this method, which may very well explain the increase in memory consumption. I will push a fix shortly.

Great catch! 👍🏻