elsa-workflows / elsa-core

A .NET workflows library
https://v3.elsaworkflows.io/
MIT License
5.88k stars 1.06k forks source link

Memory Leak or miss-configuration / multipart form request / no persistence #5226

Closed bernhardwimmer closed 4 weeks ago

bernhardwimmer commented 1 month ago

I have troubles to set up my ELSA Workflow Server so that there is no data persistent in memory when I execute HttpRequests. What I can see (by the use of dotMemory) looks like a mem leak, but maybe this is somthing that could be solved via configuration.

Using ELSA 3.1.2 in a new .net6 console app, I have set up a test-workflow which receives a file from a multipart-form request. The endpoint is configured as follows:

new HttpEndpoint
{
    Path = new Input<string>("/document-capture"),
    SupportedMethods = new Input<ICollection<string>>(new[] { HttpMethods.Post }),
    CanStartWorkflow = true,
    Files = new Output<IFormFile[]>(files),
}

To avoid any data-persistence I have registered a NoopWorkflowStore as IWorkflowInstanceStore. Please tell me if this could be done easier.

public class NoopWorkflowStore : IWorkflowInstanceStore
{
  ...
  public ValueTask SaveAsync(WorkflowInstance instance, CancellationToken cancellationToken = default)
  {
      return ValueTask.CompletedTask;
  }
  ...
}

I have started my workflow server project with dotMemory attached. Then I sent 3 requests to my test-endpoint, forced a GC and created a 1st mem-snapshot, then I repeated the procedure to create a 2nd snapshot.

This are the 2 snapshots:

ELSA_Mem_Snapshots

A comparison of the 2 snapshots shows that there are many new object-arrays in memory.

ELSA_Mem_NewObjectArrays

Now I picked 1 of the new object-array instances a looked at the key retention path, which shows the reference which prevents garbage collection. The info about the "TlsOverPerCoreLockedStacksArrayPool+ThreadLocalArray" does not really help me to solve the issue:

ELSA_Mem_NewObjectArray_KeyRetentionPath

The last thing I looked at was the creation stack trace, which is really a deep one:

ELSA_Mem_ObjectArray_CreationStackTrace

It would be cool if someone can tell me if this is a real mem leak or just some miss-configuration topic, and if so, how to solve this.

glime-ai[bot] commented 1 month ago

Based on the information provided, it seems you are experiencing a memory leak issue when handling HTTP requests with file uploads in your Elsa Workflow Server. The NoopWorkflowStore implementation you've provided should indeed prevent workflow instances from being persisted to any store, but it does not necessarily address memory leaks that could be occurring elsewhere in the application.

The screenshots you've shared from dotMemory suggest that there are object arrays that are not being garbage collected. The TlsOverPerCoreLockedStacksArrayPool is a thread-local array pool used by .NET to reduce the overhead of frequently allocating and deallocating arrays. It's a part of the .NET's ArrayPool<T>, which is used to pool arrays internally to reduce GC pressure.

The creation stack trace indicates that the retained arrays are related to the handling of HTTP requests, which could be due to the buffering of request streams, including file uploads. This is not uncommon when dealing with large files or streams.

To address this issue, you might want to consider the following steps:

  1. Ensure that all streams and IFormFile instances are being properly disposed of after use. This is crucial as not disposing of streams can lead to memory leaks.

  2. Review the configuration of the HTTP request handling in your application to see if there's a way to optimize the buffering and memory usage. For example, you could look into streaming file uploads directly to disk or a blob store instead of buffering them in memory.

  3. Check if there are any known issues or updates related to memory leaks in the version of Elsa Workflow you are using. It might be helpful to update to the latest version if there are fixes available.

  4. If the issue persists, consider reaching out to the Elsa Workflow community or support channels with detailed information about your setup and the memory leak you're experiencing. They might be able to provide more specific guidance or identify if there's a bug in the library.

  5. As a last resort, if you suspect a bug in Elsa Workflow, you could file an issue on the Elsa GitHub repository with all the details and findings from your memory profiling.

Remember to always dispose of any disposable objects, and consider using using statements or async disposal patterns where applicable to ensure resources are released as soon as they are no longer needed.

This comment was generated by Glime.

sfmskywalker commented 4 weeks ago

Out of the box, Elsa will register memory store implementations for the following stores:

The last 3 stores contain transitory information that will be removed (bookmarks and workflow inbox messages), or at least not grow (triggers).

You already configured the IWorkflowInstanceStore with a NOOP implementation. Next, you would also need one for IIWorkflowExecutionLogStore and IActivityExecutionStore.

If you still see an increase of memory consumption that cannot be purged, then please do let me know.

bernhardwimmer commented 4 weeks ago

This looks really much better now. I have added the additional empty-Implementations as follows:

services.AddSingleton<IWorkflowInstanceStore, NoopWorkflowStore>();
services.AddSingleton<IWorkflowExecutionLogStore, NoopWorkflowExecutionLogStore>();
services.AddSingleton<IActivityExecutionStore, NoopActivityExecutionStore>();
services.AddSingleton<IWorkflowInboxMessageStore, NoopWorkflowInboxMessageStore>();
services.AddSingleton<IBookmarkStore, NoopBookmarkStore>();
services.AddSingleton<ITriggerStore, MemoryTriggerStore>();

For the ITriggerStore I registered the MemoryTriggerStore, because my Noop-implementation created problems.

The workflow that I use for testing is practically empty. It takes the multipart-form and returns the size of the byte-array of the first file:

public class MinimumWorkflow : WorkflowBase
{
    private readonly IServiceProvider _serviceProvider;

    public MinimumWorkflow(IServiceProvider serviceProvider)
    {
        _serviceProvider = serviceProvider ?? throw new System.ArgumentNullException(nameof(serviceProvider));
    }

    protected override void Build(IWorkflowBuilder builder)
    {
        // stores the files uploaded to the HTTP endpoint
        var files = builder.WithVariable<IFormFile[]>("Document", null);

        builder.Root = new Sequence
        {
            Variables = { files },
            Activities =
            {
                new HttpEndpoint
                {
                    Path = new Input<string>("/document-capture"),
                    SupportedMethods = new Input<ICollection<string>>(new[] { HttpMethods.Post }),
                    CanStartWorkflow = true,
                    Files = new Output<IFormFile[]>(files),
                },
                new WriteHttpResponse
                {
                    Content = new Input<object?>(context =>
                    {
                        //var variable = context.GetVariable(Workflow.Activities.Bludelta.Constants.BcOcrResult);
                        var variable1 = $"Received file with {files.Get(context)[0].Length} bytes";

                        return JsonConvert.SerializeObject(variable1);
                    }),
                    StatusCode = new Input<HttpStatusCode>(HttpStatusCode.OK),
                    ContentType = new Input<string>("application/json"),

                }
            }
        };
    }
}

I have created 10 snapshots. Between each snapshot I send 10 requests and before each snapshot I force a GC.

image

Although the workload is exactly the same between each snapshot, the object count and .Net mem consumption a slightly differing, but I see this behaviour across all our .Net apps.

Comparing snapshots 9 and 10, I can see a view classes with object deltas. But the delta instance count is less than 10 and I sent 10 requests between the snapshots. So I assume that this is not a mem leak.

image

Thanks for your help and the fast response!!

bernhardwimmer commented 4 weeks ago

Sorry, one more question: Which Interfaces would I need to override in Elsa 2.13 to remove all persistence? For example there does not seem to be an IActivityExecutionStore in Elsa v2.

jordanpedro commented 3 weeks ago

@bernhardwimmer Hi, can you tell where did you put those statements:

services.AddSingleton<IWorkflowInstanceStore, NoopWorkflowStore>(); services.AddSingleton<IWorkflowExecutionLogStore, NoopWorkflowExecutionLogStore>(); services.AddSingleton<IActivityExecutionStore, NoopActivityExecutionStore>(); services.AddSingleton<IWorkflowInboxMessageStore, NoopWorkflowInboxMessageStore>(); services.AddSingleton<IBookmarkStore, NoopBookmarkStore>(); services.AddSingleton<ITriggerStore, MemoryTriggerStore>();

before or after the calling of builder.Services.AddElsa () ?

bernhardwimmer commented 3 weeks ago

@jordanpedro: I have added them afterwards, so it looks like this now:

services
    .AddElsa(elsa =>
        {
            elsa.AddWorkflow<MinimumWorkflow>();
            elsa.UseHttp();
            elsa.UseWorkflowsApi();
        }
    );

// Elsa overrides --> remove persistence
services.AddSingleton<IWorkflowInstanceStore, NoopWorkflowStore>();
services.AddSingleton<IWorkflowExecutionLogStore, NoopWorkflowExecutionLogStore>();
services.AddSingleton<IActivityExecutionStore, NoopActivityExecutionStore>();
services.AddSingleton<IWorkflowInboxMessageStore, NoopWorkflowInboxMessageStore>();
services.AddSingleton<IBookmarkStore, NoopBookmarkStore>();
services.AddSingleton<ITriggerStore, MemoryTriggerStore>();
jordanpedro commented 3 weeks ago

@bernhardwimmer great, Thank you.