elsa-workflows / elsa-core

A .NET workflows library
https://v3.elsaworkflows.io/
MIT License
6.4k stars 1.19k forks source link

Bug Report: Issue with Workflow Level Variables Loading in Elsa 3 Framework #4762

Open gabrielschmith opened 10 months ago

gabrielschmith commented 10 months ago

Dear Elsa 3 Framework Support Team,

I am writing to report an issue we have encountered while working with the Elsa 3 Framework. We are experiencing a problem related to the loading of workflow level variables.

Description of the Issue:

During the implementation of workflows, we observed that the variables designated at the workflow level are not being properly loaded or recognized. This issue seems to occur sporadically, without a clear pattern or specific triggering condition.

Impact:

This problem is impacting our ability to reliably execute workflows, as the expected behavior of these variables is essential for the correct functioning of our processes. It is causing delays and inconsistencies in our workflow execution, which is a critical part of our operations.

Steps to Reproduce:

Define workflow level variables in the Elsa Dashboard. Trigger the workflow. Observe that the variables are not loaded as expected during the workflow execution.

My Scene:

image

I defined a variable called SessionPayload and this variable is initialized with the command below:

var sessionPayload = SessionPayload.Get(context);
context.SetVariable("SessionPayload", sessionPayload);

I have an activity that is executed in the middle of the flow called ChatBotTrigger that is executed with the code below, in this activity the line

var model = context.GetVariable<ChatBotSessionModel>("SessionPayload");

the value is returned successfully.

public class ChatBotTrigger : CodeActivity<ChatBotSessionModel>
{
    protected override async ValueTask ExecuteAsync(ActivityExecutionContext context)
    {
        var model = context.GetVariable<ChatBotSessionModel>("SessionPayload");

        if (string.Equals(model?.Type, "IncomingMessage", StringComparison.OrdinalIgnoreCase))
        {
            await context.CompleteActivityWithOutcomesAsync("IncomingMessage");
        }
        else if (string.Equals(model?.Type, "RestApi", StringComparison.OrdinalIgnoreCase))
        {
            await context.CompleteActivityWithOutcomesAsync("RestApi");
        }
        else
        {
            await context.CompleteActivityWithOutcomesAsync("Failed");
        }

        context.SetResult(model);
    }
}

As shown in the workflow, I have a timer that after 10 seconds calls an activity called ChatBotUnsetSession that contains the following implementation (The same get variable code):

public class ChatBotUnsetSession : CodeActivity
{
    protected override async ValueTask ExecuteAsync(ActivityExecutionContext context)
    {
        var sessionPayload = context.GetVariable<ChatBotSessionModel>("SessionPayload");
        var cache = context.GetRequiredService<IDistributedCache<ChatBotSessionModel>>();

        if (!string.IsNullOrEmpty(sessionPayload?.SessionId))
        {
            await cache.RemoveAsync(sessionPayload.SessionId);
        }
    }
}

when the line

var sessionPayload = context.GetVariable<ChatBotSessionModel>("SessionPayload"); 

and executed we have the following return in class ExpressionExecutionContextExtensions:

image

foreach (var block in context.Memory.Blocks.Where(b => b.Value.Metadata is VariableBlockMetadata))
{
    var metadata = block.Value.Metadata as VariableBlockMetadata;
    if (metadata!.Variable.Name == name)
        return metadata.Variable;  <-- Here the value is empty and the type is wrong
}

return localScopeOnly ? null : context.ParentContext?.GetVariable(name);

how can we see the returned value is empty and the type is also wrong

when we return to the origin function

  public static T? GetVariable<T>(this ExpressionExecutionContext context, string name) => (T?)context.GetVariable(name)?.Value;

we have an empty value and the wrong type.

In my head there was a problem implementing the workflow type storage, looking at the code I didn't find anything problematic, but I still wrote a custom storage to save in Redis with the impl below, but the problem remains the same.

public class RedisStorageProvider(IDistributedCache<object, string> distributedCache) : IStorageDriver
{
    public async ValueTask WriteAsync(string id, object value, StorageDriverContext context)
    {
         await distributedCache.SetAsync(id, value, token: context.CancellationToken);
    }

    public async ValueTask<object?> ReadAsync(string id, StorageDriverContext context)
    {
        return await distributedCache.GetAsync(id, token: context.CancellationToken);
    }

    public async ValueTask DeleteAsync(string id, StorageDriverContext context)
    {
        await distributedCache.RemoveAsync(id, token: context.CancellationToken);
    }
}

Environment:

Elsa 3 Framework version: 3.0.0/3.0.1 Packages:

<PackageReference Include="Elsa" Version="3.0.0"/>
<PackageReference Include="Elsa.CSharp" Version="3.0.0"/>
<PackageReference Include="Elsa.Identity" Version="3.0.0"/>
<PackageReference Include="Elsa.Scheduling" Version="3.0.0"/>
<PackageReference Include="Elsa.Workflows.Api" Version="3.0.0"/>
<PackageReference Include="Elsa.EntityFrameworkCore.PostgreSql" Version="3.0.0" />
<PackageReference Include="Elsa.JavaScript" Version="3.0.1" />
<PackageReference Include="Elsa.Liquid" Version="3.0.1" />
<PackageReference Include="Elsa.Python" Version="3.0.1" />

Host environment: Windows, Rider

We kindly request your assistance in resolving this issue. If there are any workarounds or patches available, please let us know. Additionally, if further information is required from our end to diagnose the problem, feel free to reach out.

Thank you for your attention to this matter. We appreciate your prompt response and look forward to a resolution.

Best regards,

sfmskywalker commented 10 months ago

@gabrielschmith Thanks for the detailed bug report. In order for us to reproduce this behaviour, it would be tremendously helpful if we could also get a (stripped down) version of the application, or at least an export of the workflow definition so that we are looking at the exact same thing.

gabrielschmith commented 10 months ago

@sfmskywalker I can't export the entire project for testing, but I can export the custom activities created and the exported workflow definition (See attachments), maybe some codes don't compile like

var cache = context.GetRequiredService<IDistributedCache<ChatBotSessionModel>>();

you can remove them and leave only the context.GetVariable and the problem will happen.

If you need the startup code it is below:

 private static void ConfigureElsa(ServiceConfigurationContext context, IConfiguration configuration)
    {
        var connectionString = configuration.GetConnectionString("Elsa")!;

        context.Services.AddElsa(elsa =>
        {
            // Configure Management layer to use EF Core.
            elsa.UseWorkflowManagement(management => management.UseEntityFrameworkCore(x => x.UsePostgreSql(connectionString)));

            // Configure Runtime layer to use EF Core.
            elsa.UseWorkflowRuntime(runtime => runtime.UseEntityFrameworkCore(x => x.UsePostgreSql(connectionString)));

            // Default Identity features for authentication/authorization.
            elsa.UseIdentity(identity =>
            {
                identity.TokenOptions = options => options.SigningKey = "sufficiently-large-secret-signing-key"; // This key needs to be at least 256 bits long.
                identity.UseAdminUserProvider();
            });

            // Configure ASP.NET authentication/authorization.
            elsa.UseDefaultAuthentication(auth => auth.UseAdminApiKey());

            // Expose Elsa API endpoints.
            elsa.UseWorkflowsApi();

            // Setup a SignalR hub for real-time updates from the server.
            elsa.UseRealTimeWorkflows();

            // Enable workflow expressions
            elsa.UseCSharp();
            elsa.UseJavaScript();
            elsa.UseLiquid();

            // Enable HTTP activities.
            elsa.UseHttp();

            // Use timer activities.
            elsa.UseScheduling();

            // Register custom activities from the application, if any.
            elsa.AddActivitiesFrom<Program>();

            // Register custom workflows from the application, if any.
            elsa.AddWorkflowsFrom<Program>();
        });

        context.Services.AddStorageDriver<RedisStorageProvider>();
    }

Thanks for the feedback.

ChatBot.zip

sfmskywalker commented 10 months ago

Hi @gabrielschmith , thank you for the attachment, that helps. One piece I can't seem to find, however, is the IDistributedCache<object, string> interface. From what I have been able to find so far, is the IDistributedCache interface from the Microsoft.Extensions.Caching.Distributed namespace in the Microsoft.Extensions.Caching.Abstractions package, but not a version taking two generic type arguments.

sfmskywalker commented 10 months ago

maybe some codes don't compile like

var cache = context.GetRequiredService<IDistributedCache>(); you can remove them and leave only the context.GetVariable and the problem will happen.

If it's not too much to ask, can you create a stripped down project that compiles? Right now, I need to not only remove all occurrences of IDistributedCache, but also references to WhatsAppTwilioSender, SmsMessage, etc. In addition, removing the cache code also affects logic in certain activities, such as ChatBotSetSession which sets a value into the cache; which makes me wonder how this change may impact the ability to reproduce the issue. Perhaps it doesn't, but I'd rather not guess :) My concern is that I might inadvertently introduce other changes that might cause different behaviours.

gabrielschmith commented 10 months ago

@sfmskywalker I made a simple application with minimal apis in dotnet 8 containing Elsa Studio (Blazor), Elsa Server(.net8), the custom activities, removed the references and made it as simple as possible. I made some changes to Elsa, changing the Postgres database to SQLite. I'm uploading with SQLite created. The workflow is already created, I did a local test and the same error occurs. Inside the project there is an http file called ChatBot.http if you want to use it to call the workflow endpoint

WebApplication1.zip

gabrielschmith commented 9 months ago

Just a update @sfmskywalker

I did several tests and the problem persists, so I pulled Elsa and did some tests and the problem really is in the cast on ExpressionExecutionContextExtensions.cs

block.Value.Metadata as VariableBlockMetadata;

So I made a change to initialize the value with the value in block

public static Variable? GetVariable(this ExpressionExecutionContext context, string name, bool localScopeOnly = false)
    {
        foreach (var block in context.Memory.Blocks.Where(b => b.Value.Metadata is VariableBlockMetadata))
        {
            var metadata = block.Value.Metadata as VariableBlockMetadata;
            if (metadata!.Variable.Name == name)
            {
                metadata.Variable.Value = block.Value.Value; <-- Here
                return metadata.Variable;
            }
        }

        return localScopeOnly ? null : context.ParentContext?.GetVariable(name);
    }

This resolves the issue when trying to get variables which was the bug that was occurring.

Do you have any suggestion?

NikiforovAll commented 6 months ago

@sfmskywalker also happens to me, the variables are not restored correctly after the interruption (essentially deserialization)