elsa-workflows / elsa-core

A .NET workflows library
https://v3.elsaworkflows.io/
MIT License

Elsa 2 - Startup - HTTP Error 500.30 - ASP.NET Core app failed to start (after renaming an activity in a workflow and the app restart) #974

Closed matt4446 closed 3 years ago

matt4446 commented 3 years ago

I'm not sure how I got here after quite a lot of changes to several workflows over the last week... After publishing one, it started to error when that workflow attempted to run. Changing a bit of code and starting up IIS Express again killed its startup as well.

But failing to start up probably isn't the desired response. (I'm just hunting down what data is causing this now.)

System.ArgumentException: An item with the same key has already been added. Key: b8aafcc8-7429-4041-8d7d-7ffd45f45a53
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
   at System.Linq.Enumerable.ToDictionary[TSource,TKey](IEnumerable`1 source, Func`2 keySelector, IEqualityComparer`1 comparer)
   at System.Linq.Enumerable.ToDictionary[TSource,TKey](IEnumerable`1 source, Func`2 keySelector)
   at Elsa.Services.WorkflowBlueprintMaterializer.CreateWorkflowBlueprintAsync(WorkflowDefinition workflowDefinition, CancellationToken cancellationToken)
   at Elsa.WorkflowProviders.DatabaseWorkflowProvider.<>c__DisplayClass3_0.<<OnGetWorkflowsAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Elsa.WorkflowProviders.DatabaseWorkflowProvider.OnGetWorkflowsAsync(CancellationToken cancellationToken)
   at Elsa.Services.WorkflowProvider.GetWorkflowsAsync(CancellationToken cancellationToken)+MoveNext()
   at Elsa.Services.WorkflowProvider.GetWorkflowsAsync(CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at Elsa.Services.WorkflowRegistry.GetWorkflowsInternalAsync(CancellationToken cancellationToken)+MoveNext()
   at Elsa.Services.WorkflowRegistry.GetWorkflowsInternalAsync(CancellationToken cancellationToken)+MoveNext()
   at Elsa.Services.WorkflowRegistry.GetWorkflowsInternalAsync(CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at System.Linq.AsyncEnumerable.<ToListAsync>g__Core|620_0[TSource](IAsyncEnumerable`1 source, CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/Operators/ToList.cs:line 36
   at System.Linq.AsyncEnumerable.<ToListAsync>g__Core|620_0[TSource](IAsyncEnumerable`1 source, CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/Operators/ToList.cs:line 36
   at Elsa.Services.WorkflowRegistry.ListAsync(CancellationToken cancellationToken)
   at Elsa.Services.WorkflowRegistry.ListActiveWorkflowsAsync(CancellationToken cancellationToken)+MoveNext()
   at Elsa.Services.WorkflowRegistry.ListActiveWorkflowsAsync(CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at System.Linq.AsyncEnumerable.<ToListAsync>g__Core|620_0[TSource](IAsyncEnumerable`1 source, CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/Operators/ToList.cs:line 36
   at System.Linq.AsyncEnumerable.<ToListAsync>g__Core|620_0[TSource](IAsyncEnumerable`1 source, CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/Operators/ToList.cs:line 36
   at Elsa.Services.WorkflowRegistry.ListActiveAsync(CancellationToken cancellationToken)
   at Elsa.Triggers.TriggerIndexer.IndexTriggersAsync(CancellationToken cancellationToken)
   at Elsa.StartupTasks.IndexTriggers.ExecuteAsync(CancellationToken cancellationToken)
   at Elsa.Runtime.StartupRunner.StartupAsync(CancellationToken cancellationToken)
   at Elsa.HostedServices.StartupRunnerHostedService.StartAsync(CancellationToken cancellationToken)

A similar error shows up when starting a workflow, based on my logs:

05/12/2021 15:19:26 +01:00 Unhandled exception rendering component: "An item with the same key has already been added. Key: b8aafcc8-7429-4041-8d7d-7ffd45f45a53"
System.ArgumentException: An item with the same key has already been added. Key: b8aafcc8-7429-4041-8d7d-7ffd45f45a53
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Linq.Enumerable.ToDictionary[TSource,TKey](IEnumerable`1 source, Func`2 keySelector, IEqualityComparer`1 comparer)
   at Elsa.Services.WorkflowBlueprintMaterializer.CreateWorkflowBlueprintAsync(WorkflowDefinition workflowDefinition, CancellationToken cancellationToken)
   at Elsa.WorkflowProviders.DatabaseWorkflowProvider.<>c__DisplayClass3_0.<<OnGetWorkflowsAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Elsa.WorkflowProviders.DatabaseWorkflowProvider.OnGetWorkflowsAsync(CancellationToken cancellationToken)
   at Elsa.Services.WorkflowProvider.GetWorkflowsAsync(CancellationToken cancellationToken)+MoveNext()
   at Elsa.Services.WorkflowProvider.GetWorkflowsAsync(CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at Elsa.Services.WorkflowRegistry.GetWorkflowsInternalAsync(CancellationToken cancellationToken)+MoveNext()
   at Elsa.Services.WorkflowRegistry.GetWorkflowsInternalAsync(CancellationToken cancellationToken)+MoveNext()
   at Elsa.Services.WorkflowRegistry.GetWorkflowsInternalAsync(CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at System.Linq.AsyncEnumerable.<ToListAsync>g__Core|620_0[TSource](IAsyncEnumerable`1 source, CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/Operators/ToList.cs:line 36
   at System.Linq.AsyncEnumerable.<ToListAsync>g__Core|620_0[TSource](IAsyncEnumerable`1 source, CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/Operators/ToList.cs:line 36
   at Elsa.Services.WorkflowRegistry.ListAsync(CancellationToken cancellationToken)
   at Open.Linq.AsyncExtensions.Extensions.ToList[TSource](Task`1 source)
   at Elsa.Decorators.CachingWorkflowRegistry.<>c__DisplayClass10_0.<<GetWorkflowBlueprints>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Extensions.Caching.Memory.CacheExtensions.GetOrCreateAsync[TItem](IMemoryCache cache, Object key, Func`2 factory)
   at Elsa.Decorators.CachingWorkflowRegistry.GetWorkflowBlueprints(CancellationToken cancellationToken)
   at Elsa.Decorators.CachingWorkflowRegistry.FindAsync(Func`2 predicate, CancellationToken cancellationToken)
   at Elsa.Decorators.CachingWorkflowRegistry.GetAsync(String id, String tenantId, VersionOptions version, CancellationToken cancellationToken)
   ....

sfmskywalker commented 3 years ago

I think I know. Check your database and look for workflow definitions that have the same definition ID AND version.

Delete all duplicates and restart the app.

Somehow, on occasion publishing a workflow causes multiple records to be created with the same version. Haven’t been able to figure that one out yet.
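
A minimal sketch of the check described above, assuming the definition records have already been loaded into memory. `DefinitionRow` and `DuplicateVersionFinder` are hypothetical names used only for illustration; they are not Elsa types, and the real store (YesSql, EF Core, etc.) would need its own query.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one stored workflow definition record (not Elsa's real type).
public record DefinitionRow(string DocumentId, string DefinitionId, int Version);

public static class DuplicateVersionFinder
{
    // Returns each (DefinitionId, Version) pair that is stored more than once,
    // together with the number of records sharing it.
    public static List<(string DefinitionId, int Version, int Count)> Find(
        IEnumerable<DefinitionRow> rows) =>
        rows.GroupBy(r => (r.DefinitionId, r.Version))
            .Where(g => g.Count() > 1)
            .Select(g => (g.Key.DefinitionId, g.Key.Version, Count: g.Count()))
            .ToList();
}
```

An empty result means every definition ID / version pair is unique; any pair that is returned points at records worth inspecting before deleting anything.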

matt4446 commented 3 years ago

Nothing obvious in the index: image

Nothing too obvious in the definition store either: image

The ID it mentions appears to be an activity: image

matt4446 commented 3 years ago

Here we are... it looks like the last thing I did was give it a description: image

sfmskywalker commented 3 years ago

Looking at the definitions table, look at the various records containing documents with the same definition ID and see if there are any documents with the same Version number. Perhaps you already checked, but I can’t see it from your screenshots.

matt4446 commented 3 years ago

Ah yes. [Version] is a YesSql thing. They are still unique: image

I'll fix the JSON in the latest version and hope it goes away.

matt4446 commented 3 years ago

A few levels of broken later... The duplicate activity JSON in the workflow was the primary problem.

Three versions of that workflow ended up with the same duplicate in the JSON, so all three needed correcting before the app would start up again. The last one ended up a bit more broken, but that may have been me hitting it with a spade.
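
For anyone cleaning up similar data, here is a rough sketch of how duplicate activity IDs could be spotted before they reach a `ToDictionary` call like the one in the stack trace (`ToDictionary` throws `ArgumentException` on the first repeated key). `ActivityEntry` and `ActivityIdCheck` are made-up names for illustration, and keeping the last copy is an assumption about which duplicate is the wanted one.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, pared-down activity entry; the stored JSON carries more fields.
public record ActivityEntry(string ActivityId, string Name);

public static class ActivityIdCheck
{
    // IDs that occur more than once in the list. Calling
    // ToDictionary(a => a.ActivityId) on such a list throws ArgumentException.
    public static List<string> DuplicateIds(IEnumerable<ActivityEntry> activities) =>
        activities.GroupBy(a => a.ActivityId)
                  .Where(g => g.Count() > 1)
                  .Select(g => g.Key)
                  .ToList();

    // Assumption: the last copy in the list is the most recent edit, so keep that one.
    public static List<ActivityEntry> KeepLastPerId(IEnumerable<ActivityEntry> activities) =>
        activities.GroupBy(a => a.ActivityId)
                  .Select(g => g.Last())
                  .ToList();
}
```

Whether the first or the last copy is the right one to keep depends on how the duplicate was created, so it is worth comparing the copies by hand before overwriting the stored JSON.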

sfmskywalker commented 3 years ago

Interesting. So it wasn’t the issue I thought, but rather the workflow definitions themselves containing duplicate activity IDs.

Let’s see how often this happens. In any case, an easier fix next time might be to simply delete the older versions and only fix the published or latest version, just to save some work.

For your sake I hope it doesn’t happen more often, but for my sake: please keep me posted when it does happen 🙏🏻

matt4446 commented 3 years ago

Oh, probably :D I was debating just rolling it back a few versions instead. While fixing the JSON, I didn't anticipate that it would check all of the workflows and all of their activities. The other two were quick to fix once I got through the various errors in the latest version and it came back with the original error again. Lol

I have a feeling I know how I caused it... let me see if I can do it again.

Not starting up is a problem. I can live with problem workflows.

sfmskywalker commented 3 years ago

If you can repro then that would be sweet. The server not starting due to problematic workflows is something that needs fixing regardless 👍🏻

matt4446 commented 3 years ago

Unfortunately, it didn't reproduce (which I guess is a good thing). It all went downhill from a long chain with an if/else in the middle, which I changed to a switch activity. While I was moving the items back onto similar routes, the designer didn't always seem to register the shift-click and sometimes froze, which became more apparent after using it for a while.
I'll keep an eye open for it though.

matt4446 commented 3 years ago

It happened again, but I wasn't paying much attention. Duplicated ID on the foreach activity this time (but it was a renamed activity, like yesterday's troublemaker: name and display name changed). Correction: there are different activities duplicated (changed name and display name again).

matt4446 commented 3 years ago

image

^ Renaming seems to be the pattern. I did try adding a fork just above that, though (which seemed to fail; it's nowhere to be seen).

There are a few errors after reopening the workflow: image

matt4446 commented 3 years ago

Updated from RC .61 -> .65

Reproducible (possibly in fewer steps, but this seems to work):

  1. Add JavaScript activity
  2. Add ForEach activity
  3. Add Fork between the JavaScript and ForEach activities (outcomes A, B)
  4. Reconnect the ForEach activity to outcome A (it will have become disconnected and moved back to the start)
  5. Rename the ForEach activity

Check the saved workflow in network traffic image

matt4446 commented 3 years ago

Actually, it's much simpler (no content for any of these):

  1. Add JavaScript activity (save)
  2. Add ForEach to JS activity (save)
  3. Edit ForEach (change name and display name, save)
  4. Check network traffic. image

sfmskywalker commented 3 years ago

Brilliant. In fact, it's even better (for me that is): all I have to do is open an activity editor and click save. Every time I do this, a duplicate activity gets added. Must have introduced this issue yesterday or so when preventing activities from being added as soon as you select one from the activity picker.

Fixing it now.

sfmskywalker commented 3 years ago

I just pushed a fix, curious to see if you can still reproduce after pulling latest (source or MyGet packages which are currently being built).

sfmskywalker commented 3 years ago

That update should also prevent the app from failing to start when there are troublesome workflows in the system.
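
As an illustration of the general idea (not the actual change that was pushed), one way to keep startup alive is to materialize each definition inside its own try/catch and collect the failures instead of letting the first exception abort the whole load. `ResilientLoader` and its parameters are hypothetical names for this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class ResilientLoader
{
    // Runs 'materialize' on every source item. Items that throw are recorded
    // in 'Failed' with their exception; the rest end up in 'Loaded'.
    public static (List<TOut> Loaded, List<(TIn Source, Exception Error)> Failed) LoadAll<TIn, TOut>(
        IEnumerable<TIn> sources, Func<TIn, TOut> materialize)
    {
        var loaded = new List<TOut>();
        var failed = new List<(TIn Source, Exception Error)>();

        foreach (var source in sources)
        {
            try
            {
                loaded.Add(materialize(source));
            }
            catch (Exception ex)
            {
                // One broken item should not stop the others from loading.
                failed.Add((source, ex));
            }
        }

        return (loaded, failed);
    }
}
```

The failed list can then be logged so the broken definitions are still visible, while the healthy ones remain available to run.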

matt4446 commented 3 years ago

Perfect, I'll test it shortly.

matt4446 commented 3 years ago

Starting up - a success

Apparently I still have some broken data from previous attempts, which, by the way, blocks other workflows (designer or coded) from running, such as:

    var execute = Builder
        .StartWith<RunCodeActivity>(setup: (setup) =>
        {
            setup.Set(x => x.Name, x => id);
            setup.Set(x => x.PersistWorkflow, x => true);
            setup.Set(x => x.SaveWorkflowContext, x => true);
            setup.Set(x => x.LoadWorkflowContext, x => true);
            setup.Set(x => x.Id, y => id);
            setup.Set(x => x.ScriptName, y => this.RunCode.ScriptName);
        })
        .PersistWorkflow(true)
        .WithDisplayName(id)
        //.LoadWorkflowContext(true)
        //.SaveWorkflowContext(true)
        //.Finish(x=> x.with)
        .WithName(id)
        .Build();

    var expConverter = new ExpandoObjectConverter();

    var model = JsonConvert.DeserializeObject<ExpandoObject>(input ?? "{}", expConverter);

    WorkflowInstance result = await this.WorkflowRunner.StartWorkflowAsync(execute, input: model);

But at least I know where to look now: image

But I should be all good once I fix the data (or delete it).

The designer is looking different. I'll try cleaning and rebuilding to see if that fixes it. image

sfmskywalker commented 3 years ago

> The designer is looking different.

That's one way to put it I guess 😅

All TW classes are now prefixed with elsa- to avoid potential collisions with "outer" css classes. Probably a hard-refresh will do the trick.

matt4446 commented 3 years ago

Alive again. It helps if I have the right CSS files (I was previously using some from the Blazor project) :)

    <link rel="stylesheet" href="/_content/Elsa.Designer.Components.Web/elsa-workflows-studio/assets/fonts/inter/inter.css">
    <link rel="stylesheet" href="/_content/Elsa.Designer.Components.Web/elsa-workflows-studio/assets/styles/tailwind.css">

todo: fix my data and try it all out again.

matt4446 commented 3 years ago

The duplicate data that you were expecting has now appeared: image

which probably relates to this error:

05/13/2021 12:42:08 +01:00 Error in Workflow
System.ArgumentException: An item with the same key has already been added. Key: (feafe15d700148c98422ef324a1c3ca3, 7)

...it now has several records for versions 8, 7, etc. for the same definition ID: image

I might just delete that workflow and hope it goes away (after a pizza)

sfmskywalker commented 3 years ago

Yep that looks familiar. If you remove all but the latest & published versions, it should work again. But I'm still hoping to reproduce this issue in the first place (getting multiple same versions in the DB).

One thing I might want to do regardless is include a unique key constraint on DefinitionId + Version.

matt4446 commented 3 years ago

That's only happened once for me (but a lot for the same instance). I tried recreating it on a new workflow but failed so far. I'll keep an eye out for it.

Shall we close this for now and open a new one if that particular record (not activity) duplication happens again?

The only other problem I came across with an invalid workflow/activity is that it stops all workflows from being able to run, whether they come from the code builder or from a definition. That shouldn't be too much of a problem if the broken parts don't get there to begin with.

sfmskywalker commented 3 years ago

Sounds good, let's close this. I created a separate issue to work on the unique key constraints.