SDKits / ExamineX

Issue tracker for ExamineX
https://examinex.online
5 stars 0 forks source link

Collection was modified; enumeration operation may not execute #53

Closed ahertzman closed 2 years ago

ahertzman commented 2 years ago

We are encountering an ExmineX exception that’s quite hard to debug for us. The exception below is frequently present in the log viewer. For our customer this error results in content not being pushed to the Azure Search Index. The only way to resolve the issue is to recycle the app pool.

Is there anything you can think of that can help us debug this issue?

Observations The problem is taking place on the main server within a load balanced setup. There are multiple users actively publishing content from the backoffice. We are manipulating the ValueSet on the TransformingIndexValues event by adding key/values after removing them.

When analyzing the logs, the following events take place before the exception occurs:

Architecture Our project is on a load balanced server environment. The problem occurs on the master/primary instance. We are on a mixed index setup where we have multiple custom Azure Search indexes, but keeping default Umbraco indexes on the local file system.

Version info Umbraco 8.8.2 Examine 1.2.1 ExamineX 1.3.0

Stacktrace

Message: "[Umbraco.Web.Scheduling.IBackgroundTask] " Task has failed
Stacktrace:
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.Dictionary`2.Enumerator.MoveNext()
   at System.Linq.Enumerable.WhereEnumerableIterator`1.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Umbraco.Examine.ContentValueSetValidator.Validate(ValueSet valueSet)
   at Examine.Providers.BaseIndexProvider.<IndexItems>b__12_0(ValueSet x)
   at System.Linq.Enumerable.WhereListIterator`1.MoveNext()
   at System.Linq.Lookup`2.Create[TSource](IEnumerable`1 source, Func`2 keySelector, Func`2 elementSelector, IEqualityComparer`1 comparer)
   at System.Linq.GroupedEnumerable`3.GetEnumerator()
   at System.Linq.Buffer`1..ctor(IEnumerable`1 source)
   at System.Linq.OrderedEnumerable`1.<GetEnumerator>d__1.MoveNext()
   at ExamineX.AzureSearch.Umbraco.UmbracoAzureSearchContentIndex.PerformIndexItems(IEnumerable`1 values, Action`1 onComplete)
   at Umbraco.Web.Search.ExamineComponent.DeferedReIndexForContent.<>c__DisplayClass5_0.<Execute>b__0()
   at Umbraco.Web.Scheduling.BackgroundTaskRunner`1.<RunAsync>d__40.MoveNext()
Shazwazza commented 2 years ago

Hi Andrew, sorry for the delay here. So far with testing I'm unable to replicate the issue but based on the stack trace it could be that the ContentValueSetValidator is modifying the collection of ValueSets while Examine/ExamineX is enumerating them. There are some subtle fixes in the latest 2.0.2 version released a couple of weeks ago. Can you confirm if this issue still exists with that version?

ahertzman commented 2 years ago

Hi Shannon,

We updated ExamineX to version 2.0.2 and have been running this version for a while now. Although it seems more stable, the exception still occurs. It's hard to reproduce indeed. In our production environment where multiple users publish content in short periods of time, the issue occurs very often. This results in content not being pushed to Azure Search. A simple republish often solves this, but this isn't very user-friendly obviously. A republish can also be the trigger for the problem above. Any ideas how to get more insight in what is happening?

Version info Umbraco 8.16.0 Examine 1.2.1 ExamineX 2.0.2

Shazwazza commented 2 years ago

Thanks for the report, it would then seem like the subtle changes in 2.0.2 didn't solve the issue. I still believe this is the way that Umbraco is parallelizing indexing. For example, a single ValueSet is created and then passed to all indexes to be indexed in parallel. During this time, one of the index's event handlers will modify this ValueSet while others are trying to read from it.

I'll have a look to see if this can be resolved internally within ExamineX as a work around or within the base Examine lib as a more permanent solution.

Shazwazza commented 2 years ago

Yes so based on the above/orig stack trace, that does seem like what is happening. Umbraco passes the same ValueSet to all indexes in parallel and one index could modify that same ValueSet value collection while another is trying to read it.

This will need to be fixed both in ExamineX and Examine and the ValueSet will need to be cloned so it ends up being a different instance.

But better yet: The ValueSet should be made immutable and the Examine IValueSetValidator should be changed to return a new ValueSet if any filtering occurred.

This could be done as part of Examine 3.0. @bergmania - this would be a breaking change for Umbraco 10 but would prevent this problem and prevent having to work around it.

bergmania commented 2 years ago

This could be done as part of Examine 3.0. @bergmania - this would be a breaking change for Umbraco 10 but would prevent this problem and prevent having to work around it.

Such a change should be okay :)

ahertzman commented 2 years ago

Great to hear the problem can be resolved within Examine and Umbraco 10! Would it be possible to also look into a workaround for ExamineX 2.x? Our client is on Umbraco 8.x. which is a LTS version. Upgrading to Umbraco 9/10 will not be feasible this year. We do however want to provide our client with a proper ExamineX experience. Is there anything you can do to bridge the gap? Something we can override or implement ourselves?

Shazwazza commented 2 years ago

Hi @ahertzman yes for sure, I'll get a new release out asap and post back here

ahertzman commented 2 years ago

Great! I'm happy to see progress has been made as well.

Shazwazza commented 2 years ago

@ahertzman all fixed and 2.0.3 is available on Nuget now 🚀

ahertzman commented 2 years ago

Hi @Shazwazza,

We deployed the patched ExamineX package to our servers. Unfortunately we still experience the exception below. Could this be due to the difference in implementation of the workaround? The problem occurs on our load balanced servers and results in pages not being indexed anymore after publishing.

Current version info Umbraco 8.18.3 Examine 1.2.2 ExamineX 2.0.3

Stacktrace

"[Umbraco.Web.Scheduling.IBackgroundTask]" Task has failed
   System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.Dictionary`2.Enumerator.MoveNext()
   at System.Linq.Enumerable.WhereEnumerableIterator`1.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Umbraco.Examine.ContentValueSetValidator.Validate(ValueSet valueSet)
   at Examine.Providers.BaseIndexProvider.<IndexItems>b__12_0(ValueSet x)
   at System.Linq.Enumerable.WhereListIterator`1.MoveNext()
   at System.Linq.Lookup`2.Create[TSource](IEnumerable`1 source, Func`2 keySelector, Func`2 elementSelector, IEqualityComparer`1 comparer)
   at System.Linq.GroupedEnumerable`3.GetEnumerator()
   at System.Linq.Buffer`1..ctor(IEnumerable`1 source)
   at System.Linq.OrderedEnumerable`1.<GetEnumerator>d__1.MoveNext()
   at ExamineX.AzureSearch.Umbraco.UmbracoAzureSearchContentIndex.PerformIndexItems(IEnumerable`1 values, Action`1 onComplete)
   at Umbraco.Web.Search.ExamineComponent.DeferedReIndexForContent.<>c__DisplayClass5_0.<Execute>b__0()
   at Umbraco.Web.Scheduling.BackgroundTaskRunner`1.<RunAsync>d__40.MoveNext()
Shazwazza commented 2 years ago

@ahertzman the underlying issue is a bit tricky to work around. The previous attempt was 'too late' in the execution pipeline. What is happening is that the value set validation is occurring at a high level and this validation is what is causing the concurrency issue since the validation is modifying the index set.

I've determined that I can override the IIndex.IndexItems method by explicitly implementing that and based on the code that is calling this in Umbraco the index type will be casted to IUmbracoIndex which in turn means it will use the explicitly implemented IIndex.IndexItems in my recent change. In that method the ValueSet collection is cloned and then passed along as per normal.

This should in theory fix the issue.

I've pushed version 2.0.5 up to Nuget. If you can please let me know if that resolves the issue that would be great.

ahertzman commented 2 years ago

Thanks for the effort, Shannon! I can imagine the workaround is hard to implement. Unfortunately the issue has not been resolved. We noticed the exception occurring more often since the last two ExamineX patches. Triggering the failed background task has become very easy now, making indexing very unreliable. Would patching Umbraco/Examine assemblies be an option? Or any other direction that could help us bridge the gap to the next major upgrade?

Shazwazza commented 2 years ago

Hi @ahertzman, I'm just about to post another version which fixes another issue and also re-fixes this one too. The explicit index implementation to override Umbraco's needed to be done in two places not just one. I know this is frustrating! I've stepped through the code in this new change and it is definitely cloning the value sets for each index so we should no longer get multiple threads modifying the same value set.

The new versions: 2.0.6 and 3.0.3 should be out in the next few hours.

But yes, ideally, this would be fixed within Umbraco itself.

ahertzman commented 2 years ago

Thanks again for the update! We just tested the new release. We see some overall improvement and the original stacktrace has gone away. It now seems to break on another line though. I guess this is where the cloning takes place.

System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.Dictionary`2.Enumerator.MoveNext()
   at System.Linq.Enumerable.ToDictionary[TSource,TKey,TElement](IEnumerable`1 source, Func`2 keySelector, Func`2 elementSelector, IEqualityComparer`1 comparer)
   at ExamineX.AzureSearch.Umbraco.UmbracoContentAzureSearchIndex.Examine.IIndex.IndexItems(IEnumerable`1 )
   at Umbraco.Web.Search.ExamineComponent.DeferedReIndexForContent.<>c__DisplayClass5_0.<Execute>b__0()
   at Umbraco.Web.Scheduling.BackgroundTaskRunner`1.<RunAsync>d__40.MoveNext()
Shazwazza commented 2 years ago

Thanks for the quick feedback @ahertzman ! OK so the next step is to put a lock around the cloning since it's clear that Umbraco is passing this ValueSet to all indexes in parallel. I'll ship a new release shortly and let you know when it's available.

Shazwazza commented 2 years ago

@ahertzman 2.0.7 has been published https://github.com/SDKits/ExamineX/releases/tag/v2.0.7 The cloning is now done with a lock which should prevent enumeration of the ValueSet while it is being copied. Let me know if this resolves your issue, else I'll have to see what can be done with either an Umbraco or Examine patch.

ahertzman commented 2 years ago

@Shazwazza I just tested the new release. Still no luck with 2.07 unfortunately.

System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.Dictionary`2.Enumerator.MoveNext()
   at System.Linq.Enumerable.ToDictionary[TSource,TKey,TElement](IEnumerable`1 source, Func`2 keySelector, Func`2 elementSelector, IEqualityComparer`1 comparer)
   at ExamineX.AzureSearch.ValueSetExtensions.CloneWithLock(IEnumerable`1 values)
   at ExamineX.AzureSearch.Umbraco.UmbracoContentAzureSearchIndex.Examine.IIndex.IndexItems(IEnumerable`1 )
   at Umbraco.Web.Search.ExamineComponent.DeferedReIndexForContent.<>c__DisplayClass5_0.<Execute>b__0()
   at Umbraco.Web.Scheduling.BackgroundTaskRunner`1.<RunAsync>d__40.MoveNext()
Shazwazza commented 2 years ago

Thanks for the report @ahertzman. I'm trying to determine how this is possible but I'm not understanding how the dictionary of a ValueSet can be modified while we're iterating it. I cannot replicate this issue in a test even with 100 concurrent threads.

This is how the IEnumerable is created by Umbraco that is passed into the indexes https://github.com/umbraco/Umbraco-CMS/blob/087e682f7956fa30e325fd3897ccb4953053c936/src/Umbraco.Examine/ContentValueSetBuilder.cs#L66

The new CloneWithLock method in ExamineX takes this resulting IEnumerable<ValueSet>, does a ToList() on it thus resolving the enumerable and then proceeds to clone each ValueSet manually within the resolved list of ValueSets. The exception is being thrown during the cloning of the property values (dictionary) within the ValueSet.

Prior to this change the reason the collection was modified was because the umbraco content validator and/or an TransformingIndexValues event was modifying the collection while it was being passed to other indexes.

With this change now, each IEnumerable is cloned before being passed to the underlying Umbraco index which then performs the validation. This is how this would be fixed within Umbraco and ExamineX here is overriding the default Umbraco implementation before passing the new cloned ValueSet to the underlying Umbraco implementation but it seems that somehow this fix is still not working.

Maybe @bergmania might have any ideas on how this exception could be possible? The stack trace above is pretty straightforward now

ahertzman commented 2 years ago

Hi @Shazwazza. Thanks for going into so much detail. The strange thing is that the exception now occurs on almost every publish. We don't even have to try hard to replicate it.

We have been trying to work around the issue ourselves as well. While we are not able to resolve the issue completely, we did manage to greatly improve it. Going back to the original stacktrace (and ExamineX 2.0.5) we implemented a copy of Umbraco.Examine.ContentValueSetValidator, overriding the default Validate() behaviour. Basically we're just doing a ToList() to resolve the valueset values before proceeding.

public override ValueSetValidationResult Validate(ValueSet valueSet)
{
    // Create a copy of the ValueSet's Values.
    var values = new Dictionary<string, List<object>>();
    foreach (var item in valueSet.Values.ToList())
    {
        values.Add(item.Key, item.Value);
    }
    ...<original code>
}

Obviously this isn't ideal, because the success rate is still below 100% and we have to maintain this override on every Umbraco patch/upgrade. Is it an option to try and fix it permanently within Umbraco/Examine? We would be happy to take any debug assemblies to try out the Umbraco fix if that helps.

Shazwazza commented 2 years ago

Thanks @ahertzman for the assistance with this. Do you have a TransformingIndexValues event handler configured and if so can you share with me what it is doing?

I might suggest that in your Validate(ValueSet valueSet) method that you wrap the whole thing in a lock(locker) {.... } where locker is a private static readonly object locker = new object() to see if that resolves the issue.

I unfortunately cannot replicate this issue either in unit tests or in the Umbraco application. This is the code being used in ExamineX for cloning the ValueSet

public static IEnumerable<ValueSet> CloneWithLock(this IEnumerable<ValueSet> values)
{
    lock (locker)
    {
        var valueSets = values.ToList();
        for (var i = 0; i < valueSets.Count; i++)
        {
            var valueSet = valueSets[i];

            var clonedValueSet = new ValueSet(
                valueSet.Id,
                valueSet.Category,
                valueSet.ItemType,
                valueSet.Values.ToDictionary(x => x.Key, x => (IEnumerable<object>)x.Value.ToList()));

            // replace at the index
            valueSets[i] = clonedValueSet;
        }

        return valueSets;
    }
}

This is used in an explicit override so that the ValueSet is cloned before being passed to the base index class (which I can confirm is executing):

        void IIndex.IndexItems(IEnumerable<ValueSet> values)
        {
            // clone the value set, else it could get manipulated by another index if
            // things are being indexed in parallel.            
            base.IndexItems(values.CloneWithLock());
        }

The base index class then does this where ValidateItem is called:

        public void IndexItems(IEnumerable<ValueSet> values)
        {
            PerformIndexItems(values.Where((ValueSet x) => ValidateItem(x) != ValueSetValidationResult.Failed), OnIndexOperationComplete);
        }

So you can see that the ValueSet being passed to IndexItems in ExamineX is being cloned before being passed to ValidateItem so I cannot how any collection in the ValueSets can be modified while iterating it since the only place it's being modified is within ValidateItem and TransformingIndexValues but both of these will now receive a cloned ValueSet instead of the original.

If you have any additional information on how to replicate this please let me know :)

ahertzman commented 2 years ago

Hi @Shazwazza,

We do have a TransformingIndexValues event handler configured, but I disabled it. It doesn't seem to make a difference without it. Good point though. We'll look into adding a lock as well, to see if that further improves the workaround.

We have some new insights that might help you replicate it. We are creating an UmbracoAzureSearchContentIndex instance and pass a ContentValueSetValidator to the constructor with publishedValuesOnly set to true. However, when we set publishedValuesOnly to false, the problem doesn't occur anymore. Any ideas how this would be possible?

We are unable to replicate the problem on the default internal and external Umbraco indexes.

Shazwazza commented 2 years ago

@ahertzman that is interesting. Would you be able to share how you are creating this extra UmbracoAzureSearchContentIndex? Also, is there a specific reason why you need an extra indexer that the default external indexer can't support?