QuantConnect / Lean

Lean Algorithmic Trading Engine by QuantConnect (Python, C#)
https://lean.io
Apache License 2.0
9.62k stars 3.23k forks source link

Collection Modified Exception in Alphas.Analysis.InsightManager.UpdateScores #3155

Closed jaredbroad closed 5 years ago

jaredbroad commented 5 years ago

Expected Behavior

LEAN internals should be thread safe

Actual Behavior

Runtime Error: Collection was modified; enumeration operation may not execute. Stack Trace: System.InvalidOperationException: Collection was modified; enumeration operation may not execute. at System.Collections.Generic.HashSet`1+Enumerator[T].MoveNext () [0x00013] in :0 at QuantConnect.Algorithm.Framework.Alphas.Analysis.InsightManager.UpdateScores (QuantConnect.Algorithm.Framework.Alphas.Analysis.ReadOnlySecurityValuesCollection securityValuesCollection) [0x001b6] in :0 at

Random error

Potential Solution

Review collection for potential collisions.

Reproducing the Problem

Random

System Information

Cloud

Checklist

gsalaz98 commented 5 years ago

I'm sharing my review in hopes that it will be useful for whoever decides to attempt to fix this issue.

There are two possible places the issue can begin:

  1. From QuantConnect.Algorithm.Framework.Alphas.DefaultAlphaHandler.ProcessSynchronousEvents 1a. Which in turn is called twice in QuantConnect.Lean.Engine.Run(...)
  2. From QuantConnect.Algorithm.Framework.Alphas.DefaultAlphaHandler.Initialize(...) where the method is called when there is an InsightsGenerated event.

The UpdateScores method is only called from within the QuantConnect.Algorithm.Framework.Alphas.Analysis.InsightManager.Step(...) method.

Inside the UpdateScores method, we iterate inside the _openInsightContexts private member variable, which in this case ends up being mutated randomly.

Looking deeper, I analyzed every function call with a InsightAnalysisContext being passed as a parameter or calling methods of itself to ensure that the current variable we're iterating wasn't being mutated. The potential offenders in this case can be:

  1. context.SetCurrentValues(...)
  2. context.ShouldAnalyze(...)
  3. function.Evaluate(context, scoreType) where typeof(function) == BinaryInsightScoreFunction
  4. _extensions.ForEach(e => e.OnInsightClosed(context)) where e == ChartingInsightManagerExtension | AlphaResultPacketSender | StatisticsInsightManagerExtension 4a. Because ChartingInsightManagerExtension and AlphaResultPacketSender have nothing performed, only StatisticsInsightManagerExtension is of interest
  5. _extensions.ForEach(e => e.OnInsightAnalysisCompleted(context)) where e == AlphaResultPacketSender | StatisticsInsightManagerExtension (ChartingInsightManagerExtension excluded because no operations take place)
  6. _closedInsightContexts[id] = context
  7. _updatedInsightContexts.Add(context)

Further in depth, I will go one by one explaining why I believe none of these are the root cause of the error, and instead believe the event subscription to be the root cause of this bug.

  1. Calling methods allows mutations to member variables during a foreach loop.
  2. No mutation occurs in this method
  3. No mutation occurs to member variables in the BinaryInsightScoreFunction.Evaluate(...) method
  4. No mutation occurs to member variables of context in StatisticsInsightManagerExtension.OnInsightClosed(...) 5a. AlphaResultPacketSender.OnInsightAnalysisCompleted(...) - member variable Insight is added to enqueue, which in turn gets accessed in the method MessagingUpdateIntervalElapsed and sent through the _messagingHandler, which ultimately does nothing with the AlphaResult packet in the switch statement inside the QuantConnect.Messaging.Messaging class. It is then converted into a string object containing the contents of the packet in that same method. No mutation occurs. 5b. StatisticsInsightManagerExtension.OnInsightAnalysisCompleted(...) - No mutation occurs inside this method; Only getters are used.
  5. Method RemoveInsights is never used; member ClosedInsights selects Insights from context, then is used by member AllInsights and adds ClosedInsights to OpenInsights. This is then referenced in the DefaultAlphaHandler.StoreInsights() method, which creates a copy of the data by the call to ToList(). No mutation occurs
  6. A copy is created of the _updatedInsightContexts, then all of its contents are emptied, and the copy is returned. No usage of the GetUpdatedContexts method exists, thus no mutations`

The only place wherein the collection being iterated over is mutated is during the Step method. Because it is required to not have an empty collection in order to add InsightAnalysisContext to _openInsightContexts, the call to QuantConnect.Algorithm.Framework.Alphas.DefaultAlphaHandler.ProcessSynchronousEvents does not mutate the _openInsightContexts member variable, thus I believe this can't be the cause of the error.

This only leaves the subscription to InsightsGenerated. I'm not very sure how events work in C#, so I don't know if it runs new threads for each event that occurs or some other trickery.

I hope this is useful; I might be wrong on multiple parts of my investigation, so I suggest using this as a guide, and not treat it as fact.

AlexCatarino commented 5 years ago

Also reported at QuantConnect.Lean.Engine.Alphas.ChartingInsightManagerExtension.PopulateChartWithSeriesPerSymbol:


  at System.Collections.Generic.Dictionary`2+Enumerator[TKey,TValue].MoveNext () [0x00013] in <b0e1ad7573a24fd5a9f2af9595e677e7>:0 
  at QuantConnect.Lean.Engine.Alphas.ChartingInsightManagerExtension.PopulateChartWithSeriesPerSymbol (System.Collections.Generic.Dictionary`2[TKey,TValue] data, QuantConnect.Chart chart, QuantConnect.SeriesType seriesType, System.DateTime frontierTimeUtc) [0x00077] in <9eb1a86be1664a0fa619b63251b1ca64>:0 
  at QuantConnect.Lean.Engine.Alphas.ChartingInsightManagerExtension.Step (System.DateTime frontierTimeUtc) [0x00043] in <9eb1a86be1664a0fa619b63251b1ca64>:0 
  at QuantConnect.Algorithm.Framework.Alphas.Analysis.InsightManager.Step (System.DateTime frontierTimeUtc, QuantConnect.Algorithm.Framework.Alphas.Analysis.ReadOnlySecurityValuesCollection securityValuesCollection, QuantConnect.Algorithm.Framework.Alphas.GeneratedInsightsCollection generatedInsights) [0x000f1] in <0c74d61e19d549bd969442e11a4bbb3d>:0 
  at QuantConnect.Lean.Engine.Alphas.DefaultAlphaHandler.ProcessSynchronousEvents () [0x00049] in <9eb1a86be1664a0fa619b63251b1ca64>:0 
  at QuantConnect.Lean.Engine.AlgorithmManager.Run (QuantConnect.Packets.AlgorithmNodePacket job, QuantConnect.Interfaces.IAlgorithm algorithm, QuantConnect.Lean.Engine.DataFeeds.ISynchronizer synchronizer, QuantConnect.Lean.Engine.TransactionHandlers.ITransactionHandler transactions, QuantConnect.Lean.Engine.Results.IResultHandler results, QuantConnect.Lean.Engine.RealTime.IRealTimeHandler realtime, QuantConnect.Lean.Engine.Server.ILeanManager leanManager, QuantConnect.Lean.Engine.Alpha.IAlphaHandler alphas, System.Threading.CancellationToken token) [0x008a7] in <9eb1a86be1664a0fa619b63251b1ca64>:0 
  at QuantConnect.Lean.Engine.Engine+<>c__DisplayClass9_2.<Run>b__9 () [0x000ca] in <9eb1a86be1664a0fa619b63251b1ca64>:0 ```
Martin-Molinero commented 5 years ago

I believe this might be related to the Emiting Insights based on Order Fills PR https://github.com/QuantConnect/Lean/pull/3030, why? because the transaction handler thread could generate a new insight and trigger the InsightsGenerated event asynchronously with the main algorithm thread running the InsightManager.Step()