Closed jaredbroad closed 5 years ago
I'm sharing my review in hopes that it will be useful for whoever decides to attempt to fix this issue.
There are two possible places the issue can begin:
QuantConnect.Algorithm.Framework.Alphas.DefaultAlphaHandler.ProcessSynchronousEvents
1a. Which in turn is called twice in QuantConnect.Lean.Engine.Run(...)
QuantConnect.Algorithm.Framework.Alphas.DefaultAlphaHandler.Initialize(...)
where the method is called when there is an InsightsGenerated
event. The UpdateScores
method is only called from within the QuantConnect.Algorithm.Framework.Alphas.Analysis.InsightManager.Step(...)
method.
Inside the UpdateScores
method, we iterate inside the _openInsightContexts
private member variable, which in this case ends up being mutated randomly.
Looking deeper, I analyzed every function call with a InsightAnalysisContext
being passed as a parameter or calling methods of itself to ensure that the current variable we're iterating wasn't being mutated. The potential offenders in this case can be:
context.SetCurrentValues(...)
context.ShouldAnalyze(...)
function.Evaluate(context, scoreType)
where typeof(function) == BinaryInsightScoreFunction
_extensions.ForEach(e => e.OnInsightClosed(context))
where e == ChartingInsightManagerExtension | AlphaResultPacketSender | StatisticsInsightManagerExtension
4a. Because ChartingInsightManagerExtension
and AlphaResultPacketSender
have nothing performed, only StatisticsInsightManagerExtension
is of interest_extensions.ForEach(e => e.OnInsightAnalysisCompleted(context))
where e == AlphaResultPacketSender | StatisticsInsightManagerExtension
(ChartingInsightManagerExtension
excluded because no operations take place)_closedInsightContexts[id] = context
_updatedInsightContexts.Add(context)
Further in depth, I will go one by one explaining why I believe none of these are the root cause of the error, and instead believe the event subscription to be the root cause of this bug.
BinaryInsightScoreFunction.Evaluate(...)
methodcontext
in StatisticsInsightManagerExtension.OnInsightClosed(...)
5a. AlphaResultPacketSender.OnInsightAnalysisCompleted(...)
- member variable Insight
is added to enqueue, which in turn gets accessed in the method MessagingUpdateIntervalElapsed
and sent through the _messagingHandler
, which ultimately does nothing with the AlphaResult
packet in the switch statement inside the QuantConnect.Messaging.Messaging
class. It is then converted into a string object containing the contents of the packet in that same method. No mutation occurs.
5b. StatisticsInsightManagerExtension.OnInsightAnalysisCompleted(...)
- No mutation occurs inside this method; Only getters are used.RemoveInsights
is never used; member ClosedInsights
selects Insights
from context
, then is used by member AllInsights
and adds ClosedInsights
to OpenInsights
. This is then referenced in the DefaultAlphaHandler.StoreInsights()
method, which creates a copy of the data by the call to ToList()
. No mutation occurs_updatedInsightContexts
, then all of its contents are emptied, and the copy is returned. No usage of the GetUpdatedContexts
method exists, thus no mutations`The only place wherein the collection being iterated over is mutated is during the Step
method. Because it is required to not have an empty collection in order to add InsightAnalysisContext
to _openInsightContexts
, the call to QuantConnect.Algorithm.Framework.Alphas.DefaultAlphaHandler.ProcessSynchronousEvents
does not mutate the _openInsightContexts
member variable, thus I believe this can't be the cause of the error.
This only leaves the subscription to InsightsGenerated
. I'm not very sure how events work in C#, so I don't know if it runs new threads for each event that occurs or some other trickery.
I hope this is useful; I might be wrong on multiple parts of my investigation, so I suggest using this as a guide, and not treat it as fact.
Also reported at QuantConnect.Lean.Engine.Alphas.ChartingInsightManagerExtension.PopulateChartWithSeriesPerSymbol
:
at System.Collections.Generic.Dictionary`2+Enumerator[TKey,TValue].MoveNext () [0x00013] in <b0e1ad7573a24fd5a9f2af9595e677e7>:0
at QuantConnect.Lean.Engine.Alphas.ChartingInsightManagerExtension.PopulateChartWithSeriesPerSymbol (System.Collections.Generic.Dictionary`2[TKey,TValue] data, QuantConnect.Chart chart, QuantConnect.SeriesType seriesType, System.DateTime frontierTimeUtc) [0x00077] in <9eb1a86be1664a0fa619b63251b1ca64>:0
at QuantConnect.Lean.Engine.Alphas.ChartingInsightManagerExtension.Step (System.DateTime frontierTimeUtc) [0x00043] in <9eb1a86be1664a0fa619b63251b1ca64>:0
at QuantConnect.Algorithm.Framework.Alphas.Analysis.InsightManager.Step (System.DateTime frontierTimeUtc, QuantConnect.Algorithm.Framework.Alphas.Analysis.ReadOnlySecurityValuesCollection securityValuesCollection, QuantConnect.Algorithm.Framework.Alphas.GeneratedInsightsCollection generatedInsights) [0x000f1] in <0c74d61e19d549bd969442e11a4bbb3d>:0
at QuantConnect.Lean.Engine.Alphas.DefaultAlphaHandler.ProcessSynchronousEvents () [0x00049] in <9eb1a86be1664a0fa619b63251b1ca64>:0
at QuantConnect.Lean.Engine.AlgorithmManager.Run (QuantConnect.Packets.AlgorithmNodePacket job, QuantConnect.Interfaces.IAlgorithm algorithm, QuantConnect.Lean.Engine.DataFeeds.ISynchronizer synchronizer, QuantConnect.Lean.Engine.TransactionHandlers.ITransactionHandler transactions, QuantConnect.Lean.Engine.Results.IResultHandler results, QuantConnect.Lean.Engine.RealTime.IRealTimeHandler realtime, QuantConnect.Lean.Engine.Server.ILeanManager leanManager, QuantConnect.Lean.Engine.Alpha.IAlphaHandler alphas, System.Threading.CancellationToken token) [0x008a7] in <9eb1a86be1664a0fa619b63251b1ca64>:0
at QuantConnect.Lean.Engine.Engine+<>c__DisplayClass9_2.<Run>b__9 () [0x000ca] in <9eb1a86be1664a0fa619b63251b1ca64>:0 ```
I believe this might be related to the Emiting Insights based on Order Fills
PR https://github.com/QuantConnect/Lean/pull/3030, why? because the transaction handler thread could generate a new insight and trigger the InsightsGenerated
event asynchronously with the main algorithm thread running the InsightManager.Step()
Expected Behavior
LEAN internals should be thread safe
Actual Behavior
Random error
Potential Solution
Review collection for potential collisions.
Reproducing the Problem
Random
System Information
Cloud
Checklist
master
branch