I've identified a race condition in the Entity.AddException method when subsegments are being added concurrently. The issue arises from the concurrent access and modification of the _lazySubsegments list, which can lead to unpredictable behavior or exceptions during the execution of the DescribeException method within AddException. I am using powertools-lambda-dotnet, but after digging into the logs I found the issue originates from aws-xray-sdk-dotnet.
Steps to Reproduce:
While a definitive reproduction might be challenging due to the non-deterministic nature of race conditions, here's a scenario that increases the likelihood of encountering the issue:
Create an instance of Entity (e.g., a Segment).
In one thread, trigger the AddException method on the Entity instance.
Concurrently, in another thread or multiple threads, repeatedly call AddSubsegment on the same Entity instance to simulate concurrent modifications to the _lazySubsegments list.
Observe if any exceptions, such as InvalidOperationException, are thrown during the execution of AddException or subsequent operations.
I've setup the following code snippet to simulate the error:
var entity = new Segment("TestSegment");
const int EXCEPTION_ITERATIONS = 100;
const int SUBSEGMENT_ITERATIONS = 100_000;
var raceConditionOccurred = false;
string? exceptionMessage = null;
var cancellationTokenSource = new CancellationTokenSource();
// Thread 1: Trigger AddException
var addExceptionTask = Task.Run(() =>
{
Parallel.ForEach(Enumerable.Range(0, EXCEPTION_ITERATIONS), _ =>
{
try
{
var exception = new Exception("Test Exception");
entity.AddException(exception);
}
catch (InvalidOperationException ex)
{
raceConditionOccurred = true;
exceptionMessage = ex.Message;
cancellationTokenSource.Cancel();
}
});
});
// Thread 2: Add Subsegments concurrently using Parallel.ForEach
var addSubsegmentsTask = Task.Run(() =>
{
try
{
Parallel.ForEach(Enumerable.Range(0, SUBSEGMENT_ITERATIONS), new ParallelOptions { CancellationToken = cancellationTokenSource.Token }, _ =>
{
entity.AddSubsegment(new Subsegment("TestSubsegment"));
});
}
catch (OperationCanceledException) { }
});
Task.WaitAll(addExceptionTask, addSubsegmentsTask);
if (raceConditionOccurred)
{
_output.WriteLine($"Race condition error occurred: {exceptionMessage}");
}
Assert.True(raceConditionOccurred, "Expected a race condition error but none occurred.");
Expected Behavior:
The AddException method should execute correctly even when subsegments are being added concurrently, without throwing any exceptions or causing data inconsistencies.
Suggested Fix:
Add a lock to ensure thread-safe access to the _lazySubsegments list during the execution of Entity.AddException(Exception e).
/// <summary>
/// Adds the exception to cause and set this segment to has fault.
/// </summary>
/// <param name="e">The exception to be added.</param>
public void AddException(Exception e)
{
HasFault = true;
Cause = new Cause();
List<Subsegment> subsegmentsCopy;
lock (_lazySubsegments.Value)
{
subsegmentsCopy = [.. _lazySubsegments.Value]; // Create a copy to avoid holding the lock during serialization
}
Cause.AddException(AWSXRayRecorder.Instance.ExceptionSerializationStrategy.DescribeException(e, subsegmentsCopy));
}
I've identified a race condition in the
Entity.AddException
method when subsegments are being added concurrently. The issue arises from the concurrent access and modification of the_lazySubsegments
list, which can lead to unpredictable behavior or exceptions during the execution of theDescribeException
method withinAddException
. I am using powertools-lambda-dotnet, but after digging into the logs I found the issue originates from aws-xray-sdk-dotnet.Steps to Reproduce:
While a definitive reproduction might be challenging due to the non-deterministic nature of race conditions, here's a scenario that increases the likelihood of encountering the issue:
Entity
(e.g., aSegment
).AddException
method on theEntity
instance.AddSubsegment
on the sameEntity
instance to simulate concurrent modifications to the_lazySubsegments
list.InvalidOperationException
, are thrown during the execution ofAddException
or subsequent operations.I've setup the following code snippet to simulate the error:
Expected Behavior:
The
AddException
method should execute correctly even when subsegments are being added concurrently, without throwing any exceptions or causing data inconsistencies.Suggested Fix:
Add a lock to ensure thread-safe access to the
_lazySubsegments
list during the execution ofEntity.AddException(Exception e)
.