aws / aws-xray-sdk-dotnet

The official AWS X-Ray SDK for .NET.
Apache License 2.0
114 stars 64 forks source link

Race condition - intermittent exceptions "Operations that change non-concurrent collections must have exclusive access" #282

Closed udlose closed 5 months ago

udlose commented 1 year ago

Hi,

I was experiencing the same race condition as explained in #67 and have a small project that has been able to repro the issue (of course intermittently since it's a race condition). I have posted the sample solution here: https://github.com/udlose/AWSXRayRaceCondition

AWSXRayRaceConditionTests.TelemetryRecorderTests.Trace CallsActionAndRecords WhenTracingIsEnabled
   Source: TelemetryRecorderTests.cs line 82
   Duration: 1 ms

  Message: 
System.InvalidOperationException : Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.

  Stack Trace: 
Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
Dictionary`2.set_Item(TKey key, TValue value)
JsonMapper.RegisterExporter[T](ExporterFunc`1 exporter) line 922
JsonSegmentMarshaller.ctor() line 48
UdpSegmentEmitter.ctor() line 45
AWSXRayRecorder.ctor() line 47
TelemetryRecorderTests.ctor() line 25
RuntimeType.CreateInstanceDefaultCtor(Boolean publicOnly, Boolean wrapExceptions)
udlose commented 1 year ago

It appears the root cause of the race is in LitJson which appears to be used by AWSXRayRecorder. I've opened a PR with LitJson project. You would obviously need to consume the latest version once they accept the PR and release the fix.

jj22ee commented 1 year ago

Hi @udlose,

It looks like your PR to LitJson hasn't gotten any traction. We've faced a similar issue with this dependency before and that is why we have the source code of LitJson directly in this repository (aws-xray-sdk-dotnet/sdk/src/Core/ThirdParty/LitJson/JsonMapper.cs).

Can you make your PR changes to LitJson to aws-xray-sdk-dotnet/sdk/src/Core/ThirdParty/LitJson/ as well, so you do not need to further wait for the maintainers of LitJson to approve your PR? Also can you apply the changes without the formatting changes to make the review easier?

Compufreak345 commented 1 year ago

Hi @udlose , we are facing the same issue here, so I would like to second the ask from @jj22ee .

You would help us a ton if you could do that :)

Thanks & best regards, Christoph

udlose commented 1 year ago

I've updated my PR to LitJson (https://github.com/LitJSON/litjson/pull/142) and will re-create a duplicate PR against this repo tonight.

udlose commented 1 year ago

@Compufreak345 , @jj22ee PR submitted. I noticed there was some code in the latest JsonMapper.cs that wasn't included here so I left that in the PR. Feel free to remove if desired.

jj22ee commented 1 year ago

Thanks for submitting the PR. I see that most of the PR Checks are failing due to a ThirdParty.LitJson.JsonException. The Unit Tests are failing due to a recursive (infinite?) loop in the the WriteValue() method. Added a comment in the PR.

udlose commented 9 months ago

Hi @udlose,

It looks like your PR to LitJson hasn't gotten any traction. We've faced a similar issue with this dependency before and that is why we have the source code of LitJson directly in this repository (aws-xray-sdk-dotnet/sdk/src/Core/ThirdParty/LitJson/JsonMapper.cs).

Can you make your PR changes to LitJson to aws-xray-sdk-dotnet/sdk/src/Core/ThirdParty/LitJson/ as well, so you do not need to further wait for the maintainers of LitJson to approve your PR? Also can you apply the changes without the formatting changes to make the review easier?

@Compufreak345 , @jj22ee Just to provide an update here, my pr for the fix in LitJSON (https://github.com/LitJSON/litjson/pull/142) was merged on 11/19/23 - https://github.com/LitJSON/litjson/pull/142#issuecomment-1817955144

If you update the AWS Xray SDK to use the latest version of LitJSON v0.19, it should fix this issue.

jj22ee commented 5 months ago

Thanks @udlose. We cannot directly depend on LitJson as there is some custom code in our own copy of the LitJSON, so we will need to cherry-pick your specific change into our repository. I've updated your PR to do so, and restored some accidentally deleted code.

jj22ee commented 5 months ago

PR is merged. Once we release the next version, I can close this issue.

jj22ee commented 5 months ago

Closing issue since PR is merged, the next release of XRay Dotnet SDK will include this bugfix.