aws / aws-sdk-net

The official AWS SDK for .NET. For more information on the AWS SDK for .NET, see our web site:
http://aws.amazon.com/sdkfornet/
Apache License 2.0
2.03k stars 848 forks source link

Thread safety issue with request signing (observed with Unity on iOS) #313

Closed c7james closed 8 years ago

c7james commented 8 years ago

Hello,

Since integrating Mobile Analytics alongside a number of other services, I've encountered a timing-sensitive crash in code called by the AWS SDK. I've only observed it when running with Unity deployed to iOS running on device, but it's possible it happens in other configurations as well. Note that I saw this on the old Unity3D fork of the AWS SDK, and have now reproduced it on the newly Unity3D-capable .NET SDK here.

The crash, when it occurs, appears to be caused by a static initialization race between two threads, both of which are initiating signing requests (Amazon.Runtime.Internal.Signer.InvokeAsync). This triggers a fairly deep callstack, eventually ending up in System.Security.Cryptography.RandomNumberGenerator Mono.Security.Cryptography.KeyBuilder::get_Rng(). The il2cpp version of this function uses a static bool to flag/trigger initialization of the matched TypeInfo. However, if multiple threads call this function at roughly the same time, they can interfere with each other.

The common pattern of interference I've seen is:

Thread 1 makes the first call, which enters the initialization routine. Before Thread 1 finishes said initialization and sets the static initialization flag to true, Thread 2 calls the same function. Since the flag is still false, it also enters the initialization routine. Thread 1 finishes its run of the initialization code, and moves on to use the newly initialized data. Thread 2 starts to re-initialize the data.
Thread 1 accesses the data half-way through Thread 2's re-initialization, meaning the data may be in an invalid state. This usually manifests as the TypeInfo var having been allocated, but the "static_fields" element not yet having been initialized (i.e. it's still null). Thread 1 gets EXC_BAD_ACCESS trying to use the null field.

Long story short -- it looks like the il2cpp version of System.Security.Cryptography.RandomNumberGenerator isn't threadsafe at initialization time, but the AWS SDK can make overlapped calls to it across threads. I believe adding Mobile Analytics exposed this as it runs its own background thread to deliver events, whereas the rest of the calls to other AWS SDK in my project are all explicitly triggered from my own code, where I currently always make such calls from the same thread and don't overlap calls (though of course, this could be not the case in other threading configurations).

Are there any recommendations around this? Is there a way for me to ensure the first signing request happens before I make other calls? I suppose I can defer initializing Mobile Analytics until I know I've made at least one other successful call, but this seems to just be potentially shifting the problem around to be less likely.

Thanks in advance!

Cheers,

~James

c7james commented 8 years ago

Update:

After adding a raft of debug output, I discovered that I had accidentally left in a call to SNS that was firing on a separate thread, before the rest of my initialization code had completed. By moving this to a safer location (i.e. after initialization), I seem to have eliminated what was a very common crash, or at least made it rare enough that I've not seen it since making the change.

So, the urgency of this has gone way down for me. However, I've only recently added SNS support to our app, and have been intermittently seeing this crash on startup on iOS for a few months now. My leftover SNS call appears to have made the crash much more common, but I don't believe was the sole cause of it. That is, I think there is still a legitimate crash issue in this code, which appears to be very timing dependent.

Since I do have a relatively easily reproducible test case, let me know if I can provide any additional information (callstacks, etc.), and I'd be happy to do so.

Cheers,

~James

johnnyhoffman commented 8 years ago

Hi, I have successfully been able to replicate your issue and am now working on a fix. Thank you for bringing this to our attention, and for all of the insight!

Jabbaquack commented 8 years ago

Hi, I'm using the Amazon Unity SDK for on iOS and it appears that I'm getting the same issue:

// System.Security.Cryptography.RandomNumberGenerator Mono.Security.Cryptography.KeyBuilder::get_Rng() extern TypeInfo* KeyBuilder_t2049706641_il2cpp_TypeInfo_var; extern const uint32_t KeyBuilder_get_Rng_m4045185360_MetadataUsageId; extern "C" RandomNumberGenerator_t2174318432 * KeyBuilder_get_Rng_m4045185360 (Object_t * this /* static, unused /, const MethodInfo method) { static bool s_Il2CppMethodIntialized; if (!s_Il2CppMethodIntialized) { il2cpp_codegen_initialize_method (KeyBuilder_get_Rng_m4045185360_MetadataUsageId); s_Il2CppMethodIntialized = true; } { RandomNumberGenerator_t2174318432 * L_0 = ((KeyBuilder_t2049706641_StaticFields*)KeyBuilder_t2049706641_il2cpp_TypeInfo_var->staticfields)->rng_0;

XCode reports "Worker Pool Thread #2 (38): EXE_BAD_ACCESS (code=1, address=0x0) on the call to RandomNumberGenerator

johnnyhoffman commented 8 years ago

This is fixed with the most recent version of the SDK. Please import the new Unity packages into your projects.