GetStream / swift-activity-feed

Stream Swift iOS Activity Feed Components
https://getstream.io/
BSD 3-Clause "New" or "Revised" License
30 stars 17 forks source link

Crash on Getstream AuthorizationMoyaPlugin #19

Closed angelolloqui closed 2 years ago

angelolloqui commented 2 years ago

We are getting several crashes on production from the class AuthorizationMoyaPlugin in GetStream library.

The crash report is:

Crashed: org.alamofire.session.rootQueue.requestQueue
0  CoreFoundation                 0x21a34 CFStringGetLength + 60
1  CFNetwork                      0x7e00 CFURLRequestSetHTTPRequestBody + 20148
2  CFNetwork                      0x5988 CFURLRequestSetHTTPRequestBody + 10812
3  CFNetwork                      0x963c CFURLRequestAppendHTTPHeaderFieldValue + 268
4  libswiftFoundation.dylib       0x2eb558 URLRequest.addValue(_:forHTTPHeaderField:) + 304
5  GetStream                      0x137a8 AuthorizationMoyaPlugin.prepare(_:target:) + 26 (AuthorizationMoyaPlugin.swift:26)
6  GetStream                      0x13820 protocol witness for PluginType.prepare(_:target:) in conformance AuthorizationMoyaPlugin + 20 (<compiler-generated>:20)
7  Moya                           0x11134 closure #1 in MoyaProvider.interceptor(target:) + 169 (MoyaProvider+Internal.swift:169)
8  Moya                           0xb630 specialized MoyaRequestInterceptor.adapt(_:for:completion:) + 133 (Moya+Alamofire.swift:133)
9  Moya                           0xb1dc protocol witness for RequestAdapter.adapt(_:for:completion:) in conformance MoyaRequestInterceptor + 28 (<compiler-generated>:28)
10 Alamofire                      0xafb1c Session.performSetupOperations(for:convertible:) + 1988 (<compiler-generated>:1988)
11 Alamofire                      0xae950 closure #1 in closure #1 in Session.perform(_:) + 1010 (Session.swift:1010)
12 Alamofire                      0x372b8 thunk for @escaping @callee_guaranteed () -> () + 20 (<compiler-generated>:20)
13 libdispatch.dylib              0x1c04 _dispatch_call_block_and_release + 32
14 libdispatch.dylib              0x3950 _dispatch_client_callout + 20
15 libdispatch.dylib              0xb0ac _dispatch_lane_serial_drain + 664
16 libdispatch.dylib              0xbc44 _dispatch_lane_invoke + 444
17 libdispatch.dylib              0xaf84 _dispatch_lane_serial_drain + 368
18 libdispatch.dylib              0xbc10 _dispatch_lane_invoke + 392
19 libdispatch.dylib              0x16318 _dispatch_workloop_worker_thread + 656
20 libsystem_pthread.dylib        0x11b0 _pthread_wqthread + 288
21 libsystem_pthread.dylib        0xf50 start_wqthread + 8

It seems the token variable is not correct when setting this value in the request.

I am checking the code and I do not see any wrong thing, except that token is an instance variable that might change and produce a race condition between the if token.isEmpty and the actual usage. Not sure if it helps, but maybe rewriting it to this would protect it:

    func prepare(_ request: URLRequest, target: TargetType) -> URLRequest {
        let token = self.token
        if token.isEmpty {
            return request
        }

        var request = request
        request.addValue("jwt", forHTTPHeaderField: "Stream-Auth-Type")
        request.addValue(token, forHTTPHeaderField: "Authorization")

        return request
    }

Related info

- GetStream (2.2.4):
    - GetStream/Core (= 2.2.4)
    - GetStream/Faye (= 2.2.4)
  - GetStream/Core (2.2.4):
  - GetStream/Faye (2.2.4):

Xcode 13.0 Crashes detected in several devices, including iOS15 and iOS14

tbarbugli commented 2 years ago

thank you for opening this issue, the suggested change makes sense (worst case it will not solve the problem you are facing but it won't do any harm).

If you want to speed things up, I suggest opening a PR with this code change :)

ceciliadev commented 2 years ago

Any update on the issue? I did open a PR with the suggested code, but it's still waiting for review.

angelolloqui commented 2 years ago

Please @tbarbugli can you give a bit of ❤️ to this one?

angelolloqui commented 2 years ago

@tbarbugli any plan to fix this? it is by far the main cause of our app crashes, 10x more frequent than our next crash reports

(The previous commit did not fix the issue)

nuno-vieira commented 2 years ago

Hi @angelolloqui!

I'm not very familiar with the codebase yet, but looking at the crash log, this issue does not seem related to our SDK. First, the token is not an instance, it is just a typealias of String, so it is a value type, which means when passed to the addValue function, it gets copied, so even if the token were changed outside of it, it wouldn't impact anything inside the networking calls.

Doing a quick research, this could actually be related to an SDK you might be using that is doing runtime swizzling on the URLSession, at least according to this issue: https://github.com/Alamofire/Alamofire/issues/3515.

Are you using any third-party SDK for analytics or crashlytics? This could be anything like Firebase, Sentry, Datadog, etc...

Best, Nuno

angelolloqui commented 2 years ago

Hi @nuno-vieira , I appreciate your answer and you having some time to look at it. However, I am not sure I am following your comment because the fact that a string is a value type does not guarantee thread safety. In fact, it is a common belief that value types are copied but they are most of the times not, just because of compilation optimizations. You can read about thread safety (or lack of) in value types here: https://forums.swift.org/t/understanding-swifts-value-type-thread-safety/41406/6

My guess (I am not familiar with the code either) is that you are changing the token from a different thread without synchronizing the read/write operation, resulting in rare but possible crashes when read and write operations are happening in parallel.

I will check the link you provided more closely but in a first quick look it seems an unrelated issue, not having the same stack trace. Besides, if it would be due to swizzling it would affect all calls in the app since we make heavy use of urlsessions ourselves but it is only happening in the moya plug-in.

thanks!

nuno-vieira commented 2 years ago

Hi @angelolloqui,

I'm familiar with thread safety, and I didn't say value types avoided thread safety issues. If you read the stack trace, you can see the crash happens inside the URLRequest.addValue() function. We do not control this function, it is from Foundation.CFNetwork. So, there's no way we are changing a value that would impact something inside the URLRequest.addValue() function since by the time we pass the token to URLRequest, the ownership of the token belongs to URLRequest and URLSession, it won't be affected outside of it. This does not mean, inside of URLRequest.addValue() or URLSession or Alamofire there couldn't be a thread-safety issue, it can have, but there's nothing we are doing that would affect this thread safety issue because if we change the token, it won't change anything inside URLRequest or URLSession. This would only be a thread safety issue for us if the crash was happening when changing the token property, which is not the case.

angelolloqui commented 2 years ago

Hi @nuno-vieira, once again thanks for replying that quickly.

I see your point in that it is happening inside the CFNetwork and that the ownership is then moved into that module, however ARC makes tons of magic to optimize memory management and remove unnecessary retain calls so I would not bet my money in any kind of protection (on the contrary, I think there will be none since it assumes code is not thread safe and therefore there is no point on retaining). If you look closely to the stack trace, the fact that it happens inside CFNetwork looks like merely a coincidence, since the crash is when invoking the length method of the token (from CoreFoundation).

However, as I know you are probably not convinced by my argument (I would not be), let me share you a different stack trace of the crash:

Crashed: org.alamofire.session.rootQueue.requestQueue
0  libobjc.A.dylib                0x1f40 objc_msgSend + 32
1  libswiftCore.dylib             0x3db6fc swift_bridgeObjectRetain + 48
2  GetStream                      0x13a7c AuthorizationMoyaPlugin.prepare(_:target:) + 26 (AuthorizationMoyaPlugin.swift:26)
3  GetStream                      0x13b50 protocol witness for PluginType.prepare(_:target:) in conformance AuthorizationMoyaPlugin + 20 (<compiler-generated>:20)
4  Moya                           0x11554 closure #1 in MoyaProvider.interceptor(target:) + 169 (MoyaProvider+Internal.swift:169)
5  Moya                           0xbcec specialized MoyaRequestInterceptor.adapt(_:for:completion:) + 133 (Moya+Alamofire.swift:133)
6  Moya                           0xb894 protocol witness for RequestAdapter.adapt(_:for:completion:) in conformance MoyaRequestInterceptor + 28 (<compiler-generated>:28)
7  Alamofire                      0xb7388 specialized Session.performSetupOperations(for:convertible:shouldCreateTask:) + 1078 (Session.swift:1078)
8  Alamofire                      0xb61f4 closure #1 in closure #1 in Session.perform(_:) + 1015 (Session.swift:1015)
9  Alamofire                      0x3ea20 thunk for @escaping @callee_guaranteed () -> () + 28 (<compiler-generated>:28)
10 libdispatch.dylib              0x1e6c _dispatch_call_block_and_release + 32
11 libdispatch.dylib              0x3a30 _dispatch_client_callout + 20
12 libdispatch.dylib              0xb124 _dispatch_lane_serial_drain + 668
13 libdispatch.dylib              0xbcb4 _dispatch_lane_invoke + 444
14 libdispatch.dylib              0xb000 _dispatch_lane_serial_drain + 376
15 libdispatch.dylib              0xbcb4 _dispatch_lane_invoke + 444
16 libdispatch.dylib              0xb000 _dispatch_lane_serial_drain + 376
17 libdispatch.dylib              0xbc80 _dispatch_lane_invoke + 392
18 libdispatch.dylib              0x16500 _dispatch_workloop_worker_thread + 648
19 libsystem_pthread.dylib        0x10bc _pthread_wqthread + 288
20 libsystem_pthread.dylib        0xe5c start_wqthread + 8

As you can see here, the crash is this time happening directly in the AuthorizationMoyaPlugin.prepare and it seems to be caused when trying to retain the token, so Alamofire or CFNetwork here are not related in any way.

Once again, this is a tricky one and I might be mistaken, but I am quite confident than the problem is caused by the GetStream code because of a threading issue. As I said, I am not familiar with the code, but I will try to dig inside it, because I expect some kind of manipulation of the token variable from a different thread (this one is a background one) without synchronization primitives.

I hope this brings some extra light to it... I will get back with some actual code when I get time to dig into your library, but I appreciate if you could also check for token manipulation places and threading from where it is happening.

Thanks again!

nuno-vieira commented 2 years ago

Hi @angelolloqui,

I find it really hard for this to be the issue. You can even test this yourself and see that the following test below doesn't crash:

func test_token_isThreadSafe() {
        let plugin = AuthorizationMoyaPlugin()

        for index in 0...1000 {
            plugin.token = String(index)
        }

        DispatchQueue.concurrentPerform(iterations: 1000) { _ in
            _ = plugin.token
            plugin.token = "Hey"
        }

        DispatchQueue.concurrentPerform(iterations: 1000) { _ in
            _ = plugin.prepare(URLRequest(url: URL(string: "url")!))
        }
    }

In that test, we are reading and writing the token in multiple threads, and we are also calling prepare() in multiple threads. I understand that in the article you shared, for Strings and Ints these could still be a problem. But the reason that I find it hard for this to be the root cause of the issue is that we only manipulate the token in the setupUser() and disconnect() which should not clash at all with the prepare() method, but of course, this is my initial impression with our codebase.

Either way, it would be nice if you could research as well if there is any SDK in your app that could be causing this, and try to update all of your SDKs. If after you update your SDKs the issue still persists then most likely could be true that the issue is on our side. We will then make the moya plugin thread-safe or re-configure moya whenever the token changes.

I'll speak with the team internally, and will give you an ETA for this as soon as possible.

Best, Nuno

angelolloqui commented 2 years ago

Hi @nuno-vieira , very interesting discussion. And actually, you had a great idea with your test. I made a few changes to your test:

func test_token_isThreadSafe() {
        let plugin = AuthorizationMoyaPlugin()

        DispatchQueue.concurrentPerform(iterations: 1_000_000) { i in
            plugin.token = "\(i)"
            _ = plugin.prepare(URLRequest(url: URL(string: "url")!))
            _ = plugin.token
        }
    }

I am doing this to force the read and write access to be executed in parallel (in your version, I think the concurrency happens inside each block, but not accross the read/writes). With this test, I am not getting a crash either, but when you activate the thread sanitizer you can see this output:

WARNING: ThreadSanitizer: data race (pid=6880)
  Write of size 8 at 0x000110d795d0 by thread T6:
    #0 closure #1 in MatchTests.test_token_isThreadSafe() MatchTests.swift:45 (Anemone SDKTests:arm64+0xda61c)
    #1 partial apply for closure #1 in MatchTests.test_token_isThreadSafe() <compiler-generated> (Anemone SDKTests:arm64+0xda8b0)
    #2 partial apply for thunk for @callee_guaranteed (@unowned Int) -> () <null>:81949076 (libswiftDispatch.dylib:arm64+0x42f4)
    #3 _dispatch_client_callout2 <null>:81949076 (libdispatch.dylib:arm64+0x35dc)

  Previous write of size 8 at 0x000110d795d0 by thread T2:
    #0 closure #1 in MatchTests.test_token_isThreadSafe() MatchTests.swift:45 (Anemone SDKTests:arm64+0xda61c)
    #1 partial apply for closure #1 in MatchTests.test_token_isThreadSafe() <compiler-generated> (Anemone SDKTests:arm64+0xda8b0)
    #2 partial apply for thunk for @callee_guaranteed (@unowned Int) -> () <null>:81949076 (libswiftDispatch.dylib:arm64+0x42f4)
    #3 _dispatch_client_callout2 <null>:81949076 (libdispatch.dylib:arm64+0x35dc)

  Location is heap block of size 32 at 0x000110d795c0 allocated by main thread:
    #0 __sanitizer_mz_malloc <null>:81949076 (libclang_rt.tsan_iossim_dynamic.dylib:arm64+0x51004)
    #1 _malloc_zone_malloc <null>:81949076 (libsystem_malloc.dylib:arm64+0x1527c)
    #2 MatchTests.test_token_isThreadSafe() MatchTests.swift:42 (Anemone SDKTests:arm64+0xda1f4)
    #3 @objc MatchTests.test_token_isThreadSafe() <compiler-generated> (Anemone SDKTests:arm64+0xda928)
    #4 __invoking___ <null>:81949076 (CoreFoundation:arm64+0x11c5ec)

  Thread T6 (tid=6211276, running) is a GCD worker thread

  Thread T2 (tid=6211272, running) is a GCD worker thread

SUMMARY: ThreadSanitizer: data race MatchTests.swift:45 in closure #1 in MatchTests.test_token_isThreadSafe()

As you can see, it detects the data race due to threading. Why it does not crash in test? I am not sure, but my guess is that the compilation options have something to do here, since for tests we do not have the full optimization options that are provided in final builds and that might remove "unnecessary" retain/releases (causing the crash in this case).

Again, it is just a guess, but at least now with the thread sanitiser we can see something more "palpable"

angelolloqui commented 2 years ago

UPDATE: With the previous test, I tried setting some sync primitives like this:

final class AuthorizationMoyaPlugin {
    private var _token: Token
    var token: Token {
        get {
            syncQueue.sync {
                _token
            }
        }
        set {
            syncQueue.sync {
                _token = newValue
            }
        }
    }
    let syncQueue = DispatchQueue(label: "com.test.mySerialQueue")

    init(_ token: Token = "") {
        _token = token
    }
   ...

And now the ThreadSanitizer issue is gone. Code is not looking great, but at least you can see how this seems to prevent the data race.

I hope it helps!

nuno-vieira commented 2 years ago

@angelolloqui Nice catch! Yes, that is a pretty standard way to make it thread-safe. Although, when updating the token, it doesn't need to block the thread. This could be an issue if you are updating the token from the main thread for example.

I've opened the PR here: https://github.com/GetStream/stream-swift/pull/44

nuno-vieira commented 2 years ago

Closing the issue since it is being tracked here: https://github.com/GetStream/stream-swift/pull/44

And it is now available on 2.2.5