Closed marksvend closed 1 year ago
Hi @marksvend, thanks for the issue and included debugging analysis. It certainly looks like there are paths out of the interceptor request chain that aren't being cleaned up properly. We'll take a look and resolve asap.
I've also been noticing a number of crashes with request chains that started popping up recently as well. Not sure if they are related to this, but here is a stack trace:
Note I have not been able to reproduce these crashes. But we have definitely seen an uptick of these crashes since upgrading to 1.2.0.
Crashed: com.apollographql.ApolloStore
0 libobjc.A.dylib 0x1cf4 objc_msgSend + 20
1 EnvoyMobile 0xcbddc0 InterceptorRequestChain.proceedAsync<A>(request:response:completion:interceptor:) + 25 (Atomic.swift:25)
2 EnvoyMobile 0xc96b54 ApolloInterceptorReentrantWrapper.proceedAsync<A>(request:response:completion:) + 38 (ApolloInterceptorReentrantWrapper.swift:38)
3 EnvoyMobile 0xc9ccd0 closure #1 in CacheReadInterceptor.interceptAsync<A>(chain:request:response:completion:) + 54 (CacheReadInterceptor.swift:54)
4 EnvoyMobile 0xc9d1c4 partial apply for closure #1 in CacheReadInterceptor.fetchFromCache<A>(for:chain:completion:) + 99 (CacheReadInterceptor.swift:99)
5 EnvoyMobile 0xc9f074 static OS_dispatch_queue.returnResultAsyncIfNeeded<A>(on:action:result:) + 22 (DispatchQueue+Optional.swift:22)
6 EnvoyMobile 0xc97b1c closure #1 in ApolloStore.withinReadTransaction<A>(_:callbackQueue:completion:) + 113 (ApolloStore.swift:113)
7 EnvoyMobile 0xc94d5c thunk for @escaping @callee_guaranteed () -> () + 265212 (<compiler-generated>:265212)
8 libdispatch.dylib 0x63094 _dispatch_call_block_and_release + 24
9 libdispatch.dylib 0x64094 _dispatch_client_callout + 16
10 libdispatch.dylib 0x6bb8 _dispatch_continuation_pop$VARIANT$mp + 440
11 libdispatch.dylib 0x62e0 _dispatch_async_redirect_invoke + 588
12 libdispatch.dylib 0x13b94 _dispatch_root_queue_drain + 340
13 libdispatch.dylib 0x1439c _dispatch_worker_thread2 + 172
14 libsystem_pthread.dylib 0x1dc4 _pthread_wqthread + 224
15 libsystem_pthread.dylib 0x192c start_wqthread + 8
Yes, we are also seeing a crash from this code. I was just about to report it.
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x0000000000000010
Exception Codes: 0x0000000000000001, 0x0000000000000010
VM Region Info: 0x10 is not in any region. Bytes before following region: 68719476720
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
--->
commpage (reserved) 1000000000-7000000000 [384.0G] ---/--- SM=NUL ...(unallocated)
Triggered by Thread: 0
Kernel Triage:
VM - (arg = 0x0) pmap_enter retried due to resource shortage
VM - (arg = 0x0) pmap_enter retried due to resource shortage
VM - (arg = 0x0) pmap_enter retried due to resource shortage
VM - (arg = 0x0) pmap_enter retried due to resource shortage
VM - (arg = 0x0) pmap_enter retried due to resource shortage
Thread 0 Crashed:
0 AppName 0x00000001053920d0 specialized Atomic.wrappedValue.getter + 0 (Atomic.swift:23)
1 AppName 0x00000001053920d0 InterceptorRequestChain.isCancelled.getter + 4 (InterceptorRequestChain.swift:0)
2 AppName 0x00000001053920d0 InterceptorRequestChain.proceedAsync<A>(request:response:completion:interceptor:) + 240 (InterceptorRequestChain.swift:119)
3 AppName 0x000000010536b2b0 ApolloInterceptorReentrantWrapper.proceedAsync<A>(request:response:completion:) + 76 (ApolloInterceptorReentrantWrapper.swift:38)
4 AppName 0x0000000104e7faf4 closure #1 in GraphQLWeblabInterceptor.interceptAsync<A>(chain:request:response:completion:) + 1624 (GraphQLWeblabInterceptor.swift:52)
5 AppName 0x0000000104e7fd81 partial apply for closure #1 in GraphQLWeblabInterceptor.interceptAsync<A>(chain:request:response:completion:) + 1 (<compiler-generated>:0)
6 AppName 0x0000000104e8131d specialized thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out A) + 1
7 AppName 0x0000000104fe55cd partial apply for specialized thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out A) + 1 (<compiler-generated>:0)
8 libswift_Concurrency.dylib 0x00000001ad589dd9 completeTaskWithClosure(swift::AsyncContext*, swift::SwiftError*) + 1 (Task.cpp:496)
We are also crashing with our prod app after upgrading to 1.2.0:
Crashed: com.apple.main-thread
0 EVgoCharger 0x64ac9c InterceptorRequestChain.proceedAsync<A>(request:response:completion:interceptor:) + 23 (Atomic.swift:23)
1 EVgoCharger 0x623e7c ApolloInterceptorReentrantWrapper.proceedAsync<A>(request:response:completion:) + 38 (ApolloInterceptorReentrantWrapper.swift:38)
2 EVgoCharger 0x14e48c closure #1 in AuthorizationInterceptor.interceptAsync<A>(chain:request:response:completion:) + 36 (AuthorizationInterceptor.swift:36)
3 EVgoCharger 0x14eb38 partial apply for closure #1 in AuthorizationInterceptor.interceptAsync<A>(chain:request:response:completion:) + 4369460024 (<compiler-generated>:4369460024)
4 EVgoCharger 0x14ea40 thunk for @escaping @callee_guaranteed (@guaranteed FIRAuthTokenResult?, @guaranteed Error?) -> () + 4369459776 (<compiler-generated>:4369459776)
5 libdispatch.dylib 0x2320 _dispatch_call_block_and_release + 32
6 libdispatch.dylib 0x3eac _dispatch_client_callout + 20
7 libdispatch.dylib 0x126a4 _dispatch_main_queue_drain + 928
8 libdispatch.dylib 0x122f4 _dispatch_main_queue_callback_4CF + 44
9 CoreFoundation 0x98d18 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 16
10 CoreFoundation 0x7a650 __CFRunLoopRun + 1992
11 CoreFoundation 0x7f4dc CFRunLoopRunSpecific + 612
12 GraphicsServices 0x135c GSEventRunModal + 164
13 UIKitCore 0x39d37c -[UIApplication _run] + 888
14 UIKitCore 0x39cfe0 UIApplicationMain + 340
15 EVgoCharger 0x7424 main + 6 (main.swift:6)
16 ??? 0x1bc33cdec (Missing)
I'll be taking a look into this one today.
I'm curious about the versioning though since all of the bug reports here indicate "update to 1.2.0
". Substantial changes to the interceptor chain were released in 1.1.0
to support multipart HTTP based subscriptions. So if you're upgrading from a version < 1.1.0
to 1.2.0
then I can see the link but if you had been using anything > 1.1.0
prior to the 1.2.0
upgrade then it may indicate something else.
@calvincestari I updated from 1.0.6 to 1.2.0
We also upgraded from 1.0.6 to 1.2.0
@marksvend @sgharraph - do either of you use custom interceptors in your request chain?
We do use a few custom interceptors
We also use custom interceptors. You can see that from the stack trace I attached
+1. We are also seeing a crash with a stack trace very similar when updating from 1.0.5 to 1.2.1, and we also use custom interceptors:
#0 0x0000000116398bcc in InterceptorRequestChain.isCancelled.getter ()
#1 0x000000011639a04c in InterceptorRequestChain.proceedAsync<τ_0_0>(request:response:completion:interceptor:) at /Users/rick.pasetto/Documents/GitHub/swift/Apps/Tuist/Dependencies/SwiftPackageManager/.build/checkouts/apollo-ios/Sources/Apollo/InterceptorRequestChain.swift:119
#2 0x000000011635902c in ApolloInterceptorReentrantWrapper.proceedAsync<τ_0_0>(request:response:completion:) at /Users/rick.pasetto/Documents/GitHub/swift/Apps/Tuist/Dependencies/SwiftPackageManager/.build/checkouts/apollo-ios/Sources/Apollo/ApolloInterceptorReentrantWrapper.swift:38
#3 0x00000001163594b4 in protocol witness for RequestChain.proceedAsync<τ_0_0>(request:response:completion:) in conformance ApolloInterceptorReentrantWrapper ()
#4 0x00000001177f0ac8 in closure #1 in AuthInterceptor.interceptAsync<τ_0_0>(chain:request:response:completion:) at /Users/rick.pasetto/Documents/GitHub/swift/Apps/Targets/ShuttleNetworking/Sources/ApolloInterceptors.swift:30
#5 0x00000001177f0de4 in partial apply for closure #1 in AuthInterceptor.interceptAsync<τ_0_0>(chain:request:response:completion:) ()
#6 0x00000001177efe4c in thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out τ_0_0) ()
#7 0x00000001177eff8c in partial apply for thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out τ_0_0) ()
This has been a tricky issue to resolve given how we chose to implement HTTP-based subscriptions while maintaining 100% backwards compatibility for existing request chains. I don't think the bugs are related to custom interceptors necessarily but rather how we had to use an Unmanaged
instance of InterceptorRequestChain
.
I discarded the first solution (#3065) because it required authors of custom interceptors to know too much internal knowledge of how the request chain worked and how it managed the unmanaged instance.
I have another solution up at #3070 which removes the unmanaged instance and goes back to the old memory model for the request chain, while still supporting HTTP-based subscriptions, but at the cost of being a minor breaking change and requiring an update to custom interceptors. I'm busy with the last set of tests for that PR and then adding a migration guide which will detail the changes needed.
The fix for this has been merged into the release/1.3
branch which will be the next release; date unknown at the moment though.
Our docs system doesn't render changes unless the branch is built against main
but you can still read about the migration that will be needed in markdown here.
Since the solution completely removed the Unmanaged
object I'm confident it'll resolve both the leaks and crash bugs that were reported in this issue. Until the official release though it would be helpful if you were able to test your projects against the release/1.3
branch to confirm.
@calvincestari Great! That seems to have done the trick. Looking forward to official release of 1.3!
Summary
I'm reporting a memory leak in the iOS Apollo library version 1.2.0 where the request chain does not get deallocated when there is a cache hit.
When a request chain finishes with
returnValueAsync
and does not callproceedAsync
, thereleaseManagedSelf
never gets called. The retain cycle never gets broken, and the chain leaks. This typically happens in the success case ofCacheReadInterceptor.returnCacheDataElseFetch
.I confirmed that adding a call to
self.releaseManagedSelf
fromInterceptorRequestChain.returnValueAsync
solves the memory leak: all of the chain instances get deallocated properly.However, this solution would break the
CacheReadInterceptor.returnCacheDataAndFetch
case, which calls bothreturnValueAsync
andproceedAsync
. It wouldn't be right to callreleaseManagedSelf
prematurely before the chain has finished. There's no way to know whenreturnValueAsync
is called whether that is the end or not.There's another leak in the
handleErrorAsync
function because it does not callself.releaseManagedSelf
, either. But that is always the end of the chain so it should be safe to release self there.Version
1.2.0
Steps to reproduce the behavior
I discovered the problem using the memory graph debugger in Xcode after executing many queries in the app. As long as some of them return cached data, a large multi-node memory cycle will appear in the debugger.
Logs
Anything else?
No response