Open infiniteloopunroll opened 1 year ago
Here is a test project set up as described above, that allows for easy reproduction of the issue on Mac. high cpu usage test project.zip
Hi @infiniteloopunroll
Thank you for reporting.
Does this occur to you if Sandboxing
is NOT enabled?
The first thing come to my mind is that Firebase Auth SDK attempting to store the auth token in macOS Keychain in that scheduler thread. I am not sure if this would work properly in a sandbox environment. Are you able to enable keychain in this environment?
Another possibility is when Firestore is trying to store persistent data under ~/.firebase/
but unable to access it.
Could you verify if these are the case? Also, are you able to provide the full callstack of those threads using lldb
and bt all
?
Thank! Shawn
Hey @infiniteloopunroll. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.
If you have more information that will help us get to the bottom of this, just add a comment!
Thanks for your feedback @chkuang-g! Due to summer vacations we need a bit more time getting back to you on this. Please don't close just yet. 🙏
Hi @chkuang-g, thanks for reaching out for further information. This issue is very important to me.
Here is what I can tell you right now:
Unfortunately I can't provide the full call stack right now but will follow up in a couple of days, if it is still helpful.
Best, Robin
Hi @chkuang-g, as requested, I can now provide you with the following additional information:
1. Call stack (lldb bt all
):
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x0000000192dce8b0 libsystem_kernel.dylib`mach_msg_trap + 8
frame #1: 0x0000000192dced20 libsystem_kernel.dylib`mach_msg + 76
frame #2: 0x0000000192ed92c0 CoreFoundation`__CFRunLoopServiceMachPort + 372
frame #3: 0x0000000192ed7770 CoreFoundation`__CFRunLoopRun + 1180
frame #4: 0x0000000192ed6b34 CoreFoundation`CFRunLoopRunSpecific + 600
frame #5: 0x000000019bb16338 HIToolbox`RunCurrentEventLoopInMode + 292
frame #6: 0x000000019bb160b4 HIToolbox`ReceiveNextEventCommon + 564
frame #7: 0x000000019bb15e68 HIToolbox`_BlockUntilNextEventMatchingListInModeWithFilter + 72
frame #8: 0x0000000195a3e51c AppKit`_DPSNextEvent + 860
frame #9: 0x0000000195a3ce14 AppKit`-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1328
frame #10: 0x0000000195a2efe0 AppKit`-[NSApplication run] + 596
frame #11: 0x0000000195a006fc AppKit`NSApplicationMain + 1132
frame #12: 0x0000000104995c8c FirebaseCppTestApp`main(argc=3, argv=0x000000016b46b3c0) at main.m:14:12
frame #13: 0x000000010591508c dyld`start + 520
thread #8, name = 'com.apple.NSEventThread'
frame #0: 0x0000000192dce8b0 libsystem_kernel.dylib`mach_msg_trap + 8
frame #1: 0x0000000192dced20 libsystem_kernel.dylib`mach_msg + 76
frame #2: 0x0000000192ed92c0 CoreFoundation`__CFRunLoopServiceMachPort + 372
frame #3: 0x0000000192ed7770 CoreFoundation`__CFRunLoopRun + 1180
frame #4: 0x0000000192ed6b34 CoreFoundation`CFRunLoopRunSpecific + 600
frame #5: 0x0000000195bab100 AppKit`_NSEventThread + 196
frame #6: 0x0000000105899890 libsystem_pthread.dylib`_pthread_start + 148
thread #13
frame #0: 0x0000000192dfd594 libsystem_kernel.dylib`sem_wait + 8
frame #1: 0x0000000104bf6054 FirebaseCppTestApp`firebase::rest::CurlThread::GetNextAction(firebase::rest::TransportCurlActionData*, long long) + 260
frame #2: 0x0000000104bf6478 FirebaseCppTestApp`firebase::rest::CurlThread::ProcessRequests() + 428
frame #3: 0x0000000104a1f314 FirebaseCppTestApp`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(void*), void*> >(void*) + 44
frame #4: 0x0000000105899890 libsystem_pthread.dylib`_pthread_start + 148
thread #14
frame #0: 0x0000000192dfd594 libsystem_kernel.dylib`sem_wait + 8
frame #1: 0x0000000104a1e79c FirebaseCppTestApp`firebase::scheduler::Scheduler::WorkerThreadRoutine(void*) + 408
frame #2: 0x0000000104a1f314 FirebaseCppTestApp`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(void*), void*> >(void*) + 44
frame #3: 0x0000000105899890 libsystem_pthread.dylib`_pthread_start + 148
thread #15
frame #0: 0x0000000192dfd594 libsystem_kernel.dylib`sem_wait + 8
frame #1: 0x00000001049abc64 FirebaseCppTestApp`firebase::auth::IdTokenRefreshThread::Initialize(firebase::auth::AuthData*)::$_3::__invoke(firebase::auth::IdTokenRefreshThread*) + 480
frame #2: 0x00000001049b2020 FirebaseCppTestApp`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(firebase::auth::IdTokenRefreshThread*), firebase::auth::IdTokenRefreshThread*> >(void*) + 44
frame #3: 0x0000000105899890 libsystem_pthread.dylib`_pthread_start + 148
thread #16
frame #0: 0x0000000192dfd594 libsystem_kernel.dylib`sem_wait + 8
frame #1: 0x0000000104a1e79c FirebaseCppTestApp`firebase::scheduler::Scheduler::WorkerThreadRoutine(void*) + 408
frame #2: 0x0000000104a1f314 FirebaseCppTestApp`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(void*), void*> >(void*) + 44
frame #3: 0x0000000105899890 libsystem_pthread.dylib`_pthread_start + 148
thread #18
frame #0: 0x00000001058a3a94 libsystem_pthread.dylib`start_wqthread
2. Heaviest stack trace (Instruments):
116178 FirebaseCppTestApp (18930)
29015 firebase::auth::IdTokenRefreshThread::Initialize 0x27e3c
29015 thread_start
29015 _pthread_start
29015 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(firebase::auth::IdTokenRefreshThread*), firebase::auth::IdTokenRefreshThread*> >(void*)
29015 firebase::auth::IdTokenRefreshThread::Initialize(firebase::auth::AuthData*)::$_3::__invoke(firebase::auth::IdTokenRefreshThread*)
24670 sem_wait
340 cerror
215 cerror_nocancel
3. Time profile (Instruments):
Weight Self Weight Symbol Name
1.94 min 100.0% 0 s FirebaseCppTestApp (18930)
29.02 s 24.9% 0 s firebase::auth::IdTokenRefreshThread::Initialize 0x27e3c
29.02 s 24.9% 0 s thread_start
29.02 s 24.9% 0 s _pthread_start
29.02 s 24.9% 0 s void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(firebase::auth::IdTokenRefreshThread*), firebase::auth::IdTokenRefreshThread*> >(void*)
29.02 s 24.9% 0 s firebase::auth::IdTokenRefreshThread::Initialize(firebase::auth::AuthData*)::$_3::__invoke(firebase::auth::IdTokenRefreshThread*)
24.67 s 21.2% 24.33 s sem_wait
340.00 ms 0.2% 27.00 ms cerror
215.00 ms 0.1% 215.00 ms cerror_nocancel
60.00 ms 0.0% 60.00 ms _pthread_exit_if_canceled
37.00 ms 0.0% 37.00 ms _pthread_exit_if_canceled
1.00 ms 0.0% 1.00 ms cerror
2.12 s 1.8% 179.00 ms firebase::Mutex::Release()
2.08 s 1.7% 40.00 ms firebase::Mutex::Acquire()
110.00 ms 0.0% 110.00 ms firebase::auth::IdTokenRefreshThread::Initialize(firebase::auth::AuthData*)::$_3::__invoke(firebase::auth::IdTokenRefreshThread*)
33.00 ms 0.0% 33.00 ms DYLD-STUB$$sem_wait
29.02 s 24.9% 0 s firebase::scheduler::Scheduler::WorkerThreadRoutine 0x27e3d
28.97 s 24.9% 0 s firebase::scheduler::Scheduler::WorkerThreadRoutine 0x27e3b
28.61 s 24.6% 0 s firebase::rest::CurlThread::ProcessRequests 0x27e3a
434.00 ms 0.3% 0 s Main Thread 0x27902
84.00 ms 0.0% 0 s _NSEventThread 0x27ae7
16.00 ms 0.0% 0 s _dispatch_workloop_worker_thread 0x27ae0
12.00 ms 0.0% 0 s _dispatch_workloop_worker_thread 0x27d45
9.00 ms 0.0% 0 s _dispatch_workloop_worker_thread 0x27ae4
6.00 ms 0.0% 0 s _dispatch_worker_thread2 0x27dde
5.00 ms 0.0% 0 s _dispatch_workloop_worker_thread 0x27e29
2.00 ms 0.0% 0 s start_wqthread 0x27e27
1.00 ms 0.0% 0 s f_b_getaddrinfo_thread 0x27e3e
1.00 ms 0.0% 0 s _dispatch_workloop_worker_thread 0x27e41
Also find the complete Instruments time profile attached, as a .trace file bundle. I think this might be especially helpful, to look into the issue.
Thanks for your help! Looking forward to any feedback.
Best, Robin
Oh, this is incredibly useful. Thank you!
Based on your information, I think this happens when IdTokenRefreshThread
is waiting for token refresh request.
https://github.com/firebase/firebase-cpp-sdk/blob/3f79c788eb396b499e98f6f9192cdd9b4020cf00/auth/src/desktop/auth_desktop.cc#L694
This refresh request will need internet access in order to get the token from the auth backend. I assume you did enabled Outgoing connection (client)
in the sandboxing setting, correct?
https://developer.apple.com/documentation/xcode/configuring-the-macos-app-sandbox/
I have a feeling this is something weird going on with this Sandboxing mode. May I understand why you need to run macOS application in this mode?
May I understand why you need to run macOS application in this mode?
All software distributed in the Mac App Store must be sandboxed per Apple guidelines, and additionally there are some Apple APIs that can only be accessed if the app is running sandboxed (even for local test builds). This is why these days nearly all macOS software is sandboxed.
Hi @chkuang-g, thanks for looking into this issue further! It would be great to resolve this, because as @triplef has already explained, running apps in a sandbox on macOS today is the rule and not an exception, for many reasons. And I think it would be especially nice to address the problem, because some parts of the FB C++ SDK have already been prepared to work with Mac sandboxing (for example, library folder and keychain access) and there only seems to be a missing piece to make it fully work.
Networking entitlements are set in the demo project. Maybe also interesting to note that everything appears to work, even in sandboxed mode, except for the persistent high CPU utilisation by the four worker threads. For example, authentication still succeeds and data can be transferred to and from FB.
Please let me know if I can provide any further information to track down the issue. Super cool that you look into it! Would be so nice to resolve this.
Best, Robin
@triplef To help us narrow this down, are you also running on a M1 Mac?
Thanks!
@DellaBitta Yes, this issue was observed when running on Apple Silicon ARM64, macOS 12.x. Please let me know it I can provide any other information. Thanks!
- Problematic Firebase Component: Auth (Auth, Database, etc.)
- Other Firebase Components in use: Firestore (Auth, Database, etc.)
- Platform you are using the C++ SDK on: Mac ARM64 (Mac, Windows, or Linux)
- Platform you are targeting: Desktop (iOS, Android, and/or desktop)
I can reproduce the same issue on Intel Mac running macOS 12.6.
I was able to reproduce this on an intel mac as well. Thanks for the test case!
@DellaBitta Thanks for looking into this issue! It would be so great to get this resolved, so that the FB desktop SDK can be used on macOS. Is there any update you can give, as of today? Really appreciate your help! And let me know if there is anything else we can do to help.
Best, Robin
Hi @infiniteloopunroll,
I've been looking into this issue and I believe I've found the problem.
When macOS Sandbox mode is enabled, macOS requires that semaphores have a name that is prefixed by the App's Group Name. If the semaphore's name doesn't match this convention then its creation fails.
As an additional issue, we attempted to detect semaphore creation errors by comparing the semaphore handle to nil
. But in the the case of macOS, a a semaphore creation error returns SEM_FAILED
which is 0xFFFFFFFFFFFFFFFF
, not nil
. We therefore didn't detect these creation errors and started to use the semaphore anyway. Subsequent sem_wait
calls would fail, too, which is why we saw tight looping and high CPU usage in our worker threads. They were just constantly spinning.
Unfortunately there's no official way for the SDK to query the app's group name at runtime, so we can't automatically mangle the semaphore names.
Instead I'm thinking that the Firebase SDK could use a property in your Info.plist
, something that you could add like FBAppGroupEntitlementName
for instance. If that property is present then the SDK would use it to prefix the semaphore names. This way you could plumb the App Group name to the SDK, and the semaphores would be created properly.
Would this be a good way to solve the problem for you?
Also, it turns out the network sandbox entitlements com.apple.security.network.client
and ...server
values aren't related to this problem. That is, I can reproduce this issue without configuring them. If someone encounters an issue when these are set then please let me know!
Hi @DellaBitta, thanks for looking into this issue and identifying the problem! That sounds super interesting and I can see how it would lead to issues when setting up the semaphore. Your proposed solution looks great as well, we think that would make it really easy for macOS developers to add the app’s group name in a way that you can read from the FB SDK. Would it in your view make sense to add the app group name to firebase::AppOptions as well? That might add a little more flexibility for developers, but having it read directly from the plist sounds great. Really looking forward to testing this new solution and assisting you with any further information that might be helpful, please just let me know if there is anything else that I can provide. Best, Robin
Hi @infiniteloopunroll,
We thought adding a field to AppOptions but in theory, if you initialize the default app and then later create a custom app with AppOptions, it would already be too late to configure the semaphore names. Or, alternatively, subsequent calls to create an App with different AppOptions wouldn't affect the name of the semaphores already created.
We'd rather have a solution that doesn't make it seem like the configuration can vary across instances of Firebase Apps.
That being said, we're also thinking about adding an alternative global function in the firebase
namespace for those who wish to configure this value programmatically.
The Info.plist
solution should be part of our next release since that solution has already been merged to main
. The API solution might come a bit later.
@DellaBitta That sounds like amazing news, thank you so much! Looking forward to the next release.
[REQUIRED] Please fill in the following fields:
[REQUIRED] Please describe the issue here:
When running the Firebase C++ SDK in a sandboxed app on macOS 12.5, high CPU utilisation is observed after authentication succeeds. Without sandboxing enabled, the issue does not occur.
Steps to reproduce:
Follow these steps to set up a test project that reproduces the issue. Or use the attached test project below.
Expected outcome: The worker threads should use minimal CPU resources and sleep whenever possible.
Have you been able to reproduce this issue with just the Firebase C++ quickstarts ? What's the issue repro rate? (eg 100%, 1/5 etc)
Yes, happens with any sandboxed Mac app build (100%).
What happened? How can we make the problem occur? This could be a description, log/console output, etc.
As soon as the authentication succeeds, four Firebase worker keep running at full CPU utilisation. This is mostly due to
firebase::scheduler::Scheduler::WorkerThreadRoutine
. This can be verified using Instruments, when sampling the running app.Expected outcome: The worker threads should use minimal CPU resources and sleep whenever possible.