firebase / firebase-cpp-sdk

Firebase C++ SDK
http://firebase.google.com
Apache License 2.0
272 stars 111 forks source link

[Bug] High CPU utilisation after authentication when running in sandbox on desktop (macOS) #1045

Open infiniteloopunroll opened 1 year ago

infiniteloopunroll commented 1 year ago

[REQUIRED] Please fill in the following fields:

[REQUIRED] Please describe the issue here:

When running the Firebase C++ SDK in a sandboxed app on macOS 12.5, high CPU utilisation is observed after authentication succeeds. Without sandboxing enabled, the issue does not occur.

Steps to reproduce:

Follow these steps to set up a test project that reproduces the issue. Or use the attached test project below.

  1. Include Firebase Auth in a C++ or Objective-C++ class and implement an email + password authentication step.
  2. Create an Xcode project with the template macOS → Application → App (Interface XIB, in order to select language Objective-C). Add your C++/Objective-C++ class to this project and the app target.
  3. Under project → targets → $appname, add firebase_auth.framework (from fb_cpp_sdk/frameworks/darwin/universal/ in the pre-built C++ SDK folder) to the section "Frameworks, Libraries and Embedded Content", with the option Embed & Sign selected. Also add the other macOS dependencies here (CoreFoundation, Foundation, GSS, Kerberos, libpthread, Security and SystemConfiguration. Keep the standard value for embedding, Do not Embed).
  4. Check that the frameworks in fb_cpp_sdk/frameworks/darwin/universal/ either contain all headers (four header files are missing from the v9.3 download in fb_cpp_sdk/frameworks/darwin/universal/firebase.framework/Headers/) or manually add the project header search path for fb_cpp_sdk/include/.
  5. The project template contains the file $appname.entitlements. Make sure that the value for key "Sandboxing enabled" in this property list file is set to YES, in order to create a sandboxed app. Additionally, set both the com.apple.security.network.client and server values to YES, in order to allow network access.
  6. Build the app and run it, then sign in using email + password.
  7. As soon as the authentication succeeds, four Firebase worker threads keep running at full CPU utilisation. This can be checked in the debug navigator of Xcode when selecting CPU (displays about 400% usage continuously).

Expected outcome: The worker threads should use minimal CPU resources and sleep whenever possible.

  1. (Optionally) Go back to the entitlements file and set "Sandboxing enabled" to NO, in order to deactivate sandboxing. The issue should no longer occur after clean build products and rebuild in Xcode, when running the app again and signing in.

Have you been able to reproduce this issue with just the Firebase C++ quickstarts ? What's the issue repro rate? (eg 100%, 1/5 etc)

Yes, happens with any sandboxed Mac app build (100%).

What happened? How can we make the problem occur? This could be a description, log/console output, etc.

As soon as the authentication succeeds, four Firebase worker keep running at full CPU utilisation. This is mostly due to firebase::scheduler::Scheduler::WorkerThreadRoutine. This can be verified using Instruments, when sampling the running app.

Expected outcome: The worker threads should use minimal CPU resources and sleep whenever possible.

infiniteloopunroll commented 1 year ago

Here is a test project set up as described above, that allows for easy reproduction of the issue on Mac. high cpu usage test project.zip

chkuang-g commented 1 year ago

Hi @infiniteloopunroll

Thank you for reporting.

Does this occur to you if Sandboxing is NOT enabled?

The first thing come to my mind is that Firebase Auth SDK attempting to store the auth token in macOS Keychain in that scheduler thread. I am not sure if this would work properly in a sandbox environment. Are you able to enable keychain in this environment?

Another possibility is when Firestore is trying to store persistent data under ~/.firebase/ but unable to access it.

Could you verify if these are the case? Also, are you able to provide the full callstack of those threads using lldb and bt all?

Thank! Shawn

google-oss-bot commented 1 year ago

Hey @infiniteloopunroll. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

triplef commented 1 year ago

Thanks for your feedback @chkuang-g! Due to summer vacations we need a bit more time getting back to you on this. Please don't close just yet. 🙏

infiniteloopunroll commented 1 year ago

Hi @chkuang-g, thanks for reaching out for further information. This issue is very important to me.

Here is what I can tell you right now:

Unfortunately I can't provide the full call stack right now but will follow up in a couple of days, if it is still helpful.

Best, Robin

infiniteloopunroll commented 1 year ago

Hi @chkuang-g, as requested, I can now provide you with the following additional information:

1. Call stack (lldb bt all):

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000192dce8b0 libsystem_kernel.dylib`mach_msg_trap + 8
    frame #1: 0x0000000192dced20 libsystem_kernel.dylib`mach_msg + 76
    frame #2: 0x0000000192ed92c0 CoreFoundation`__CFRunLoopServiceMachPort + 372
    frame #3: 0x0000000192ed7770 CoreFoundation`__CFRunLoopRun + 1180
    frame #4: 0x0000000192ed6b34 CoreFoundation`CFRunLoopRunSpecific + 600
    frame #5: 0x000000019bb16338 HIToolbox`RunCurrentEventLoopInMode + 292
    frame #6: 0x000000019bb160b4 HIToolbox`ReceiveNextEventCommon + 564
    frame #7: 0x000000019bb15e68 HIToolbox`_BlockUntilNextEventMatchingListInModeWithFilter + 72
    frame #8: 0x0000000195a3e51c AppKit`_DPSNextEvent + 860
    frame #9: 0x0000000195a3ce14 AppKit`-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1328
    frame #10: 0x0000000195a2efe0 AppKit`-[NSApplication run] + 596
    frame #11: 0x0000000195a006fc AppKit`NSApplicationMain + 1132
    frame #12: 0x0000000104995c8c FirebaseCppTestApp`main(argc=3, argv=0x000000016b46b3c0) at main.m:14:12
    frame #13: 0x000000010591508c dyld`start + 520
  thread #8, name = 'com.apple.NSEventThread'
    frame #0: 0x0000000192dce8b0 libsystem_kernel.dylib`mach_msg_trap + 8
    frame #1: 0x0000000192dced20 libsystem_kernel.dylib`mach_msg + 76
    frame #2: 0x0000000192ed92c0 CoreFoundation`__CFRunLoopServiceMachPort + 372
    frame #3: 0x0000000192ed7770 CoreFoundation`__CFRunLoopRun + 1180
    frame #4: 0x0000000192ed6b34 CoreFoundation`CFRunLoopRunSpecific + 600
    frame #5: 0x0000000195bab100 AppKit`_NSEventThread + 196
    frame #6: 0x0000000105899890 libsystem_pthread.dylib`_pthread_start + 148
  thread #13
    frame #0: 0x0000000192dfd594 libsystem_kernel.dylib`sem_wait + 8
    frame #1: 0x0000000104bf6054 FirebaseCppTestApp`firebase::rest::CurlThread::GetNextAction(firebase::rest::TransportCurlActionData*, long long) + 260
    frame #2: 0x0000000104bf6478 FirebaseCppTestApp`firebase::rest::CurlThread::ProcessRequests() + 428
    frame #3: 0x0000000104a1f314 FirebaseCppTestApp`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(void*), void*> >(void*) + 44
    frame #4: 0x0000000105899890 libsystem_pthread.dylib`_pthread_start + 148
  thread #14
    frame #0: 0x0000000192dfd594 libsystem_kernel.dylib`sem_wait + 8
    frame #1: 0x0000000104a1e79c FirebaseCppTestApp`firebase::scheduler::Scheduler::WorkerThreadRoutine(void*) + 408
    frame #2: 0x0000000104a1f314 FirebaseCppTestApp`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(void*), void*> >(void*) + 44
    frame #3: 0x0000000105899890 libsystem_pthread.dylib`_pthread_start + 148
  thread #15
    frame #0: 0x0000000192dfd594 libsystem_kernel.dylib`sem_wait + 8
    frame #1: 0x00000001049abc64 FirebaseCppTestApp`firebase::auth::IdTokenRefreshThread::Initialize(firebase::auth::AuthData*)::$_3::__invoke(firebase::auth::IdTokenRefreshThread*) + 480
    frame #2: 0x00000001049b2020 FirebaseCppTestApp`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(firebase::auth::IdTokenRefreshThread*), firebase::auth::IdTokenRefreshThread*> >(void*) + 44
    frame #3: 0x0000000105899890 libsystem_pthread.dylib`_pthread_start + 148
  thread #16
    frame #0: 0x0000000192dfd594 libsystem_kernel.dylib`sem_wait + 8
    frame #1: 0x0000000104a1e79c FirebaseCppTestApp`firebase::scheduler::Scheduler::WorkerThreadRoutine(void*) + 408
    frame #2: 0x0000000104a1f314 FirebaseCppTestApp`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(void*), void*> >(void*) + 44
    frame #3: 0x0000000105899890 libsystem_pthread.dylib`_pthread_start + 148
  thread #18
    frame #0: 0x00000001058a3a94 libsystem_pthread.dylib`start_wqthread

2. Heaviest stack trace (Instruments):

116178  FirebaseCppTestApp (18930)  
29015   firebase::auth::IdTokenRefreshThread::Initialize  0x27e3c   
29015   thread_start    
29015   _pthread_start  
29015   void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(firebase::auth::IdTokenRefreshThread*), firebase::auth::IdTokenRefreshThread*> >(void*)  
29015   firebase::auth::IdTokenRefreshThread::Initialize(firebase::auth::AuthData*)::$_3::__invoke(firebase::auth::IdTokenRefreshThread*)   
24670   sem_wait    
340     cerror  
215     cerror_nocancel

3. Time profile (Instruments):

         Weight        Self Weight         Symbol Name
1.94 min    100.0%  0 s     FirebaseCppTestApp (18930)
29.02 s         24.9%   0 s      firebase::auth::IdTokenRefreshThread::Initialize  0x27e3c
29.02 s     24.9%   0 s       thread_start
29.02 s      24.9%  0 s        _pthread_start
29.02 s      24.9%  0 s         void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(firebase::auth::IdTokenRefreshThread*), firebase::auth::IdTokenRefreshThread*> >(void*)
29.02 s     24.9%   0 s          firebase::auth::IdTokenRefreshThread::Initialize(firebase::auth::AuthData*)::$_3::__invoke(firebase::auth::IdTokenRefreshThread*)
24.67 s     21.2%   24.33 s           sem_wait
340.00 ms    0.2%   27.00 ms               cerror
215.00 ms    0.1%   215.00 ms               cerror_nocancel
60.00 ms    0.0%    60.00 ms                _pthread_exit_if_canceled
37.00 ms    0.0%    37.00 ms                _pthread_exit_if_canceled
1.00 ms    0.0%     1.00 ms             cerror
2.12 s    1.8%      179.00 ms             firebase::Mutex::Release()
2.08 s    1.7%      40.00 ms              firebase::Mutex::Acquire()
110.00 ms    0.0%   110.00 ms             firebase::auth::IdTokenRefreshThread::Initialize(firebase::auth::AuthData*)::$_3::__invoke(firebase::auth::IdTokenRefreshThread*)
33.00 ms    0.0%    33.00 ms              DYLD-STUB$$sem_wait
29.02 s   24.9%     0 s      firebase::scheduler::Scheduler::WorkerThreadRoutine  0x27e3d
28.97 s   24.9%     0 s      firebase::scheduler::Scheduler::WorkerThreadRoutine  0x27e3b
28.61 s   24.6%     0 s      firebase::rest::CurlThread::ProcessRequests  0x27e3a
434.00 ms    0.3%   0 s      Main Thread  0x27902
84.00 ms    0.0%    0 s      _NSEventThread  0x27ae7
16.00 ms    0.0%    0 s      _dispatch_workloop_worker_thread  0x27ae0
12.00 ms    0.0%    0 s      _dispatch_workloop_worker_thread  0x27d45
9.00 ms    0.0%     0 s      _dispatch_workloop_worker_thread  0x27ae4
6.00 ms    0.0%     0 s      _dispatch_worker_thread2  0x27dde
5.00 ms    0.0%     0 s      _dispatch_workloop_worker_thread  0x27e29
2.00 ms    0.0%     0 s      start_wqthread  0x27e27
1.00 ms    0.0%     0 s      f_b_getaddrinfo_thread  0x27e3e
1.00 ms    0.0%     0 s      _dispatch_workloop_worker_thread  0x27e41

Also find the complete Instruments time profile attached, as a .trace file bundle. I think this might be especially helpful, to look into the issue.

Thanks for your help! Looking forward to any feedback.

Best, Robin

time profile (Instruments).trace.zip

chkuang-g commented 1 year ago

Oh, this is incredibly useful. Thank you!

Based on your information, I think this happens when IdTokenRefreshThread is waiting for token refresh request. https://github.com/firebase/firebase-cpp-sdk/blob/3f79c788eb396b499e98f6f9192cdd9b4020cf00/auth/src/desktop/auth_desktop.cc#L694

This refresh request will need internet access in order to get the token from the auth backend. I assume you did enabled Outgoing connection (client) in the sandboxing setting, correct? https://developer.apple.com/documentation/xcode/configuring-the-macos-app-sandbox/

I have a feeling this is something weird going on with this Sandboxing mode. May I understand why you need to run macOS application in this mode?

triplef commented 1 year ago

May I understand why you need to run macOS application in this mode?

All software distributed in the Mac App Store must be sandboxed per Apple guidelines, and additionally there are some Apple APIs that can only be accessed if the app is running sandboxed (even for local test builds). This is why these days nearly all macOS software is sandboxed.

infiniteloopunroll commented 1 year ago

Hi @chkuang-g, thanks for looking into this issue further! It would be great to resolve this, because as @triplef has already explained, running apps in a sandbox on macOS today is the rule and not an exception, for many reasons. And I think it would be especially nice to address the problem, because some parts of the FB C++ SDK have already been prepared to work with Mac sandboxing (for example, library folder and keychain access) and there only seems to be a missing piece to make it fully work.

Networking entitlements are set in the demo project. Maybe also interesting to note that everything appears to work, even in sandboxed mode, except for the persistent high CPU utilisation by the four worker threads. For example, authentication still succeeds and data can be transferred to and from FB.

Please let me know if I can provide any further information to track down the issue. Super cool that you look into it! Would be so nice to resolve this.

Best, Robin

DellaBitta commented 1 year ago

@triplef To help us narrow this down, are you also running on a M1 Mac?

Thanks!

infiniteloopunroll commented 1 year ago

@DellaBitta Yes, this issue was observed when running on Apple Silicon ARM64, macOS 12.x. Please let me know it I can provide any other information. Thanks!

  • Problematic Firebase Component: Auth (Auth, Database, etc.)
  • Other Firebase Components in use: Firestore (Auth, Database, etc.)
  • Platform you are using the C++ SDK on: Mac ARM64 (Mac, Windows, or Linux)
  • Platform you are targeting: Desktop (iOS, Android, and/or desktop)
triplef commented 1 year ago

I can reproduce the same issue on Intel Mac running macOS 12.6.

DellaBitta commented 1 year ago

I was able to reproduce this on an intel mac as well. Thanks for the test case!

infiniteloopunroll commented 1 year ago

@DellaBitta Thanks for looking into this issue! It would be so great to get this resolved, so that the FB desktop SDK can be used on macOS. Is there any update you can give, as of today? Really appreciate your help! And let me know if there is anything else we can do to help.

Best, Robin

DellaBitta commented 1 year ago

Hi @infiniteloopunroll,

I've been looking into this issue and I believe I've found the problem.

When macOS Sandbox mode is enabled, macOS requires that semaphores have a name that is prefixed by the App's Group Name. If the semaphore's name doesn't match this convention then its creation fails.

As an additional issue, we attempted to detect semaphore creation errors by comparing the semaphore handle to nil. But in the the case of macOS, a a semaphore creation error returns SEM_FAILED which is 0xFFFFFFFFFFFFFFFF, not nil. We therefore didn't detect these creation errors and started to use the semaphore anyway. Subsequent sem_wait calls would fail, too, which is why we saw tight looping and high CPU usage in our worker threads. They were just constantly spinning.

Unfortunately there's no official way for the SDK to query the app's group name at runtime, so we can't automatically mangle the semaphore names.

Instead I'm thinking that the Firebase SDK could use a property in your Info.plist, something that you could add like FBAppGroupEntitlementName for instance. If that property is present then the SDK would use it to prefix the semaphore names. This way you could plumb the App Group name to the SDK, and the semaphores would be created properly.

Would this be a good way to solve the problem for you?

Also, it turns out the network sandbox entitlements com.apple.security.network.client and ...server values aren't related to this problem. That is, I can reproduce this issue without configuring them. If someone encounters an issue when these are set then please let me know!

infiniteloopunroll commented 1 year ago

Hi @DellaBitta, thanks for looking into this issue and identifying the problem! That sounds super interesting and I can see how it would lead to issues when setting up the semaphore. Your proposed solution looks great as well, we think that would make it really easy for macOS developers to add the app’s group name in a way that you can read from the FB SDK. Would it in your view make sense to add the app group name to firebase::AppOptions as well? That might add a little more flexibility for developers, but having it read directly from the plist sounds great. Really looking forward to testing this new solution and assisting you with any further information that might be helpful, please just let me know if there is anything else that I can provide. Best, Robin

DellaBitta commented 1 year ago

Hi @infiniteloopunroll,

We thought adding a field to AppOptions but in theory, if you initialize the default app and then later create a custom app with AppOptions, it would already be too late to configure the semaphore names. Or, alternatively, subsequent calls to create an App with different AppOptions wouldn't affect the name of the semaphores already created.

We'd rather have a solution that doesn't make it seem like the configuration can vary across instances of Firebase Apps.

That being said, we're also thinking about adding an alternative global function in the firebase namespace for those who wish to configure this value programmatically.

The Info.plist solution should be part of our next release since that solution has already been merged to main. The API solution might come a bit later.

infiniteloopunroll commented 1 year ago

@DellaBitta That sounds like amazing news, thank you so much! Looking forward to the next release.