firebase / firebase-ios-sdk

Firebase SDK for Apple App Development
https://firebase.google.com
Apache License 2.0
5.68k stars 1.49k forks source link

Realtime Database stays offline and doesn't reconnect after sleeping #9682

Open fbarbat opened 2 years ago

fbarbat commented 2 years ago

[REQUIRED] Step 1: Describe your environment

[REQUIRED] Step 2: Describe the problem

Steps to reproduce: Have the mac going to sleep. After waking up, sometimes (with really low frequency), the Realtime Database client would stay offline indefinitely, even if network conditions are good.

We have an app that uses Realtime Database for implementing Presence in Firestore. https://firebase.google.com/docs/firestore/solutions/presence. If we have the app running and the mac goes to sleep, sometimes, after waking up, the Realtime Database client doesn't reconnect.

We have strong evidence that database.reference(withPath: ".info/connected").observe reported a false value while the mac was sleeping but it never reported true again, even after waking up and having good network conditions.

If we use any operation while Realtime database is offline, like statusReference.setValue to write to the database, the operation never runs the completion callback.

Firestore is able to reconnect properly without problems but Realtime Database client is not working properly.

We saw one case that recovered by itself after a some hours (12 hours after disconnected, 2 hours after the mac woke up). But as we need the Realtime Database to implement Presence, that delay is not acceptable.

Restarting the app makes it work again.

karlsc commented 2 years ago

We have a similar issue since iOS15. It seems to work 99% of the time, but once in a while, the reconnection does not happen.

fbarbat commented 2 years ago

This keeps happening to us. It's been like 3 times I hit this myself since I reported the issue. Our users are hitting it a lot too.

morganchen12 commented 2 years ago

Hey all, apologies for the slow response. Do you have any reliable repro steps or are you able to enable logging and share the logs when this bug occurs?

RichardSPrins commented 2 years ago

This is happening to me as well, running locally at the moment, haven't tested this issue in prod, but every time my macbook sleeps, the realtime connection is ended and I have to refresh the browser to reconnect. I am using for user presence as well so it becomes an issue when any mac users in my app encounter this bug expecting to see others online to communicate.

ghost commented 2 years ago

Hey @morganchen12, sorry about the delayed response on my side this time. I couldn't reproduce it myself but we have reports that it is still happening to users. Unfortunately we don't have any reliable repro steps. When you say enable logging is just doing Database.setLoggingEnabled(true)? I will try that. Let me know if there's anything else we can try. Thanks.

morganchen12 commented 2 years ago

Yep, or by calling this method.

ghost commented 2 years ago

Ok, I could reproduce it and collect logs and other information!

In my experience, it is more likely to hit this issue on Monday mornings, that is, after having my Mac sleeping during the weekend with the app running. So last Friday at the end of the day I set up debug mode for Realtime Database, started recording logs left the app running attached to Xcode and put my mac to sleep.

Yesterday morning, while the mac was sleeping the app tried reconnecting to the Realtime Database. This behavior is normal as far as I know: even if the computer is "sleeping" the app would wake app every now and then for some seconds and try to reconnect to servers because it doesn't know the mac "is actually sleeping". Some seconds later, it actually goes to sleep so the app stop running.

2022-06-05 08:44:21.249940-0300 0xdc7174   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082014] Realtime connection established
2022-06-05 08:44:21.251504-0300 0xdc7c99   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034003] On ready
2022-06-05 08:44:21.251590-0300 0xdc7c99   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034041] Handling timestamp: 1654429461722
2022-06-05 08:44:21.300121-0300 0xdc7c99   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034033] Calling restore state
2022-06-05 08:44:21.300225-0300 0xdc7c99   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034035] Restoring auth
2022-06-05 08:44:21.300324-0300 0xdc7c99   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082004] Sending data (contents hidden)
2022-06-05 08:44:21.300432-0300 0xdc7c99   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB065002] Raising value event on connected
2022-06-05 08:44:21.463667-0300 0xdc7c99   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083006] (wsc:29) handleNewFrameCount: 1
2022-06-05 08:44:21.463730-0300 0xdc7c99   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083007] (wsc:29) handleIncomingFrame sending complete frame: 0
2022-06-05 08:44:21.463794-0300 0xdc7c99   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082010] Got data message: {
    [SOME STUFF]
}
2022-06-05 10:07:56.067777-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082004] Sending data (contents hidden)
2022-06-05 10:07:56.069026-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083014] (wsc:29) resetting keepalive, to 2022-06-05 13:08:41 +0000 ; old: 2022-06-05 11:45:06 +0000

The problem is that, during these spontaneous wake ups, there are high chances that we are not connected to the network, so connection issues might happen:

2022-06-05 10:07:56.108435-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083010] (wsc:29) didFailWithError didFailWithError: Error Domain=NSPOSIXErrorDomain Code=54 "Connection reset by peer" UserInfo={_kCFStreamErrorCodeKey=54, _kCFStreamErrorDomainKey=1}
2022-06-05 10:07:56.108510-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083013] Websocket is closing itself
2022-06-05 10:07:56.108573-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082007] Realtime connection lost.
2022-06-05 10:07:56.108632-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082002] Closing realtime connection.
2022-06-05 10:07:56.109244-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034004] Got on disconnect due to other
2022-06-05 10:07:56.109370-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034008] Scheduling connection attempt
2022-06-05 10:07:56.109444-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB065002] Raising value event on connected
2022-06-05 10:07:56.109504-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB038019] Running onDisconnectEvents
2022-06-05 10:07:56.109564-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034009] Trying to fetch auth token
2022-06-05 10:07:56.109706-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034011] Successfully fetched token, opening connection
2022-06-05 10:07:56.109764-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083001] (wsc:30) Connecting to: wss://s-usc1f-nss-2565.firebaseio.com/.ws?somestuff
2022-06-05 10:07:56.109829-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082001] Calling open in FConnection
2022-06-05 10:07:56.109893-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083002] (wsc:30) FWebSocketConnection open.

As there were network problems, opening this connection timed out. And the Realtime Database client hanged here.

2022-06-05 11:47:00.281060-0300 0xdc79c6   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083012] (wsc:30) Websocket timed out on connect

Today I started my workday, opened my mac and this got logged.

2022-06-06 11:17:09.814642-0300 0xdcd704   Debug       0x285b42c            16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB065002] Raising value event on connected

I checked server side state and confirmed this issue was happening: I was offline for the rest of the world but my app was running and had a good Internet connection.

Using an internal debugging feature I tried to interact with the Realtime Database, and this is what I got:

2022-06-06 11:24:46.647367-0300 0xdce6c2   Debug       0x285cb8a            16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB038003] Setting: /some_document with {
    value = 1;
} pri: <null>
2022-06-06 11:24:46.655124-0300 0xdce6c2   Debug       0x285cb8a            16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034025] Wasn't connected or writes paused, so added to outstanding puts only. Path: /some_document

This confirmed previous observations on this issue. The Realtime Database client was "stuck". Using Xcode, I checked two Realtime Database client internal bits of state:

So it looks like the app hanged in this ConnectionStateConnecting state after the network timeout. I checked the client code and, from what I could see, the Realtime Database client doesn't have any retry logic in place to recover from these errors.

In the past, I had seen the app recovering itself after several hours so I'm still waiting to collect logs in case that could be helpful. I still have Xcode attached so if you want to debug it with me now, let me know. :)

I have more logs if that might be useful to you but they have some (slightly) sensitive data so I am fine with sending them to you directly instead of attaching them here publicly if you need them.

Thank you!

ghost commented 2 years ago

Ok, after around 2 hours, the app reconnected. These are the logs:

2022-06-06 13:08:25.944123-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083010] (wsc:30) didFailWithError didFailWithError: Error Domain=NSOSStatusErrorDomain Code=-9806 "errSSLClosedAbort: connection closed via error " UserInfo={_kCFStreamErrorCodeKey=-9806, _kCFStreamErrorDomainKey=3}
2022-06-06 13:08:25.944279-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083013] Websocket is closing itself
2022-06-06 13:08:25.944348-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082006] Realtime connection failed.
2022-06-06 13:08:25.944701-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082002] Closing realtime connection.
2022-06-06 13:08:25.944767-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034004] Got on disconnect due to other
2022-06-06 13:08:25.944840-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034008] Scheduling connection attempt
2022-06-06 13:08:25.944899-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB054002] Scheduling retry in 0.618143s
2022-06-06 13:08:26.098224-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB038019] Running onDisconnectEvents
2022-06-06 13:08:26.620248-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034009] Trying to fetch auth token
2022-06-06 13:08:26.621511-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB034011] Successfully fetched token, opening connection
2022-06-06 13:08:26.622945-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083001] (wsc:32) Connecting to: wss://x.firebaseio.com/.ws?somestuff
2022-06-06 13:08:26.625894-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082001] Calling open in FConnection
2022-06-06 13:08:26.626043-0300 0xdd7096   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083002] (wsc:32) FWebSocketConnection open.
2022-06-06 13:08:27.733870-0300 0xdd7085   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083008] (wsc:32) webSocketDidOpen
2022-06-06 13:08:36.556371-0300 0xdd7085   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083009] (wsc:32) nop timer kicked off
2022-06-06 13:08:36.556481-0300 0xdd7085   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083006] (wsc:32) handleNewFrameCount: 1
2022-06-06 13:08:36.556550-0300 0xdd7085   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB083007] (wsc:32) handleIncomingFrame sending complete frame: 0
2022-06-06 13:08:36.556614-0300 0xdd7085   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082011] Got control message: { some stuff}
2022-06-06 13:08:36.557386-0300 0xdd7085   Debug       0x0                  16384  0    MyAppName: (GoogleUtilities) 8.15.0 - [Firebase/Database][I-RDB082014] Realtime connection established

Then it properly connected again.

Note I had a breakpoint so that might explain the delay in 13:08:36.

Let me know if you need more information. Thanks!

morganchen12 commented 2 years ago

Thanks, it sounds like the right way to fix this is to just retry on socket connection failures, respecting the NSWorkspace state. I'll take a crack at implementing this when I have the bandwidth.

ghost commented 2 years ago

Thanks!

You might also consider the timeout case. I think the socket error in this case was 2 hours late so it's not safe to wait for it unless the SDK can force it to happen earlier somehow.

Alternatively, could the SDK discard the timed out web socket (and future notifications) and try a new one?

morganchen12 commented 2 years ago

Maybe it'd be better to deliberately terminate the socket connection on sleep and restore it on wakeup then? @IanWyszynski what do you think?

ghost commented 2 years ago

Thanks! In my opinion, terminating the connection on sleep and restore it on wake up might reduce the chances of this from happening, but it doesn't fix the root problem. The main problem is that the SDK doesn't retry after the timeout (it just logs an error), it waits for the socket error and that might take a lot of time (if it actually happens). I think terminating the connection on sleep and restore it on wake up might mitigate it though so it's a good workaround we can do on our side in the meantime. We are going to implement this behavior to try to reduce the chances of it from happening by invoking goOffline when the computer goes to sleep, and invoking goOnline when the computer wakes up. This way, network problems happening while the computer is sleeping shouldn't affect us. Having said that, this won't fix the root problem since network issues might happen anyway when the computer is awake, specifically when starting the app and the SDK wants to setup the connection, so it would be great if the SDK can handle connection timeouts and retries more gracefully.

markterrill commented 1 year ago

Hi, this issue is more widespread than just macos, it definitely happens on iOS as well in ionic cordova apps (environment info below).

A couple observations and research items of note that may help:

It's quite a bit of effort to setup a new 'minimal' repo, ionic, angularfire and a subscription to a listval with detection events on resume/background. If someone else has that handy then go for it, I hope this post helps someone else so they don't spend days searching and experimenting!

Note: using modular angularfire.

ENVIRONMENT

Angular:
Angular CLI: 13.3.10
Node: 14.20.0
Package Manager: yarn 1.22.15
OS: darwin x64

Angular: 13.3.12
... animations, common, compiler, compiler-cli, core, forms
... language-service, platform-browser, platform-browser-dynamic
... router, service-worker

Package                         Version
---------------------------------------------------------
@angular-devkit/architect       0.1303.6
@angular-devkit/build-angular   13.3.10
@angular-devkit/core            13.3.10
@angular-devkit/schematics      13.3.10
@angular/cli                    13.3.10
@angular/fire                   7.5.0
@schematics/angular             13.3.10
rxjs                            6.6.7
typescript                      4.6.2
webpack                         5.70.0

Ionic:

   Ionic CLI                     : 6.20.3 (/Users/markterrill/.config/yarn/global/node_modules/@ionic/cli)
   Ionic Framework               : @ionic/angular 6.2.4
   @angular-devkit/build-angular : 13.3.10
   @angular-devkit/schematics    : 13.3.10
   @angular/cli                  : 13.3.10
   @ionic/angular-toolkit        : 3.1.1

Capacitor:

   Capacitor CLI      : 4.5.0
   @capacitor/android : 4.5.0
   @capacitor/core    : 4.1.0
   @capacitor/ios     : 4.5.0

Cordova:

   Cordova CLI       : not installed
   Cordova Platforms : not available
   Cordova Plugins   : not available

Utility:

   cordova-res (update available: 0.15.4) : 0.15.3
   native-run                             : 1.7.1

System:

   Android SDK Tools : 26.1.1 (/Users/markterrill/Library/Android/sdk)
   NodeJS            : v14.20.0 (/Users/markterrill/.nvm/versions/node/v14.20.0/bin/node)
   npm               : 6.14.17
   OS                : macOS Monterey
   Xcode             : Xcode 14.2 Build version 14C18
emilsaj commented 1 year ago

How is this still an issue? Seems to be a pretty critical bug imo...

nCr78 commented 1 year ago

Hello, this is still an issue on both Firebase and Firestore. Not only that, but, it's found in the web & Android versions, not just iOS.

Do we have to implement a watchdog or an app lifecycle event ourselves for something this critical? There have been countless times where I get a push notification while the app is on the background / sleep, I open it and the data from Firestore never arrives without restarting the app.

n0tc0ding commented 1 year ago

this cost me a lot of negative reviews on my app, I guess we need to remove rtdb. I tried @markterrill solution listening the lifecycle of the app but it doesn't work, rtdb never recovers and it doesn't show an error or something in debug window.

ablbol commented 1 year ago

In my case, when the app goes offline then online the listener to ".info/connected" is never called again. This means when the app reconnects to WiFi again, my app is not able to update its online status again in the realtime database.

UZ8VLqifK36Z commented 10 months ago

Hello, I had a similar problem while using Firebase web modular API with Capacitor on iOS. Each time a tried a write operation on the database (with set()), and a network change occurred meanwhile, the database API got stuck forever. My listener to ".info/connected" was never called. I think a found a workaround:

import { Injectable } from '@angular/core';
import { getApp } from 'firebase/app';
import { getDatabase, goOffline, goOnline, onValue, ref } from 'firebase/database';
import { Network } from '@capacitor/network';

@Injectable({
  providedIn: 'root'
})
export class DatabaseService {

  public connectedToDatabase: boolean;

  private app = getApp();

  private database = getDatabase(this.app, "...firebasedatabase.app/");

  constructor() {
    // Get network status changes
    // Network status changes when switching between wifi, cellular and offline
    Network.addListener('networkStatusChange', status => {
      console.log('Network status changed', status);
      if (status.connected == true) {
        // We need to go offline first, because realtime database can get stuck when switching between wifi and cellular
        goOffline(this.database);
        goOnline(this.database);
      }
      if (status.connected == false) {
        goOffline(this.database);
      }
    });

    // Now the callback on connectedRef is triggered at each network change
    // And write operations like (set()) don't get stuck anymore
    const connectedRef = ref(this.database, ".info/connected");
    onValue(connectedRef, (snap) => {
      if (snap.val() === true) {
        this.connectedToDatabase = true;
      } else {
        this.connectedToDatabase = false;
      }
    });
  }
}
liquidvisual commented 10 months ago

@UZ8VLqifK36Z Thanks for sharing. I'm going to give this a try at some point with my React app. I just discovered this window event, which hopefully works the same.

// https://developer.mozilla.org/en-US/docs/Web/API/Navigator/onLine

window.addEventListener("offline", () => {});
window.addEventListener("online", () => {});
tolypash commented 2 weeks ago

Hey there, having the same issue here, react native firebase though. I believe it's related.

paulb777 commented 2 weeks ago

Does Firebase 11.x make a difference?

A new socket implementation was introduced in 10.27.0

This issue may be related to #13877

tolypash commented 2 weeks ago

I'll try downgrade my version before 10.27.0 and see if that improves things :)

Issue does seem related!

google-oss-bot commented 1 week ago

Hey @fbarbat. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!