GetStream / stream-chat-swift

💬 iOS Chat SDK in Swift - Build your own app chat experience for iOS using the official Stream Chat API
https://getstream.io/chat/sdk/ios/

Socket Connection Drops Off and Never Reconnects #1920

Closed davefoxy closed 2 years ago

davefoxy commented 2 years ago

What did you do?

I opened a connection with ChatClient using connectUser and provided a tokenProvider. After a while, the following error started appearing in the console and the chat stopped updating:

Task <33946E60-ED40-4412-9FF0-8B6EE6F2BED2>.<1> finished with error [57] Error Domain=NSPOSIXErrorDomain Code=57 "Socket is not connected" UserInfo={NSErrorFailingURLStringKey=wss://chat-proxy-us-east.stream-io-api.com/connect?api_key=[redacted]&json=%7B%22server_determines_connection_id%22:true,%22user_id%22:%22[redacted]%22,%22user_details%22:[redacted]%22,%22name%22:%22[redacted]%22,%22image%22:%22[redacted]%22%7D%7D, NSErrorFailingURLKey=wss://chat-proxy-us-east.stream-io-api.com/connect?api_key=[redacted]&json=%7B%22server_determines_connection_id%22:true,%22user_id%22:[redacted]%22,%22user_details%22:%7B%22id%22:%22[redacted]%22,%22name%22:%22[redacted]%22,%22image%22:%22[refacted]%22%7D%7D, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalWebSocketTask <33946E60-ED40-4412-9FF0-8B6EE6F2BED2>.<1>"
), _NSURLErrorFailingURLSessionTaskErrorKey=LocalWebSocketTask <33946E60-ED40-4412-9FF0-8B6EE6F2BED2>.<1>}

What did you expect to happen?

If this is caused by the token expiring, I expected my tokenProvider to be called, but it doesn't appear to be. The whole "expiring token" theory might be a red herring, though.

What happened instead?

The error above is logged and any open channels stop receiving real-time messages. Calling synchronize on the channel controller reloads the channel, but real-time messages don't resume.

GetStream Environment

GetStream Chat version: 4.13.1
GetStream Chat frameworks: StreamChat, StreamChatSwiftUI
iOS version: 15.4
Swift version: 5
Xcode version: 13.3
Device: Simulator and iPhone 13 Pro

Additional context

Here's my connection code:

var clientConfig = ChatClientConfig(apiKey: .init(streamAPIKey))
clientConfig.applicationGroupIdentifier = "[redacted]"
self.chatClient = ChatClient(config: clientConfig)

// Create the Stream chat SwiftUI object.
self.streamChat = StreamChat(chatClient: chatClient)

// Inform the Stream client of how to fetch new tokens if and when the current one expires.
chatClient.tokenProvider = { [weak self] completion in    
    self?.fetchStreamToken(resultHandler: { result in
        switch result {
        case let .failure(error):
            self?.connectionStatus = .error(error)
        case .success:
            self?.connectionStatus = .connected
        }

        completion(result)
    })
}

fetchStreamToken is a simple network call that returns a validated Token. However, as mentioned above, this tokenProvider is never called.

Our initial connection code:

self?.chatClient.connectUser(
    userInfo: .init(id: [redacted],
                    name: [redacted],
                    imageURL: [redacted]),
    token: token
) { error in
    DispatchQueue.main.async {
        if let error = error {
            self?.connectionStatus = .error(StreamTokenError.streamAPIConnectionError(error))
        } else {
            self?.connectionStatus = .connected
        }
    }
}

One more thing: we are using Apollo iOS (GraphQL client) version 0.51.0. I'm not sure it makes a difference, but that library bundles its own copy of Starscream.

The initial connection works fine; the problem is only with reconnecting. Hoping to find a solution soon. Thanks.

Update

While looking for a workaround, I wondered whether I could simply observe the connection status and refresh the token manually when the client disconnects. I see this in ChatClient:

/// The current connection status of the client.
///
/// To observe changes in the connection status, create an instance of `CurrentChatUserController`, and use it to receive
/// callbacks when the connection status changes.
///
public internal(set) var connectionStatus: ConnectionStatus = .initialized

So I did as the comment says, but CurrentChatUserController doesn't seem to expose an observable for the connection status, only currentUserChangePublisher and unreadCountPublisher.

bielikb commented 2 years ago

Hi @davefoxy,

Thanks for the report. We will try to reproduce the issue you're experiencing and follow up.

Best, Boris

bielikb commented 2 years ago

Hey, @davefoxy

Could you follow this doc, which discusses when and how to pass tokens to our client? For example:

  1. Pass the tokenProvider when initializing ChatClient.
  2. Prepare a cached Token and pass it to the connectUser API. NOTE: the tokenProvider closure passed to the initializer will be called every time the current token expires.
  3. Hooking on the connection status / mutating the ConnectionStatus for your own purposes is not recommended and can lead to more bugs. Our client leverages reconnection strategy under the hood that will try to keep your user connected to the chat.

Please let us know whether the above helps you resolve your issue.

Thanks.

davefoxy commented 2 years ago

Hi @bielikb thanks for your reply 🙇

So the code path should be the same between me giving the token provider through the tokenProvider property (as in my code above) and passing it in whilst instantiating ChatClient. However, I've tried your suggestion just in case but there's still no change.

The error message is actually coming from here: https://github.com/GetStream/stream-chat-swift/blob/develop/Sources/StreamChat/WebSocketClient/Engine/URLSessionWebSocketEngine.swift#L78

So there's no reconnection happening at this point. The code just logs an error and does nothing.

Is it possible maybe it's not token expiration that's causing this?

Also, about this point:

> Hooking on the connection status / mutating the ConnectionStatus for your own purposes is not recommended and can lead to more bugs. Our client leverages reconnection strategy under the hood that will try to keep your user connected to the chat.

The connectionStatus part of my original code is not manipulating Stream's connectionStatus. This is our own enumerable for keeping track of overall setup. It's not touching Stream. However, I would be curious how we can monitor Stream's internal connectionStatus properly. I mentioned this in the original post under the "Update" section.

davefoxy commented 2 years ago

One last thing: while idling on an open channel, I just received this error for the first time:

[ERROR] [com.apple.NSURLSession-delegate] [RequestDecoder.swift:64] [decodeRequestResponse(data:response:error:)] > API request failed with status code: 400, code: 4 response:
{
  "code" : 4,
  "message" : "GetOrCreateChannel failed with error: \"Watch or Presence requires an active websocket connection, please make sure to include your websocket connection_id\"",
  "more_info" : "https:\/\/getstream.io\/chat\/docs\/api_errors_response",
  "StatusCode" : 400,
  "duration" : "0.00ms"
})
bielikb commented 2 years ago

> Hi @bielikb thanks for your reply 🙇

> So the code path should be the same between me giving the token provider through the tokenProvider property (as in my code above) and passing it in whilst instantiating ChatClient. However, I've tried your suggestion just in case but there's still no change.

> The error message is actually coming from here: https://github.com/GetStream/stream-chat-swift/blob/develop/Sources/StreamChat/WebSocketClient/Engine/URLSessionWebSocketEngine.swift#L78

> So there's no reconnection happening at this point. The code just logs an error and does nothing.

> Is it possible maybe it's not token expiration that's causing this?

When leveraging tokenProvider are you able to connect and eg show the list of existing channels in your app? Does it successfully connect? Could you try to reproduce your issue leveraging our DemoApp on our main repo?

Also, about this point:

> Hooking on the connection status / mutating the ConnectionStatus for your own purposes is not recommended and can lead to more bugs. Our client leverages reconnection strategy under the hood that will try to keep your user connected to the chat.

> The connectionStatus part of my original code is not manipulating Stream's connectionStatus. This is our own enumerable for keeping track of overall setup. It's not touching Stream. However, I would be curious how we can monitor Stream's internal connectionStatus properly. I mentioned this in the original post under the "Update" section.

You can observe connectionStatus changes by conforming your class to ChatConnectionControllerDelegate and setting your instance as the delegate of a ChatConnectionController:

import StreamChat

class YourClass: ChatConnectionControllerDelegate {
    // Keep a strong reference, otherwise the controller is deallocated and stops reporting.
    var connectionController: ChatConnectionController?

    /// Your setup code.
    func setupClient() {
        // ... create `chatClient` here.
        // Once the client is initialized:
        connectionController = chatClient.connectionController()
        connectionController?.delegate = self
    }

    /// ChatConnectionControllerDelegate conformance.
    func connectionController(_ controller: ChatConnectionController,
                              didUpdateConnectionStatus status: ConnectionStatus) {
        // Observe changes here.
    }
}

as shown in these docs.

davefoxy commented 2 years ago

> When leveraging tokenProvider are you able to connect and eg show the list of existing channels in your app? Does it successfully connect? Could you try to reproduce your issue leveraging our DemoApp on our main repo?

Yes, I can retrieve all my channels, navigate to them, send messages, etc. It only stops working once my token expires (after 15 minutes). I'll try the demo app again and double-check that it works for me.

However, shouldn't there be something here to handle this error? It's just logging right now: https://github.com/GetStream/stream-chat-swift/blob/develop/Sources/StreamChat/WebSocketClient/Engine/URLSessionWebSocketEngine.swift#L78

Update: I checked out the demo app again... It doesn't seem to use a tokenProvider at all: https://github.com/GetStream/stream-chat-swift/blob/develop/DemoApp/DemoAppCoordinator.swift#L92

bielikb commented 2 years ago

> However, shouldn't there be something here to handle this error? It's just logging right now: https://github.com/GetStream/stream-chat-swift/blob/develop/Sources/StreamChat/WebSocketClient/Engine/URLSessionWebSocketEngine.swift#L78

I'll create an internal task for us. Thanks for pointing that out ;)

> Update: I checked out the demo app again... It doesn't seem to use a tokenProvider at all: https://github.com/GetStream/stream-chat-swift/blob/develop/DemoApp/DemoAppCoordinator.swift#L92

Yes, that's correct. Feel free to adjust the init call to match your integration/initialization.

tbarbugli commented 2 years ago

@davefoxy I'm trying to reproduce this on my end, but token renewal seems to work fine. Would you mind sharing the latest version of the code you are using?

To make things simpler to debug, I create the token directly in the app and simulate a delay.

The following code is absolutely not suitable for a production app, but it might help you narrow down the problem. The JWT is created directly on iOS and expires after 15 seconds (plus we fake a 3-second delay to observe the token renewal state).

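// apiKeySecretString and userID are assumed to be defined elsewhere in the app.
ChatClient.shared.tokenProvider = { completion in
    // Simulate a 3-second network delay, then return a token that expires 15 seconds from now.
    DispatchQueue.main.asyncAfter(deadline: .now() + 3) {
        let token = generateUserToken(secret: apiKeySecretString, userID: userID, exp: Int(Date().timeIntervalSince1970) + 15)
        completion(.success(token))
    }
}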

And this is the code I use to generate a token with a short expiration time:

import Foundation
import CryptoKit
import StreamChat

extension Data {
    func urlSafeBase64EncodedString() -> String {
        return base64EncodedString()
            .replacingOccurrences(of: "+", with: "-")
            .replacingOccurrences(of: "/", with: "_")
            .replacingOccurrences(of: "=", with: "")
    }
}

struct Header: Encodable {
    let alg = "HS256"
    let typ = "JWT"
}

struct JWTPayload: Encodable {
    let user_id: String
    let exp: Int
}

// DO NOT USE THIS FOR REAL APPS! This function is only here to make it easier to
// have expired token renewal while using the standalone demo application
func generateUserToken(secret: String, userID: String, exp: Int) -> Token {
    let privateKey = SymmetricKey(data: secret.data(using: .utf8)!)

    let headerJSONData = try! JSONEncoder().encode(Header())
    let headerBase64String = headerJSONData.urlSafeBase64EncodedString()

    let payloadJSONData = try! JSONEncoder().encode(JWTPayload(user_id: userID, exp: exp))
    let payloadBase64String = payloadJSONData.urlSafeBase64EncodedString()

    let toSign = (headerBase64String + "." + payloadBase64String).data(using: .utf8)!
    let signature = HMAC<SHA256>.authenticationCode(for: toSign, using: privateKey)
    let signatureBase64String = Data(signature).urlSafeBase64EncodedString()

    let token = [headerBase64String, payloadBase64String, signatureBase64String].joined(separator: ".")
    return try! Token(rawValue: token)
}
davefoxy commented 2 years ago

@tbarbugli Thanks! That's super useful for debugging. Let me play with it a little and see. My code hasn't changed since the initial message of this issue, but at least the code you've provided lets me eliminate one variable: our own token-fetching code.

I'll get back to you once I experiment a bit more. It's approaching the weekend here, so it might be a while before I reply 🙇

tbarbugli commented 2 years ago

@davefoxy how is it going with this? Happy to move this to a quick call if that helps.

davefoxy commented 2 years ago

@tbarbugli Sorry for the late reply. After my first message in this thread, I implemented our own token-refresh code, but being able to use tokenProvider is obviously preferable. I'd like to check a few other socket-related things in our codebase that might be causing the issue, and I'll get back to you later this week. If I'm still struggling then yes, a call would be fantastic. Thanks for offering that as an option 🙇

tbarbugli commented 2 years ago

@davefoxy how is this going for you? AFAICT implementing your own token refresh and chat reconnection is very tricky. The SDK does some smart things around token expiration, such as queuing requests and replaying them when a fresh token is available.

What does your implementation look like?
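The request queuing tbarbugli mentions can be illustrated with a small, self-contained sketch. It is not StreamChat's implementation: TokenGate is a hypothetical type, and a plain String stands in for the SDK's Token. Requests made while no valid token is cached are queued, the provider closure (shaped like the tokenProvider closures in this thread) is called only once even when several requests arrive, and every queued request is replayed with the result of that single refresh.

```swift
import Foundation

/// Hypothetical sketch, not StreamChat's implementation. A `String` stands in for the SDK's `Token`.
/// Not thread-safe: call it from a single queue (for example, the main queue).
final class TokenGate {
    /// Same shape as the tokenProvider closures in this thread, but with a String token.
    typealias TokenProvider = (@escaping (Result<String, Error>) -> Void) -> Void

    private let provider: TokenProvider
    private var token: String?
    private var isRefreshing = false
    private var pending: [(Result<String, Error>) -> Void] = []

    init(provider: @escaping TokenProvider) {
        self.provider = provider
    }

    /// Runs `request` right away if a token is cached; otherwise queues it and starts a refresh.
    func withToken(_ request: @escaping (Result<String, Error>) -> Void) {
        if let token = token {
            request(.success(token))
            return
        }
        pending.append(request)
        refreshIfNeeded()
    }

    /// Call this when the server reports an expired token (code 40 in this thread).
    func invalidateToken() {
        token = nil
    }

    private func refreshIfNeeded() {
        // Coalesce: while a refresh is in flight, new requests only join the queue.
        guard !isRefreshing else { return }
        isRefreshing = true
        provider { [weak self] result in
            guard let self = self else { return }
            self.isRefreshing = false
            if case let .success(newToken) = result {
                self.token = newToken
            }
            // Replay every queued request with the outcome of this single refresh.
            let waiters = self.pending
            self.pending.removeAll()
            waiters.forEach { $0(result) }
        }
    }
}
```

Calling the provider at most once per refresh also avoids the kind of double refresh polqf describes later in the thread, since a second request arriving mid-refresh simply joins the queue.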

davefoxy commented 2 years ago

Hi @tbarbugli, the process I have in place is:

  1. First, I fetch our token, store it in a variable activeToken, and make the initial connectUser call.
  2. Create an EventsController and listen for its delegate's didReceiveEvent method. Note that I tried to use ConnectionController, but its delegate methods weren't being called.
  3. When I receive a ConnectionStatusUpdated event with a disconnected status, I check whether the stored activeToken has expired; if it has, I fetch a new one and call ChatClient's setToken method.

It seems to work OK, but as you said, there are a lot of potential edge cases that might be missed.
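Step 3 depends on knowing whether activeToken has expired. A minimal sketch, assuming the token is a standard JWT with a numeric exp claim, like the ones produced by tbarbugli's generateUserToken above: it reverses the urlSafeBase64EncodedString encoding and reads exp without verifying the signature. jwtExpirationDate and isTokenExpired are hypothetical helpers, not StreamChat APIs, and the 30-second leeway is an arbitrary choice.

```swift
import Foundation

/// Only the claim we need; other claims such as user_id are ignored by JSONDecoder.
private struct JWTClaims: Decodable {
    let exp: TimeInterval
}

/// Hypothetical helper, not a StreamChat API. Reads the `exp` claim from a JWT
/// without verifying the signature.
func jwtExpirationDate(_ jwt: String) -> Date? {
    let segments = jwt.split(separator: ".", omittingEmptySubsequences: false)
    guard segments.count == 3 else { return nil }

    // Undo the URL-safe encoding: map "-" and "_" back and restore the stripped "=" padding.
    var base64 = String(segments[1])
        .replacingOccurrences(of: "-", with: "+")
        .replacingOccurrences(of: "_", with: "/")
    let remainder = base64.count % 4
    if remainder > 0 {
        base64 += String(repeating: "=", count: 4 - remainder)
    }

    guard let data = Data(base64Encoded: base64),
          let claims = try? JSONDecoder().decode(JWTClaims.self, from: data)
    else { return nil }
    return Date(timeIntervalSince1970: claims.exp)
}

/// Treats the token as expired `leeway` seconds early so a refresh can finish in time.
/// Unreadable tokens are treated as expired.
func isTokenExpired(_ jwt: String, now: Date = Date(), leeway: TimeInterval = 30) -> Bool {
    guard let expiry = jwtExpirationDate(jwt) else { return true }
    return expiry.timeIntervalSince(now) <= leeway
}
```

In the flow above, this check would run on the raw JWT string (for example, the value passed to Token(rawValue:) in the snippet above) when a disconnected status arrives.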

Maybe a call would be best, but first I'd like to draw your attention once more to this message: https://github.com/GetStream/stream-chat-swift/issues/1920#issuecomment-1099251936.

The error we're receiving comes from this specific line, which doesn't refresh the token or do anything except print an error: https://github.com/GetStream/stream-chat-swift/blob/develop/Sources/StreamChat/WebSocketClient/Engine/URLSessionWebSocketEngine.swift#L85

bielikb said he logged an issue on your end, but I'm not sure how it's progressing.

davefoxy commented 2 years ago

@tbarbugli Any progress on the above message and refreshing tokens within URLSessionWebSocketEngine?

polqf commented 2 years ago

Hi @davefoxy, sorry to keep you waiting. We're working on a fix for this but cannot give you an ETA yet.

We'll keep you posted. Thank you!

polqf commented 2 years ago

Hi @davefoxy ,

First things first. The line in URLSessionWebSocketEngine that you pointed out should have nothing to do with this issue.

Errors coming from the WebSocket carry so little information that token problems are handled through messages instead. Whenever there is a problem with the token, we first receive a successful message containing a payload similar to the following:

{"error":{"code":40,"message":"JWTAuth error: token is expired (exp)","StatusCode":401,"duration":"","more_info":""}}

This basically tells us that there is an issue with the token, and that we should refresh it.
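For illustration, a message like the one above can be decoded with Foundation's JSONDecoder. The types below are hypothetical mirrors of the payload, not StreamChat's ErrorPayload, and the check only recognizes code 40 with status 401, the two values visible in this message; the SDK's own check covers a range of token-related codes that isn't reproduced here.

```swift
import Foundation

/// Hypothetical mirror of the server's error body (not StreamChat's `ErrorPayload`).
struct ServerErrorPayload: Decodable {
    let code: Int
    let message: String
    let statusCode: Int

    enum CodingKeys: String, CodingKey {
        case code, message
        case statusCode = "StatusCode"
    }
}

/// The payload is nested under a top-level "error" key; extra keys are ignored.
struct ServerErrorEnvelope: Decodable {
    let error: ServerErrorPayload
}

/// Assumption: only code 40 / status 401 ("token is expired") is taken from the message above.
func isExpiredTokenError(_ json: Data) -> Bool {
    guard let envelope = try? JSONDecoder().decode(ServerErrorEnvelope.self, from: json) else {
        // Anything that isn't a server error payload (e.g. a bare socket error) is not a token error.
        return false
    }
    return envelope.error.code == 40 && envelope.error.statusCode == 401
}
```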

If you follow the path, this ends up calling ChatClient.webSocketClient(_:didUpdateConnectionState:). There, as you can see in the following chunk, we refresh the token:

        case let .disconnected(source):
            if let error = source.serverError,
               error.isInvalidTokenError {
                refreshToken(completion: nil)
                shouldNotifyConnectionIdWaiters = false
            } else {
                shouldNotifyConnectionIdWaiters = true
            }
            connectionId = nil

Whenever the token is refreshed we recreate the websocket connection, which leads to a call to ChatClientUpdater.connect(userInfo:completion:).

I hope this helps you visualize the flow and spot any differences in yours.


That said, while investigating this case, I found one issue: the refreshToken function in ChatClient might be executed twice, first because of the WebSocket disconnection and then because of a failure coming from the APIClient. However, this never caused a problem during my tests.

After further verification, this only happens as an edge case and should not be the root cause of your issue.

Please let us know if the flow stated above is the same as the one you have.

davefoxy commented 2 years ago

@polqf Thanks for the update. Yes, when the socket disconnects, I do land in the disconnected case you mentioned above. However, the refresh never happens because error.isInvalidTokenError is false (Ref: https://github.com/GetStream/stream-chat-swift/blob/develop/Sources/StreamChat/ChatClient.swift#L642)

Here is the output of po source.serverError:

WebSocketEngineError(reason: "The operation couldn’t be completed. Socket is not connected", code: 57, engineError: Optional(Error Domain=NSPOSIXErrorDomain Code=57 "Socket is not connected" UserInfo={NSErrorFailingURLStringKey=wss://chat-proxy-us-east.stream-io-api.com/connect?api_key=[redacted]&json=%7B%22user_details%22:[redacted],%22server_determines_connection_id%22:true,%22user_id%22:[redacted], NSErrorFailingURLKey=wss://chat-proxy-us-east.stream-io-api.com/connect?api_key=[redacted]&json=%7B%22user_details%22:[redacted],%22server_determines_connection_id%22:true,%22user_id%22:[redacted], _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalWebSocketTask <70F87855-A1EA-409E-94DD-36862C08EC03>.<1>"
), _NSURLErrorFailingURLSessionTaskErrorKey=LocalWebSocketTask <70F87855-A1EA-409E-94DD-36862C08EC03>.<1>}))

Perhaps this range is incorrect? Sorry, I'm not so familiar with web socket error codes but this range won't catch my error above.

EDIT: Digging into this further with breakpoints, this check is always false because the underlyingError as? ErrorPayload cast always fails.

polqf commented 2 years ago

Hi @davefoxy ! This looks interesting 🤔

underlyingError should be castable to ErrorPayload. If the cast fails, isInvalidTokenError is false.

What's important here is not to follow the trace starting from the error on the WebSocket client, but instead from the last successful message you receive, which should look like this:

{"error":{"code":40,"message":"JWTAuth error: token is expired (exp)","StatusCode":401,"duration":"","more_info":""}}

Please let me know if you get that message, and try to follow the execution from there 🙏. As I mentioned before, following the trace from the moment you receive an error won't help, because those errors don't carry useful information.

> Perhaps this range is incorrect? Sorry, I'm not so familiar with web socket error codes but this range won't catch my error above.

The codes in that range are our own, sent by our backend in the last successful message (see above), not the ones Apple uses. WebSocketEngineError is just a wrapper around Apple's error, and we don't inspect its codes.
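The distinction polqf draws can be modeled in a few lines. This is a hypothetical sketch, not SDK code: the ...Sketch types stand in for ErrorPayload, WebSocketEngineError, and the client error carried by the disconnection source, and the 40...43 range is an assumed placeholder (only code 40 appears in this thread). A disconnection caused by a decoded server payload triggers a refresh; one that only wraps Apple's POSIX error 57 fails the cast, so isInvalidTokenError is false and no refresh happens, which matches what davefoxy observed.

```swift
import Foundation

// Hypothetical stand-ins for the SDK types discussed above; names and shapes are assumptions.
struct ErrorPayloadSketch: Error {
    let code: Int  // Stream backend code, e.g. 40 for "token is expired"
}

struct EngineErrorSketch: Error {
    let reason: String
    let code: Int  // Apple's code, e.g. 57 "Socket is not connected"
}

struct ClientErrorSketch: Error {
    let underlyingError: Error?

    // Assumed token-error range, for illustration only.
    static let tokenErrorCodes: ClosedRange<Int> = 40...43

    var isInvalidTokenError: Bool {
        // Only a decoded server payload carries a Stream code; an engine error fails this cast.
        guard let payload = underlyingError as? ErrorPayloadSketch else { return false }
        return Self.tokenErrorCodes.contains(payload.code)
    }
}

/// Mirrors the `.disconnected(source)` branch quoted earlier: refresh only for token errors.
func shouldRefreshToken(after serverError: ClientErrorSketch?) -> Bool {
    serverError?.isInvalidTokenError ?? false
}
```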

davefoxy commented 2 years ago

@polqf Could you give me the file and line number where you'd like me to set a breakpoint and trace from? I just want to make sure I'm looking in the direction you need.

polqf commented 2 years ago

Here: https://github.com/GetStream/stream-chat-swift/blob/develop/Sources/StreamChat/WebSocketClient/Engine/URLSessionWebSocketEngine.swift#L79

As soon as you receive the payload I shared above, please check where the trace leads you.

bielikb commented 2 years ago

Hi @davefoxy,

We just shipped release 4.17.0. In 4.16.0 we added a new tokenProvider parameter that can be passed directly to the connectUser API.

Could you grab the latest version of our SDK and see if leveraging the new tokenProvider API resolves your issue(s)?

Thanks!

Best, Boris

polqf commented 2 years ago

Hi @davefoxy, there's been some inactivity here. We are closing this issue for now; let us know if there's anything else we can help you with.

SSaleemSSI commented 2 years ago

Hi @polqf, I am currently facing the same issue in the latest SDK. I also checked the demo app: it gives the same error as mentioned above and never reconnects. I have attached a screenshot of the log.

[Screenshot: 2022-09-16 at 3:00:04 PM]

polqf commented 2 years ago

Hi @SSaleemSSI, the problem you describe seems different from the one outlined in this issue. Could you list the steps to reproduce it?

SSaleemSSI commented 2 years ago

Hi @polqf, thanks for the reply. I disconnected the network on the phone and then reconnected it, and the "connected" banner never appeared. The error only shows up in the logs when I try to sync, or when the app syncs automatically after some time.

nuno-vieira commented 2 years ago

Hi @SSaleemSSI,

That error usually means you forgot to call synchronize() on a controller. It is not related to this issue.

Best, Nuno

SSaleemSSI commented 2 years ago

Hi @nuno-vieira, then how is your demo app producing the same result? I also double-checked that I am calling synchronize. Have you tried to reproduce this by disconnecting the network and connecting it again? Everything works fine, including sending messages, but when the app tries to sync, this socket error appears. I can show you a video reproducing it with the DemoApp if you want. I also downloaded the latest SDK and DemoApp code from main, and it shows the same error.

SSaleemSSI commented 2 years ago

Hi @nuno-vieira @polqf, here is a recording of the error occurring after the DemoApp syncs while I navigate around the chats. It took a little while, but when the app tried to sync, the message clearly shows an error connecting the socket. I have also attached the logs after disconnecting and reconnecting Wi-Fi: https://www.loom.com/share/f3feaa15827247ea839f4b168280586d Logs.zip

nuno-vieira commented 2 years ago

Hi @SSaleemSSI!

Thank you for the videos. We will investigate this next Monday morning.

Best, Nuno

polqf commented 2 years ago

Hi @SSaleemSSI, after investigating this for a while, I can confirm that on the simulator the reachability components don't work properly, so we don't always reconnect. We use Apple's NWPathMonitor, so as far as we've investigated, the issue is not on our side.

When using a physical device, these issues don't appear anymore for me. Could you please confirm that on your side?

PS: There are many posts like this one: https://developer.apple.com/forums/thread/713330

SSaleemSSI commented 2 years ago

@polqf Thanks for the reply. Yes, it works fine on a physical device.

arunrajexperion commented 1 year ago

Hello, I am using the SDK via SPM with SwiftUI to build a chat, and I am seeing the same issue as davefoxy. After chatting in the app for a while, the chat list freezes, and the user can neither navigate into a channel nor connect. If I call ChatClient.shared.connectUser(), the completion is never called, so I can't handle the error. I am using tokenProvider to refresh the token. While debugging the SDK, I found that calling connectUser() reaches AuthenticationRepository -> private func scheduleTokenFetch(), which calls its completion, but the callback never fires on my end. Could you please help me with this?

Thanks Arun