ably / ably-cocoa

iOS, tvOS and macOS Objective-C and Swift client library SDK for Ably realtime messaging service
https://ably.com/download
Apache License 2.0

singleton realtime instance obtains multiple connections and confuses them #1882

Closed. clickonetwo closed this issue 8 months ago.

clickonetwo commented 8 months ago

Which version of the Ably SDK are you using?

ably-cocoa 1.2.25

On which platform does the issue happen?

iOS 16.x, 17.x (on iPhone and iPad) and macOS 14.2.1 (using Mac Catalyst)

Are you using Carthage?

No

Are you using Cocoapods?

No

Which version of Xcode are you using?

Xcode 15.2 Build version 15C500b

What did you do?

I obtain a single realtime client object and use it to open two channels, sending one packet on the first and two packets on the second. This works some of the time. However, on app startup, there is an intermittent failure in which the realtime client opens two connections to the Ably server rather than one, sends the packets on the first connection, and gets a mismatched connection id error response from the server that references the second connection.

NOTE: As a workaround, I have written my code so that it always sends a throwaway packet first thing after attaching a channel, and it looks for the mismatched connection ID error to come back from that first packet. If it gets that error, it totally tears down and closes the client, and then goes through the complete client creation/attach sequence again.
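The workaround in the NOTE is a probe-and-rebuild loop. Below is a minimal sketch of that loop, assuming a hypothetical `ChannelOpener` protocol that wraps client creation, channel attach and the throwaway publish. `ProbeOutcome`, `ChannelOpener`, `openWithProbe` and `FlakyOpener` are illustrative names. They are not Ably SDK API and not the reporter's `tryOpenChannel` code. The real SDK reports attach and publish results through asynchronous callbacks, which this sketch collapses into synchronous calls. The cap of three attempts is also an arbitrary assumption.

```swift
/// Outcome of the throwaway "probe" publish sent right after attaching.
/// Hypothetical type: the report does not give the SDK's error code for the mismatch.
enum ProbeOutcome: Equatable {
    case ok
    case mismatchedConnectionId
    case otherFailure(String)
}

/// Hypothetical wrapper around: create client, attach channel, publish a throwaway message.
protocol ChannelOpener {
    func openAndProbe(channel: String) -> ProbeOutcome
    func tearDown()
}

/// Repeats the full create/attach/probe sequence while the probe reports a
/// mismatched connection id, tearing the client down after each mismatch.
func openWithProbe(_ opener: ChannelOpener, channel: String, maxAttempts: Int = 3) -> (outcome: ProbeOutcome, attempts: Int) {
    var attempts = 0
    var outcome = ProbeOutcome.otherFailure("not attempted")
    while attempts < maxAttempts {
        attempts += 1
        outcome = opener.openAndProbe(channel: channel)
        // Any result other than the mismatch (success or an unrelated failure) ends the loop.
        guard outcome == .mismatchedConnectionId else { return (outcome, attempts) }
        // Mismatch: discard this client entirely before starting over.
        opener.tearDown()
    }
    return (outcome, attempts)
}

// Usage: a fake opener that fails twice before succeeding,
// the pattern described for the attached debug log.
final class FlakyOpener: ChannelOpener {
    private var failuresLeft: Int
    private(set) var tearDowns = 0
    init(failuresLeft: Int) { self.failuresLeft = failuresLeft }
    func openAndProbe(channel: String) -> ProbeOutcome {
        if failuresLeft > 0 {
            failuresLeft -= 1
            return .mismatchedConnectionId
        }
        return .ok
    }
    func tearDown() { tearDowns += 1 }
}

let opener = FlakyOpener(failuresLeft: 2)
let result = openWithProbe(opener, channel: "control")
print(result.attempts, opener.tearDowns)  // prints "3 2"
```

Tearing down on every mismatch, including the last one, leaves no half-working client behind when the loop gives up.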

I’ve attached a debug log of a typical failure case. In this case the failure happened not once but twice, causing the client to tear down the connection a second time and then build it back up. On the third time, there is no second connection opened to the Ably server and so there is no connection mismatch error thrown and things proceed as expected. I can show you logs where there is never a second connection opened at startup, logs where a second connection is only made once on the first startup and not thereafter, and logs like this one where the second connection is made on two successive startups but not thereafter. Once I get a clean single-connection startup, I can tear down and rebuild connections as often as I want and the error never recurs. It only happens on the first (or first two) connection attempts that come after application startup.

My code is open source on GitHub. Most of the relevant code can be found in this class. The three most relevant code snippets are:

  1. The function that does the channel open/attach, and which notices the error on first packet, is tryOpenChannel.
  2. For context, the function that calls tryOpenChannel is openChannels and it calls it twice: once to open the "content" channel, which is named by a UUID, and once to open the "control" channel (named with "control"). Once the control channel is opened a non-throwaway packet is sent to announce this client's presence to others.
  3. Authentication is done in the TcpAuthenticator class, which obtains a signed AuthTokenRequest from my server after authenticating with a JWT signed by a private secret issued by my server via APNS.

What did you expect to happen?

The expected outcome is that, once authentication is complete, only one connection to the server would be opened, and all packets would be sent over it.

What happened instead?

What happens in the failure case is that the realtime client makes two different connections to the server. Messages are sent on the first connection opened, then the second connection is opened, then the initial set of messages is resent, and at that point the server rejects them because it expects them to be sent with the second connection ID. In the attached log, for example, the ID of the first connection is _29fvQ8ykQ (line 192 of the log) and the ID of the second connection is CcI3ilmBdc (line 261 of the log).

Additional

I was asked to open this bug by Mike Clark of the support team. Developer evangelist Cameron Michie is familiar with my application and has access to a running beta.


maratal commented 8 months ago

Thanks @clickonetwo. We're investigating this; if it's urgent, please use v1.2.24 by updating your Podfile:

pod 'Ably', '1.2.24'

clickonetwo commented 8 months ago

Hi @maratal thanks for the suggestion. In fact I first observed this problem on 1.2.24 and only recently updated in hopes that 1.2.25 would have a fix. So I know this problem exists in 1.2.24 as well.

maratal commented 8 months ago

@clickonetwo thanks! This changes a lot (still investigating though).

maratal commented 8 months ago

@clickonetwo could you try the branch with the fix? Thanks!

clickonetwo commented 8 months ago

@maratal Thanks for the quick work - I've switched my dependency to the fix/1882-fix-reachability-activation branch and I will start testing right away!
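The reporter uses neither CocoaPods nor Carthage, so the Podfile pin suggested earlier does not apply directly. Below is a minimal `Package.swift` sketch for pointing Swift Package Manager at the fix branch, assuming an SPM-managed project. The package name `MyApp` and target name `MyAppCore` are hypothetical. The `Ably` product name is assumed from the ably-cocoa package manifest.

```swift
// swift-tools-version:5.7
import PackageDescription

// Hypothetical app package; only the ably-cocoa dependency reflects this thread.
let package = Package(
    name: "MyApp",
    dependencies: [
        // Track the fix branch named in this thread instead of a tagged release.
        .package(url: "https://github.com/ably/ably-cocoa", branch: "fix/1882-fix-reachability-activation"),
    ],
    targets: [
        .target(
            name: "MyAppCore",
            dependencies: [
                // Product name assumed to be "Ably" in ably-cocoa's manifest.
                .product(name: "Ably", package: "ably-cocoa"),
            ]
        ),
    ]
)
```

Once the fix ships in a tagged release (the thread later links tag 1.2.26), replacing `branch:` with `from: "1.2.26"` avoids depending on a branch that may be deleted after merge.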

clickonetwo commented 8 months ago

Quick update after initial testing: it seems to have fixed the problem! I've gone through about 15 app launches and initial connections and haven't seen the problem once. That would never have happened before, when it occurred on about 1 out of every 5 launches.

I am about to release a build with this library to my beta testers. That will get us close to 1000 launches in the next week or two. I'll report back on what I hear.

maratal commented 8 months ago

Nice work! @clickonetwo

clickonetwo commented 8 months ago

Hi @maratal, just wanted to report that I've now had hundreds of sessions started against your branch library and there has been no trace of this problem. Thanks so much for your quick work on this fix! Do you have any estimate for when your PR will be merged into the main line and released?

maratal commented 8 months ago

Thanks @clickonetwo. I will make a release today.

maratal commented 8 months ago

This is now released in 1.2.26, @clickonetwo.

clickonetwo commented 8 months ago

Hooray! Thanks so much for the fast fix!

On Fri, Mar 8, 2024 at 8:38 AM Marat Al wrote:

This is now released https://github.com/ably/ably-cocoa/releases/tag/1.2.26 @clickonetwo
