dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.4k stars 10k forks source link

Java Client NPE crash when calling stop before connected #28608

Closed dsuresh-ap closed 3 years ago

dsuresh-ap commented 3 years ago

Describe the bug

If you call close() on a connection that did not establish yet (in the CONNECTING state) a NPE will be thrown due to the code calling transport.stop() on a null reference.

See HubConnection.stop(String errorMessage)

Completable stop = transport.stop();

To Reproduce

  1. Call .start() on a connection.
  2. Before the connection changes to the connected state (or successfully has a transport) call .close() on it. This happened to us because our server returns us 404 when attempting to connect with long polling so we .close() on it to stop it before switching to SSE.

Exceptions (if any)co

2020-12-11 16:48:49.094 15652-15652/x E/AndroidRuntime: FATAL EXCEPTION: main
    Process:x, PID: 15652
    java.lang.NullPointerException: Attempt to invoke interface method 'io.reactivex.Completable com.microsoft.signalr.Transport.stop()' on a null object reference
        at com.microsoft.signalr.HubConnection.stop(HubConnection.java:424)
        at com.microsoft.signalr.HubConnection.stop(HubConnection.java:513)
        at x.websockets.connection.core.SignalRCoreWebsocketConnection.disconnect(SignalRCoreWebsocketConnection.kt:185)
        at x.websockets.connection.core.OrchestratorCoreWebsocketConnection.disconnect(OrchestratorCoreWebsocketConnection.kt:65)
        at x.websocket.WebsocketManager.disconnectAllConnections(WebsocketManager.kt:116)
        at x.app.DebugMenuCreatorKt$createDebugMenuListener$1$onForceWebsocketTransportMenuRequested$1.onClick(DebugMenuCreator.kt:95)
        at androidx.appcompat.app.AlertController$AlertParams$3.onItemClick(AlertController.java:1068)
        at android.widget.AdapterView.performItemClick(AdapterView.java:374)
        at android.widget.AbsListView.performItemClick(AbsListView.java:1736)
        at android.widget.AbsListView$PerformClick.run(AbsListView.java:4207)
        at android.widget.AbsListView$7.run(AbsListView.java:6692)
        at android.os.Handler.handleCallback(Handler.java:883)
        at android.os.Handler.dispatchMessage(Handler.java:100)
        at android.os.Looper.loop(Looper.java:237)
        at android.app.ActivityThread.main(ActivityThread.java:8167)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:496)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1100)

Further technical details

dsuresh-ap commented 3 years ago

Is there another way to dispose of a connection that is in the CONNECTING state?

BrennanConroy commented 3 years ago
2\. This happened to us because our server returns us 404 when attempting to connect with long polling so we `.close()` on it to stop it before switching to SSE.

The Java client doesn't have transport fallback currently, so a 404 should already close the connection.

at x.websockets.connection.core.SignalRCoreWebsocketConnection.disconnect(SignalRCoreWebsocketConnection.kt:185) at x.websockets.connection.core.OrchestratorCoreWebsocketConnection.disconnect(OrchestratorCoreWebsocketConnection.kt:65) at x.websocket.WebsocketManager.disconnectAllConnections(WebsocketManager.kt:116)

What is going on here? Did you replace the websocket connection? (And for my own curiosity how?) If you are replacing the transport, you should make start fail if the connections start results in a 404, and you shouldn't call .close unless the connection has opened.

https://github.com/dotnet/aspnetcore/blob/4a7ed303fff26e4070e02e385fb22df484f6b81f/src/SignalR/clients/java/signalr/core/src/main/java/com/microsoft/signalr/OkHttpWebSocketWrapper.java#L146

dsuresh-ap commented 3 years ago

@BrennanConroy We have manual logic to switch from Websockets -> SignalR Core long polling or SSE (different lib). If Websockets or long polling fails to connect or in our in-app debug transport switcher gets invoked, we call .stop() on the current connection and null it out before attempting a new connection.

The NPE crash happened when we we called .stop() on a connection that was in the CONNECTING state.

So to be clear, we should: Only call .close()/.stop() on a connection that is in the CONNECTED state. Not on one that is in the DISCONNECTED/CONNECTING state?

Please find our logic for stopping a connection and clearing out resources below.

override fun disconnect() {
        shouldReconnect = false
        connection?.let { connection ->
            dataHandler.deregisterHubMethods(connection)
            stopConnection(connection)
        }
        connectionDisposables.clear() // A composite disposable that contains all the disposables from any `.start()` or `.invoke()` subscriptions.
        connection = null
    }

    private fun stopConnection(connection: HubConnection) =
        try {
            connection.stop().subscribeOn(Schedulers.io()).subscribe({
                LogUtils.v(TAG, "Connection stopped")
            }, {
                LogUtils.w(TAG, "Connection stopped with error: ${it.localizedMessage}")
             })
        } catch (e: NullPointerException) {
            // Due to an issue in the library we need to catch NPEs if we call connection.stop()
            // when the connection is in the CONNECTING state
            LogUtils.w(TAG, "Expected NPE when stopping connection: ${e.localizedMessage}")
        }
BrennanConroy commented 3 years ago

Only call .close()/.stop() on a connection that is in the CONNECTED state. Not on one that is in the DISCONNECTED/CONNECTING state?

Well, there is definitely a bug where we throw NPE, but normally if there was no bug this should be fine.

It's still unclear how/why you're calling stop. If the transport fails to connect, then the client will clean up and close the connection on its own.

dsuresh-ap commented 3 years ago

Understood, thanks for the acknowledgement.

There are cases when we need to call stop() on the connection. Ex:

  1. User logged out and we need to close the connection before bringing them to login
  2. The connection needs to be closed before manually switching to another transport (long polling for SignalR Core or SSE using another library)
  3. Feature flags that would disable SignalR Core for some users
  4. Cleanup for sanity in the case onClosed callback got triggered. From what you are saying we do not need to call stop() in this case as if onClosed() was called the connection already closed everything down internally?
BrennanConroy commented 3 years ago
1. User logged out and we need to close the connection before bringing them to login

That should be fine 99% of the time as the connection will be in the connected state.

2\. The connection needs to be closed before manually switching to another transport (long polling for SignalR Core or SSE using another library)

Still don't understand this. If WebSockets fails, the connection will close automatically.

3\. Feature flags that would disable SignalR Core for some users

Seems fine, if SignalR is disabled, you shouldn't have a connection to call stop on.

4\. Cleanup for sanity in the case `onClosed` callback got triggered. From what you are saying we do not need to call `stop()` in this case as if `onClosed()` was called the connection already closed everything down internally?

Correct.

dsuresh-ap commented 3 years ago

For 1, 3 there is the slight chance that we are not connected or in a connecting state (poor internet, timing, etc). I get that this is a low chance but when we were running the 3.x version of this library our app got hit hard with a lot of crashes due to the server running SignalR having connection issues. This caused crashes in the app due to some other bugs that were fixed in 5.0. However to reproduce these crashes we use Charles to simulate weird connection issues (ex. denied requests) and was able to reproduce a crash like this. There was another crash in 5.0 that was fixed and when attempting to verify if that crash was fixed I was able to reproduce this crash.

2 happens internally when our QA tests each transport. They can manually switch between different transports which attempts to stop the connection if one exists (might be connecting) before starting a new one.

ghost commented 3 years ago

Thanks for contacting us. We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We will evaluate the request when we are planning the work for the next milestone. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

SergeyKharuk commented 3 years ago

Hello. Just to inform me: Will this be fixed in a future version of the signarR library (5.0.4 for example) ?

4brunu commented 3 years ago

Hi, I had the same issue today.

My use case is, I create a service associated with a heads-up notification.

Depending if the app is closed/open/background and if the device is locked/unlocked, the Android OS may open or not the activity associated with the heads-up notification.

If that activity opens, it will kill the service since it's not longer required.

In this use case, the service will be short live, and the SignalR connection may be stoped, before it's successfully connected to the server, and it will crash the app.

Do you have any plans o fixing this? Can I help to get this fixed? Thanks

BrennanConroy commented 3 years ago

Thanks for the reminder, opened a PR to fix this https://github.com/dotnet/aspnetcore/pull/34046