aspnet / SignalR

[Archived] Incredibly simple real-time web for ASP.NET Core. Project moved to https://github.com/aspnet/AspNetCore
Apache License 2.0

404 - No Connection with that ID #3293

Closed jceddy closed 6 years ago

jceddy commented 6 years ago

On Friday I posted an issue, and closed it by the end of the day because the main problem (connection issue causing what amounted to a denial of service attack) was due to the way we had implemented our code to re-establish a failed connection: #3289

After examining this issue further, reproducing it and capturing client-side traffic logs, I have some lingering confusion about something, so am opening a new issue to address it.

On the server in question, we are occasionally seeing the following behavior, and I'm wondering if anyone has a suggestion as to what might be causing it, or maybe to help direct further investigation on my part:

We start a new connection, and see a call to "baseHub/negotiate" in the traffic log. This call succeeds, and returns a connection ID (let's say ABCD123). We then make a call using that connection ID, that call hits "baseHub?id=ABCD123", and that call fails: the server returns a 404 status along with the text "No Connection with that ID."

I'm wondering in what case the SignalR server code would be returning apparently unusable connection IDs?

Some more detail: The call is being made in a "then" block tagged onto the connection "start" call, like this:

signalRHub.start().then(function() { // when start runs, negotiate is called and returns a new connection ID
  signalRHub.invoke("methodName", params); // the call generated by this returns 404 with "No Connection with that ID"
});

Is there an issue with this particular pattern?

Note: Another thing I've noticed that I don't remember seeing before is that the long-poll calls are showing up with a (canceled) status in Chrome dev tools. Is this correct, or maybe the result of a Chrome update, possibly?


analogrelay commented 6 years ago

> We then make a call using that connection ID, that call hits "baseHub?id=ABCD123", and that call fails: the server returns a 404 status along with the text "No Connection with that ID."

This kind of 404 behavior means one of two things:

  1. The delay between /baseHub/negotiate and /baseHub?id=ABCD123 is so long (> 5 seconds) that the connection times out in between.
  2. You have multiple server instances/processes/IIS worker processes/etc. without sticky sessions enabled and the /baseHub/negotiate call is going to one instance, and then the /baseHub?id=ABCD123 call is going to a different instance, which doesn't have the connection in its dictionary.

You mentioned in the other issue that you don't have multiple servers, which seems very odd. Is it possible you have multiple instances of the application running on a single server or are using staging slots to swap between multiple instances?

That error message (No Connection with that ID.) covers a very small set of cases. The URL is successfully routing to SignalR, but the connection ID you provided was not one returned by /negotiate on this server.
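To make the second cause concrete, here is a minimal in-memory simulation. It does not use SignalR at all: `ServerInstance`, `RoundRobinBalancer`, and `StickyBalancer` are hypothetical names made up for this sketch, and the only assumption carried over from the explanation above is that each instance keeps its own record of the connection IDs it issued.

```javascript
// Each simulated instance tracks only the connection IDs it issued itself,
// mirroring the per-process dictionary described above.
class ServerInstance {
  constructor(name) {
    this.name = name;
    this.connections = new Set();
    this.nextId = 0;
  }

  // Stand-in for /baseHub/negotiate: issue an ID known only to this instance.
  negotiate() {
    const connectionId = `${this.name}-${this.nextId++}`;
    this.connections.add(connectionId);
    return { status: 200, connectionId };
  }

  // Stand-in for /baseHub?id=...: succeeds only if this instance issued the ID.
  connect(connectionId) {
    return this.connections.has(connectionId)
      ? { status: 200, body: "" }
      : { status: 404, body: "No Connection with that ID" };
  }
}

// No sticky sessions: consecutive requests alternate between instances.
class RoundRobinBalancer {
  constructor(instances) {
    this.instances = instances;
    this.turn = 0;
  }
  pick() {
    const instance = this.instances[this.turn % this.instances.length];
    this.turn++;
    return instance;
  }
}

// Sticky sessions: a given client key is always routed to the same instance.
class StickyBalancer {
  constructor(instances) {
    this.instances = instances;
    this.assigned = new Map();
  }
  pick(clientKey) {
    if (!this.assigned.has(clientKey)) {
      const index = this.assigned.size % this.instances.length;
      this.assigned.set(clientKey, this.instances[index]);
    }
    return this.assigned.get(clientKey);
  }
}

// One negotiate request followed by one request that uses the returned ID.
function negotiateThenConnect(pick) {
  const { connectionId } = pick().negotiate();
  return pick().connect(connectionId);
}

const roundRobin = new RoundRobinBalancer([new ServerInstance("A"), new ServerInstance("B")]);
const withoutSticky = negotiateThenConnect(() => roundRobin.pick());

const sticky = new StickyBalancer([new ServerInstance("A"), new ServerInstance("B")]);
const withSticky = negotiateThenConnect(() => sticky.pick("client-1"));

console.log(withoutSticky.status, withSticky.status); // prints "404 200"
```

In the round-robin case the ID is issued by instance A and presented to instance B, which has never seen it, so the lookup fails even though both requests arrive within the same second.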

jceddy commented 6 years ago

Another thing I noticed is that we are seeing a pattern like this:

/baseHub/negotiate (200)
/baseHub/negotiate (200)
/baseHub?id=ABCD123 (404)
/baseHub?id=WXYZ789 (404)

I'm wondering if what is happening is that the first negotiate returns ABCD123, then the second negotiate returns WXYZ789, and by the time the call using ABCD123 actually reaches the server, that connection ID has become invalid.

It's kind of hard to see what is actually happening, since all of these calls are within the same second.

It may be that the change I made to fix the problem from the other issue (where the client basically launches a DoS against the server) will actually cause these 404s to disappear, if this is somehow the result of a timing issue/race condition.
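The thread does not show what that change was. As an illustration only, one common way to keep a reconnect loop from flooding a server is capped exponential backoff. The names `backoffDelayMs` and `connectWithBackoff`, and the default delays, are assumptions made for this sketch; `connect` can be any function that returns a promise, for example one that wraps the hub's `start()` call.

```javascript
// Capped exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ... never above maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Default sleep; injectable so the schedule can be exercised without real waiting.
const realSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async `connect` function, pausing between failed attempts.
async function connectWithBackoff(connect, maxAttempts = 5, sleep = realSleep) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

With the defaults, failed attempts are followed by waits of 1, 2, 4, and 8 seconds, so a server that is down sees a handful of requests per client rather than a tight retry loop.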

Here's a question: What happens if you call start() on a connection that start() was already called on? Does the second call just fail, or does the connection get closed and re-started?

BrennanConroy commented 6 years ago

The second call will fail with a message like "Cannot start a connection that is not in the 'Disconnected' state."
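Since a second `start()` on a connection that is not disconnected is rejected, callers that might race can share a single in-flight attempt. The following is a sketch under stated assumptions: `createStarter` and `fakeConnection` are hypothetical names, and the fake object stands in for any connection exposing a promise-returning `start()`.

```javascript
// Wrap any object exposing start(): Promise so overlapping callers share one attempt.
function createStarter(connection) {
  let pending = null; // promise for the start attempt currently in flight, if any
  return function startOnce() {
    if (!pending) {
      pending = Promise.resolve()
        .then(() => connection.start())
        .finally(() => {
          pending = null; // once settled, a later call may start again
        });
    }
    return pending;
  };
}

// Hypothetical stand-in for a hub connection; counts how often start() runs.
const fakeConnection = {
  startCalls: 0,
  start() {
    this.startCalls++;
    return new Promise((resolve) => setTimeout(resolve, 10));
  },
};

const startOnce = createStarter(fakeConnection);
const first = startOnce();
const second = startOnce(); // reuses the in-flight promise instead of calling start() again
```

This only deduplicates overlapping calls. Once the attempt has settled, the next `startOnce()` calls `start()` again, so code that may run after a successful start still needs to check the connection's state first.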

It looks like you're starting two connections. I'm not sure why you're getting 404s if everything happens in the same second, unless another server is involved.

Is it possible for you to gather the logs/traces described in https://github.com/aspnet/SignalR/wiki/Diagnostics-Guide ?

analogrelay commented 6 years ago

Closing this as we haven't heard from you. Please feel free to comment if you're able to get the information we're looking for and we can reopen the issue if we're able to identify the problem.