Azure / azure-signalr

Azure SignalR Service SDK for .NET
https://aka.ms/signalr-service
MIT License
427 stars 101 forks source link

Service timeout. {ServiceTimeout:0.00}ms elapsed without receiving a message from service #1279

Closed umar-ulabs closed 1 year ago

umar-ulabs commented 3 years ago

Once every day, we will get the following errors and then application hangs and there is no way to recover except to restart the app service:

Packages

Errors

Capture1

It is hard to figure out why would the entire application go into "Hang state" when signalr service isn't working. Your insights will be much appreciated.

vicancy commented 3 years ago

How about the app server side CPU usage and Memory Usage?

umar-ulabs commented 3 years ago

The CPU has been below 18% and Memory Usage has been stable at 450 MB.

vicancy commented 3 years ago

Server connections are expected to be automatically recovered.

Does the old version contain the same issue or does this issue happen only with the latest version? What are the logs between "Service timeout" and "Microsoft.Azure.SignalR.Common.ServiceConnectionNotActiveException: "? There should be messages when the server connection is trying to reconnect.

Would you mind sharing with me the dump file when the app hangs? email: lianwei(at)microsoft.com

umar-ulabs commented 3 years ago

Sure, will send you dump files, thanks

vicancy commented 3 years ago

Thanks for the support! Also could you share with me more details about:

  1. Does the old version contain the same issue or does this issue happen only with the latest version?
  2. What are the logs between "Service timeout" and "Microsoft.Azure.SignalR.Common.ServiceConnectionNotActiveException: "?
umar-ulabs commented 3 years ago
  1. There was not any older version used. It is a new application.
  2. Application task failed.","Exception":"Microsoft.Azure.SignalR.Common.AzureSignalRException: Cancelled running application task, probably caused by time out.\r\n at Microsoft.Azure.SignalR.ServiceConnection.ProcessIncomingMessageAsync(ClientConnectionContext connection)
umar-ulabs commented 3 years ago

Sent dump files over the email. Thank you

umar-ulabs commented 3 years ago

While debugging CPU dump file in VS. There are some "Blocked" threads, not sure what it means. Capture

umar-ulabs commented 3 years ago

I have noticed, when our app restarts and azure signalr connection is dropped, until azure signalr is reconnected, the application is in hang state. Can the follow cause the app hang in the startup or during the normal request/response times if connection is not established with azure service?

app.UseAzureSignalR(routes => { routes.MapHub("/AgentHub"); routes.MapHub("/UserHub");

        });
umar-ulabs commented 3 years ago

After Application has started => "Timestamp":"2021-05-31T22:31:43.7944090+00:00","Level":"Information","MessageTemplate":"### Application has started."} {"Timestamp":"2021-05-31T22:32:55.6969746+00:00","Level":"Error","MessageTemplate":"Error receiving handshake response. Id: {ServiceConnectionId}","Exception":"System.InvalidOperationException: Service disconnected before sending a handshake response.\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.ReceiveHandshakeResponseAsync(PipeReader input, CancellationToken token)\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.HandshakeAsync(ConnectionContext context)","Properties":{"ServiceConnectionId":"78b01fc3-c7d8-4674-b3d1-11720b4b2556","EventId":{"Id":23,"Name":"ErrorReceivingHandshakeResponse"},"SourceContext":"Microsoft.Azure.SignalR.ServiceConnection"}} {"Timestamp":"2021-05-31T22:32:55.7604218+00:00","Level":"Fatal","MessageTemplate":"Service returned handshake error: {Error}. Id: {ServiceConnectionId}","Properties":{"Error":"Service disconnected before sending a handshake response.","ServiceConnectionId":"78b01fc3-c7d8-4674-b3d1-11720b4b2556","EventId":{"Id":24,"Name":"HandshakeError"},"SourceContext":"Microsoft.Azure.SignalR.ServiceConnection"}} {"Timestamp":"2021-05-31T22:32:55.8125376+00:00","Level":"Error","MessageTemplate":"Hub '{hub}' is now disconnected from '{endpoint}'. Please check log for detailed info.","Properties":{"hub":"AgentHub","endpoint":"

vicancy commented 3 years ago

@umar-ulabs By the application is in hang state, do you mean the application is not able to handle any other web requests, or do you mean the application is not able to handle SignalR requests? Could you describe more about the "hang" symptom?

umar-ulabs commented 3 years ago

That's correct. The applications starts and does alot of initialization it needs to do before hostbuilder.run(starting the app) is called. Then it seems until the azure service is not connected back to the app, the app doesn't handle any requests. It just hangs all http requests. I am wondering if azure signalr setup needs to be in a specific spot in the startup.cs to avoid hang (it would connect when azure signalr available) and http requests can be served

umar-ulabs commented 3 years ago

I just tested a scenario. If the application is down for maintenance or restarted, if the shut down and start happens within a minute, then the application does not hang and there is no connection drop between app and azure signalr and all works fine quickly and no error logs(as above). If the restart takes lets say 2 or 5 minutes and when app starts, the azure signalr service has timeout i guess during this time, hence it hangs until everything becomes ok between app and azure signalr service. Is there a way to increase this time out or something to prevent this or even if connection is dropped, that should not effect rest of the app? Thanks

vicancy commented 3 years ago

@umar-ulabs I don't think Azure SignalR connections would block the rest of your app server. All the server connections are established in the background, SignalR-negotiate requests would fail when none of the server connections is established, but no requests would get blocked.

Could you share a little bit more of your repro steps?

umar-ulabs commented 3 years ago

Following is the Startup.cs //ConfigureServices in Startup.cs services.AddSingleton(Configuration); services.AddHttpClient(); services.AddAutoMapper(typeof(Startup)); services.AddJwtConfiguration(); services.AddControllersWithViews(options => { options.Filters.Add(); options.Filters.Add(new ValidationFilter()); options.Conventions.Add(new ControllerDocumentationConvention()); }) .AddFluentValidation(options => { options.RegisterValidatorsFromAssemblyContaining(); }) .AddControllersAsServices() .AddNewtonsoftJson(o => { o.SerializerSettings.Formatting = Formatting.Indented; o.SerializerSettings.ReferenceLoopHandling = ReferenceLoopHandling.Ignore; }); services.AddSignalR(o => { o.MaximumReceiveMessageSize = null;

                }).AddAzureSignalR(x =>
                {
                    x.ApplicationName = $"NameOfApp";
                    x.ConnectionString = connectionString;                     
                });

        // Configure  in Startup.cs
                app.UseHsts();
                app.UseHttpsRedirection();   
                 app.UseStaticFiles(new StaticFileOptions
                {
                    ServeUnknownFileTypes = true,
                });
                app.UseRouting();
                app.UseResponseCompression();
                app.UseAuthentication();
                app.UseAuthorization();
                app.UseEndpoints(endpoints =>
                {
                    endpoints.MapControllerRoute(
                        name: "default",
                        pattern: "{controller=Home}/{action=Index}/{id?}");
                });
                app.UseAzureSignalR(routes =>
                {
                    routes.MapHub<AgentHub>("/AgentHub");
                    routes.MapHub<UserHub>("/UserHub");

                });
                app.UseSpaStaticFiles();
                app.UseCustomSwagger();
                app.UseSpa(spa =>
                {
                    spa.Options.SourcePath = @"ClientApp";
                }); 
vicancy commented 3 years ago

The applications starts and does alot of initialization it needs to do before hostbuilder.run(starting the app) is called. Then it seems until the azure service is not connected back to the app, the app doesn't handle any requests. It just hangs all http requests.

No the azure signalr SDK should not hang any requests. To confirm that:

umar-ulabs commented 3 years ago

I tried by removing signalr part from the app, and there was no delay. It was super fast to respond and stabilized quickly. Then I tried putting back the signalr and at the following exceptions the app again in hang state. As soon as the azure signalr is back "back online" and clients start connecting, all is fine. The CPU goes crazy too upto 100% during the errors.

{"Timestamp":"2021-06-02T04:25:12.7383979+00:00","Level":"Information","MessageTemplate":"Application has started.","Properties":{"SourceContext":"--"}} {"Timestamp":"2021-06-02T04:25:53.1058357+00:00","Level":"Error","MessageTemplate":"Error receiving handshake response. Id: {ServiceConnectionId}","Exception":"System.InvalidOperationException: Service disconnected before sending a handshake response.\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.ReceiveHandshakeResponseAsync(PipeReader input, CancellationToken token)\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.HandshakeAsync(ConnectionContext context)","Properties":{"ServiceConnectionId":"632378af-ebde-4e60-abb5-7de1f4fb3bee","EventId":{"Id":23,"Name":"ErrorReceivingHandshakeResponse"},"SourceContext":"Microsoft.Azure.SignalR.ServiceConnection"}} {"Timestamp":"2021-06-02T04:25:53.1280478+00:00","Level":"Fatal","MessageTemplate":"Service returned handshake error: {Error}. Id: {ServiceConnectionId}","Properties":{"Error":"Service disconnected before sending a handshake response.","ServiceConnectionId":"632378af-ebde-4e60-abb5-7de1f4fb3bee","EventId":{"Id":24,"Name":"HandshakeError"},"SourceContext":"Microsoft.Azure.SignalR.ServiceConnection"}} {"Timestamp":"2021-06-02T04:25:53.1313395+00:00","Level":"Error","MessageTemplate":"Hub '{hub}' is now disconnected from '{endpoint}'. Please check log for detailed info.","Properties":{"hub":"UserHub","endpoint":"(Primary)https://---.service.signalr.net","EventId":{"Id":2,"Name":"EndpointOffline"},"SourceContext":"Microsoft.Azure.SignalR.StrongServiceConnectionContainer"}} {"Timestamp":"2021-06-02T04:25:53.2265755+00:00","Level":"Error","MessageTemplate":"Error receiving handshake response. Id: {ServiceConnectionId}","Exception":"System.InvalidOperationException: Service disconnected before sending a handshake response.\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.ReceiveHandshakeResponseAsync(PipeReader input, CancellationToken token)\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.HandshakeAsync(ConnectionContext context)","Properties":{"ServiceConnectionId":"3195e6f7-d35d-494d-a783-ca58aa0941a2","EventId":{"Id":23,"Name":"ErrorReceivingHandshakeResponse"},"SourceContext":"Microsoft.Azure.SignalR.ServiceConnection"}}

umar-ulabs commented 3 years ago

The Azure Signalr service side- Availability And Performance gapture

vicancy commented 3 years ago

How many server connections for one app server? I see ~300 connections in the metrics.

vicancy commented 3 years ago

Could you share the dump when the hung happens?

vicancy commented 1 year ago

Please feel free to reopen the issue if there are further updates.