Closed umar-ulabs closed 1 year ago
How about the app server side CPU usage and Memory Usage?
The CPU has been below 18% and Memory Usage has been stable at 450 MB.
Server connections are expected to be automatically recovered.
Does the old version contain the same issue or does this issue happen only with the latest version? What are the logs between "Service timeout" and "Microsoft.Azure.SignalR.Common.ServiceConnectionNotActiveException: "? There should be messages when the server connection is trying to reconnect.
Would you mind sharing with me the dump file when the app hangs? email: lianwei(at)microsoft.com
Sure, will send you dump files, thanks
Thanks for the support! Also could you share with me more details about:
Sent dump files over the email. Thank you
While debugging CPU dump file in VS. There are some "Blocked" threads, not sure what it means.
I have noticed, when our app restarts and azure signalr connection is dropped, until azure signalr is reconnected, the application is in hang state. Can the follow cause the app hang in the startup or during the normal request/response times if connection is not established with azure service?
app.UseAzureSignalR(routes =>
{
routes.MapHub
});
After Application has started => "Timestamp":"2021-05-31T22:31:43.7944090+00:00","Level":"Information","MessageTemplate":"### Application has started."} {"Timestamp":"2021-05-31T22:32:55.6969746+00:00","Level":"Error","MessageTemplate":"Error receiving handshake response. Id: {ServiceConnectionId}","Exception":"System.InvalidOperationException: Service disconnected before sending a handshake response.\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.ReceiveHandshakeResponseAsync(PipeReader input, CancellationToken token)\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.HandshakeAsync(ConnectionContext context)","Properties":{"ServiceConnectionId":"78b01fc3-c7d8-4674-b3d1-11720b4b2556","EventId":{"Id":23,"Name":"ErrorReceivingHandshakeResponse"},"SourceContext":"Microsoft.Azure.SignalR.ServiceConnection"}} {"Timestamp":"2021-05-31T22:32:55.7604218+00:00","Level":"Fatal","MessageTemplate":"Service returned handshake error: {Error}. Id: {ServiceConnectionId}","Properties":{"Error":"Service disconnected before sending a handshake response.","ServiceConnectionId":"78b01fc3-c7d8-4674-b3d1-11720b4b2556","EventId":{"Id":24,"Name":"HandshakeError"},"SourceContext":"Microsoft.Azure.SignalR.ServiceConnection"}} {"Timestamp":"2021-05-31T22:32:55.8125376+00:00","Level":"Error","MessageTemplate":"Hub '{hub}' is now disconnected from '{endpoint}'. Please check log for detailed info.","Properties":{"hub":"AgentHub","endpoint":"
@umar-ulabs By the application is in hang state
, do you mean the application is not able to handle any other web requests, or do you mean the application is not able to handle SignalR requests? Could you describe more about the "hang" symptom?
That's correct. The applications starts and does alot of initialization it needs to do before hostbuilder.run(starting the app) is called. Then it seems until the azure service is not connected back to the app, the app doesn't handle any requests. It just hangs all http requests. I am wondering if azure signalr setup needs to be in a specific spot in the startup.cs to avoid hang (it would connect when azure signalr available) and http requests can be served
I just tested a scenario. If the application is down for maintenance or restarted, if the shut down and start happens within a minute, then the application does not hang and there is no connection drop between app and azure signalr and all works fine quickly and no error logs(as above). If the restart takes lets say 2 or 5 minutes and when app starts, the azure signalr service has timeout i guess during this time, hence it hangs until everything becomes ok between app and azure signalr service. Is there a way to increase this time out or something to prevent this or even if connection is dropped, that should not effect rest of the app? Thanks
@umar-ulabs I don't think Azure SignalR connections would block the rest of your app server. All the server connections are established in the background, SignalR-negotiate requests would fail when none of the server connections is established, but no requests would get blocked.
Could you share a little bit more of your repro steps?
Following is the Startup.cs
//ConfigureServices in Startup.cs
services.AddSingleton(Configuration);
services.AddHttpClient();
services.AddAutoMapper(typeof(Startup));
services.AddJwtConfiguration();
services.AddControllersWithViews(options =>
{
options.Filters.Add
}).AddAzureSignalR(x =>
{
x.ApplicationName = $"NameOfApp";
x.ConnectionString = connectionString;
});
// Configure in Startup.cs
app.UseHsts();
app.UseHttpsRedirection();
app.UseStaticFiles(new StaticFileOptions
{
ServeUnknownFileTypes = true,
});
app.UseRouting();
app.UseResponseCompression();
app.UseAuthentication();
app.UseAuthorization();
app.UseEndpoints(endpoints =>
{
endpoints.MapControllerRoute(
name: "default",
pattern: "{controller=Home}/{action=Index}/{id?}");
});
app.UseAzureSignalR(routes =>
{
routes.MapHub<AgentHub>("/AgentHub");
routes.MapHub<UserHub>("/UserHub");
});
app.UseSpaStaticFiles();
app.UseCustomSwagger();
app.UseSpa(spa =>
{
spa.Options.SourcePath = @"ClientApp";
});
The applications starts and does alot of initialization it needs to do before hostbuilder.run(starting the app) is called. Then it seems until the azure service is not connected back to the app, the app doesn't handle any requests. It just hangs all http requests.
No the azure signalr SDK should not hang any requests. To confirm that:
UseAzureSignalR
part of the app server to see if requests are still hung when app server restarts. I tried by removing signalr part from the app, and there was no delay. It was super fast to respond and stabilized quickly. Then I tried putting back the signalr and at the following exceptions the app again in hang state. As soon as the azure signalr is back "back online" and clients start connecting, all is fine. The CPU goes crazy too upto 100% during the errors.
{"Timestamp":"2021-06-02T04:25:12.7383979+00:00","Level":"Information","MessageTemplate":"Application has started.","Properties":{"SourceContext":"--"}} {"Timestamp":"2021-06-02T04:25:53.1058357+00:00","Level":"Error","MessageTemplate":"Error receiving handshake response. Id: {ServiceConnectionId}","Exception":"System.InvalidOperationException: Service disconnected before sending a handshake response.\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.ReceiveHandshakeResponseAsync(PipeReader input, CancellationToken token)\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.HandshakeAsync(ConnectionContext context)","Properties":{"ServiceConnectionId":"632378af-ebde-4e60-abb5-7de1f4fb3bee","EventId":{"Id":23,"Name":"ErrorReceivingHandshakeResponse"},"SourceContext":"Microsoft.Azure.SignalR.ServiceConnection"}} {"Timestamp":"2021-06-02T04:25:53.1280478+00:00","Level":"Fatal","MessageTemplate":"Service returned handshake error: {Error}. Id: {ServiceConnectionId}","Properties":{"Error":"Service disconnected before sending a handshake response.","ServiceConnectionId":"632378af-ebde-4e60-abb5-7de1f4fb3bee","EventId":{"Id":24,"Name":"HandshakeError"},"SourceContext":"Microsoft.Azure.SignalR.ServiceConnection"}} {"Timestamp":"2021-06-02T04:25:53.1313395+00:00","Level":"Error","MessageTemplate":"Hub '{hub}' is now disconnected from '{endpoint}'. Please check log for detailed info.","Properties":{"hub":"UserHub","endpoint":"(Primary)https://---.service.signalr.net","EventId":{"Id":2,"Name":"EndpointOffline"},"SourceContext":"Microsoft.Azure.SignalR.StrongServiceConnectionContainer"}} {"Timestamp":"2021-06-02T04:25:53.2265755+00:00","Level":"Error","MessageTemplate":"Error receiving handshake response. Id: {ServiceConnectionId}","Exception":"System.InvalidOperationException: Service disconnected before sending a handshake response.\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.ReceiveHandshakeResponseAsync(PipeReader input, CancellationToken token)\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.HandshakeAsync(ConnectionContext context)","Properties":{"ServiceConnectionId":"3195e6f7-d35d-494d-a783-ca58aa0941a2","EventId":{"Id":23,"Name":"ErrorReceivingHandshakeResponse"},"SourceContext":"Microsoft.Azure.SignalR.ServiceConnection"}}
The Azure Signalr service side- Availability And Performance
How many server connections for one app server? I see ~300 connections in the metrics.
Could you share the dump when the hung happens?
Please feel free to reopen the issue if there are further updates.
Once every day, we will get the following errors and then application hangs and there is no way to recover except to restart the app service:
Packages
Errors
Service timeout. {ServiceTimeout:0.00}ms elapsed without receiving a message from service.
while sending message to the service, the connection carrying the traffic is dropped. Error detail: {message}","Exception":"Microsoft.Azure.SignalR.Common.ServiceConnectionNotActiveException: Connection ping timeout.\r\n at Microsoft.Azure.SignalR.ServiceConnectionBase.WriteAsync(ServiceMessage serviceMessage).
It is hard to figure out why would the entire application go into "Hang state" when signalr service isn't working. Your insights will be much appreciated.