aspnet / SignalR

[Archived] Incredibly simple real-time web for ASP.NET Core. Project moved to https://github.com/aspnet/AspNetCore
Apache License 2.0
2.38k stars 447 forks source link

Application crashes on azure app service with signalr alpha #905

Closed yosigolan closed 7 years ago

yosigolan commented 7 years ago

setup: server: asp.net core app version 2.0 signalr 1.00 alpha client: Angular 4.3

symptom: when deploying the server to azure app service, server crashes with 2 connections (the entire app crashes and restarts). i couldn't receive any exception or error from azure app service, just: application stopped unexpectedly.

when using the same setup only with the old signalR (i know its not certified), the app does not crash. i think its related to the number of connections, because i see it happens when i am connecting the second client and playing a bit.

sorry i dont have too much details, i tried getting some logs from azure but haven't succeeded after few hours...

davidfowl commented 7 years ago

When you get some logs, let us know. Turn up the application logs to trace. This isn't very actionable feedback at the moment.

yosigolan commented 7 years ago

Do you mean the application logs inside the asp.net core app? they are currently on Information for Microsoft. the last logs i saw from the signalr was connection closed and then the application crashes and restarts (on azure). if there was any warning on error i guess i would see it (since its above information). i will try to reproduce it tomorrow locally, maybe than i will receive something.

Yosi

davidfowl commented 7 years ago

Any logs you can get, crank up the default logging level to "Trace" and look at either app insights (what I recommend) or use the stdout log

yosigolan commented 7 years ago

Hey @davidfowl , good news i think i got it. i reproduced it locally. it happens only on iis (on kestrel works fine). when connecting with 2 clients an exception is thrown inside signalr (i see it stops in visual studio):

{System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.AspNetCore.SignalR.DefaultHubLifetimeManager`1.<WriteAsync>d__11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Travideo.AppsWebService.Hubs.TripPlanSystemViewConnector.<SetTripGeneralData>d__17.MoveNext() in C:\GitRepos\Backend\AppsWebService\AppsWebService\Hubs\TripPlanSystemViewConnector.cs:line 122}

once this exception is thrown the application is dead (this what caused it to crash on azure). in the logs i get nothing... even when loglevel for Microsoft and system is debug. let me know if you need anything else. Yosi

davidfowl commented 7 years ago

@yosigolan yes that exception looks pretty deadly 😄 . Can you put your application on a github repository somewhere so we can take a look.

yosigolan commented 7 years ago

sorry, its a production product, you can view it here: www.seevoov.com i cant share the code. but if you want i can show to you on a private session and even we can debug it.

Yosi

davidfowl commented 7 years ago

If you can provide any details on how the error can be reproduced, it would be great. Can you reproduce it on demand now only when running behind IIS locally?

yosigolan commented 7 years ago

yes, i can reproduce it on demand on my machine locally using iis express. if you want we can make a conference call and i will let you debug it (you need to connect with the source). the details i can provide are:

Yosi

davidfowl commented 7 years ago

I'd like to figure it out without a conference call (chat in slack might be easier to start with http://tattoocoder.com/asp-net-core-slack-community/) Based on the stack trace there's a null ref in at Microsoft.AspNetCore.SignalR.DefaultHubLifetimeManager1.d__11.MoveNext()which looks like it might map to theWriteAsync` call.

Some things that might help:

at Travideo.AppsWebService.Hubs.TripPlanSystemViewConnector.d__17.MoveNext() in C:\GitRepos\Backend\AppsWebService\AppsWebService\Hubs\TripPlanSystemViewConnector.cs:line 122}

What does this method do (at line 122)?

  1. What transports are being used?
  2. Did you turn logging on trace at the application level? Can you provide those logs?
yosigolan commented 7 years ago

in line 122 there is this code: await m_Client.InvokeAsync("setTripGeneralData",generalData); thats all the function do. m_Client is IClientProxy which was created as follows: m_Client = m_TripPlanHubContext.Clients.Client(m_ConnectionId);
it was created in the constructor using the client connection id. i checked, the connection id is valid (few calls before the crash works fine and arrives to the client)

i am using the defaults so eventually it uses websockes. i turned logging to trace (i am suing serilog so its debug) but i dont see any log from signalr.

davidfowl commented 7 years ago

it was created in the constructor using the client connection id.

Do not do that. Just get it in the method every time. The constructor is too early to store anything to do with the hub.

i turned logging to trace (i am suing serilog so its debug) but i dont see any log from signalr.

Then something is wrong. This is what your Program.cs should look like:

WebHost.CreateDefaultBuilder(args)
        .UseStartup<Startup>()
        .ConfigureLogging(logging =>
        {
             logging.SetMinimumLevel(LogLevel.Trace);
        }
        .Build();

That will set the logging to trace. There's an output window in visual studio that will show everything when running on IIS Express:

image

yosigolan commented 7 years ago

Thanks for the answer. | Do not do that. Just get it in the method every time. The constructor is too early to store anything to do with the hub. - are you saying that this is the cause? why does the constructor is too early? the connection was made earlier in the application lifetime. in the constructor i am injecting the context: IHubContext tripPlanHubContext and then just to make the methods nicer i am creating the client member from the context. you saying that this is forbidden? i should do this : var client = m_TripPlanHubContext.Clients.Client(m_ConnectionId); in every method that wants to send data to the client?

regarding the logs, when using serilog the configuration looks different. anyway, i do see Microsoft logging (in debug) but not signalr. from signalr i am seeing 1 line (repeatedly) that look like this: Using HubProtocol '"json"'. from Microsoft.AspNetCore.SignalR.HubEndPoint. in info

Yosi

davidfowl commented 7 years ago

| Do not do that. Just get it in the method every time. The constructor is too early to store anything to do with the hub. - are you saying that this is the cause? why does the constructor is too early? the connection was made earlier in the application lifetime. in the constructor i am injecting the context: IHubContext tripPlanHubContext

The constructor is too early because none of the hub properties have been set yet. You're injecting the IHubContext into the Hub itself? If yes why?

regarding the logs, when using serilog the configuration looks different. anyway, i do see Microsoft logging (in debug) but not signalr. from signalr i am seeing 1 line (repeatedly) that look like this: Using HubProtocol '"json"'. from Microsoft.AspNetCore.SignalR.HubEndPoint. in info

Just make sure you add the ConfigureLogging code that sets the level to Trace.

The problem is I can't decipher what your application is doing. I need to see the source code or you need to debug it more so I can get more details. It would help if you wrote the most basic idomatic hub code and see if that reproduces the issue:

After making those changes, see if the problem can be reproduced.

yosigolan commented 7 years ago

Let me explain a bit, i think the naming of my classes confused you:

Yosi

davidfowl commented 7 years ago

@yosigolan I see. Just do you know, I'm not trying to get you to change your code to fix things, I'm trying to narrow down the problem. There is a bug here, I'm just trying to figure out how I can reproduce the problem so I can fix it. I don't want to force you to debug the SignalR source code yourself so the goal is to narrow the things down so we can figure out what the issue is.

the exception occurred when calling a method on the sender that initiated the action await m_Client.InvokeAsync("setTripGeneralData",generalData);

We need to figure out what's causing the null ref. If you never store m_Client and always resolve the client before calling the method, does it work?

also, with the old signal r i am doing the same and it works great.

Yes, this is an alpha release and there are LOTS of bugs and changes that still need to be made. If you're running it in production that's on you 😄, there is no support for this until it ships. We appreciate the feedback regardless.

yosigolan commented 7 years ago

Hey. i am totally with you, just trying to help you find the root cause. i tried moving the line from the constructor next to the call, it dident help.

i have one more bit of information that might help: when the client calls the controller method (using the rest method) and asks the server to send him data back (using the signal r connection) the controller starts a background task that eventual uses the hub connection to send the data. so eventually, this task is running after the call to the controller method has ended. maybe it influence the context some how. i am sorry i cant share to code... Yosi

willthiswork89 commented 7 years ago

Why dont you take the time and create a simple app that replicates your problem without your proprietary IP in it? You are on here asking for assistance but unwilling to share code and giving generic exceptions.... Take the 15 or 20 minutes and setup a simple app to replicate the problem so that the devs here can actually help.

yosigolan commented 7 years ago

its holidays here. will try shortly.

Yosi

davidfowl commented 7 years ago

@yosigolan any updates?

yosigolan commented 7 years ago

Hey @davidfowl i will only be able to so it on Sunday. Sunday morning i will create full reconstruction and upload it.

Yosi

yosigolan commented 7 years ago

Hey. so, i created a full clone of my app without the business logic which reproduces the issue. you can find it here: https://github.com/yosigolan/signalRNullException

before you jump into the code, please read this: when i created the projects i actually found the problem. the null exception is throw (in my case) when the hub is trying to invoke a client method for a client which was already disconnected (or haven't connected at all before, meaning the connection id is wrong). the "crash" that i described is related to the fact that i dident handled the exception (from siganlR) correctly and it cause my app to crash. so the crash is not related to signalR.

regarding the exception, i do think that an exception should be thrown but maybe more related to the problem and also if possible give an option to check if the client (with some connectionId) is connected before invoking a method on it.

regarding the code that i uploaded, there are 2 folders, server and client. i used angular client but as you can understand you can use any client to reproduce it, event postman that sends wrong connection id to the post method. in order to reproduce it with my client, simple start it and once it is connected close it. the server will try invoking a method on the client after 10 seconds and the exception will be thrown since the client is disconected.

hope it helps, Yosi

davidfowl commented 7 years ago

Thanks!

davidfowl commented 7 years ago

This issue should be fixed now and will appear in alpha2