SparkDevNetwork / Rock

An open source CMS, Relationship Management System (RMS) and Church Management System (ChMS) all rolled into one.
http://www.rockrms.com

v12.0 - Exception: The remote host closed the connection. The error code is 0x80070057. #4652

Closed leahjennings closed 2 years ago

leahjennings commented 3 years ago

Prerequisites

Description

Submitting this on behalf of a few people who have chimed in on the chat about this issue occurring after upgrading to v12: https://chat.rockrms.com/channel/troubleshooting?msg=d3uvzFtkPByvcbMpp.

Here is the stack trace from one of the emails I have gotten about it:


Message
The remote host closed the connection. The error code is 0x80070057.

Stack Trace
at System.Web.Hosting.IIS7WorkerRequest.RaiseCommunicationError(Int32 result, Boolean throwOnDisconnect)
at System.Web.Hosting.IIS7WorkerRequest.ExplicitFlush()
at System.Web.HttpResponse.Flush(Boolean finalFlush, Boolean async)
at System.Web.HttpWriter.WriteFromStream(Byte[] data, Int32 offset, Int32 size)
at Microsoft.Owin.Host.SystemWeb.CallStreams.OutputStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at Microsoft.AspNet.SignalR.Owin.ServerResponse.Write(ArraySegment`1 data)
at Microsoft.AspNet.SignalR.Hosting.ResponseExtensions.End(IResponse response, String data)
at Microsoft.AspNet.SignalR.PersistentConnection.ProcessNegotiationRequest(HostContext context)
at Microsoft.AspNet.SignalR.PersistentConnection.ProcessRequest(HostContext context)
at Microsoft.Owin.Mapping.MapMiddleware.<Invoke>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Owin.Host.SystemWeb.IntegratedPipeline.IntegratedPipelineContextStage.<RunApp>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Owin.Host.SystemWeb.IntegratedPipeline.IntegratedPipelineContext.<DoFinalWork>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Microsoft.Owin.Host.SystemWeb.IntegratedPipeline.StageAsyncResult.End(IAsyncResult ar)
at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
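For reference, the error code in the message follows the standard Windows HRESULT bit layout and can be decoded by hand. A minimal Python sketch follows; the helper name `decode_hresult` is made up for illustration and is not part of Rock or .NET:

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into its standard bit fields."""
    return {
        "failure": bool((hresult >> 31) & 1),  # severity bit: 1 means failure
        "facility": (hresult >> 16) & 0x7FF,   # 11-bit facility code
        "code": hresult & 0xFFFF,              # low 16 bits: facility-specific code
    }

fields = decode_hresult(0x80070057)
print(fields)
```

Facility 7 is FACILITY_WIN32 and code 87 is ERROR_INVALID_PARAMETER, which is a fairly generic Win32 error. The message text and the stack frames are likely the more useful signal here than the code itself.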

What is interesting is that this exception doesn't actually show up on the Exception List page; I only know it happens from the exception emails that I get. Since we upgraded to v12.0, it has occurred 314 times for us. Some days I don't get any emails; some days I get 19. The email only includes the stack trace and a URL, and the URL doesn't indicate what page the user was on when it occurred:

Screen Shot 2021-04-07 at 1 55 33 PM

What's even weirder is that I've gotten that exception email for my own user account a few times, and each time I didn't notice anything wrong from the user experience side of things.

Here's some information about our environment:

Steps to Reproduce

Unable to reproduce due to the nature of the bug.

Expected behavior:

Exception would not occur.

Actual behavior:

Exception occurs.

Versions

camrun91 commented 3 years ago

We are having this issue as well on 12.2; we are hosting in AWS. For us it seems to happen most on the communication pages.

cabal95 commented 3 years ago

SignalR (mentioned in the Stack Trace) handles on-the-fly communication between the client and server, i.e. the page is loaded, but the server can still send additional data to the client. There are only a few places this is used in Rock (like maybe 3?), but one place is the Communication Entry Wizard block. When sending to a large list of people, it uses this component to show a progress bar to let the user know "something is happening" instead of leaving them staring at a spinning circle for a few minutes.

As @camrun91 mentioned the Communication Entry Wizard block is one of the few places SignalR is used. Maybe users are closing the browser before the progress notifications have finished which is causing "the remote host closed the connection".

@leahjennings You might pick a few times those exceptions show up and check if any communications were created around the same time and then see if they were all by a small number of users. Maybe those folks just need to slow down before closing the window/moving to another page?
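The failure mode suggested above (the server tries to write to a client that has already gone away) can be illustrated outside of Rock. The Python sketch below is not SignalR and not Rock's code; it uses a local socket pair as a stand-in for the browser/server connection, and the function name and payload are invented for the example:

```python
import socket

def write_after_peer_closed() -> str:
    """Simulate a server pushing progress data to a client that has disconnected."""
    server_side, client_side = socket.socketpair()
    client_side.close()  # the "browser" goes away (tab closed, page navigated)
    try:
        # Try a few writes; depending on the platform, the first write after
        # a disconnect is not always the one that reports the error.
        for _ in range(3):
            server_side.sendall(b'{"progress": 50}')
        return "write succeeded"
    except OSError as exc:
        # On Linux this is typically BrokenPipeError, the rough analogue of
        # "The remote host closed the connection" in the .NET trace.
        return f"write failed: {type(exc).__name__}"
    finally:
        server_side.close()

print(write_after_peer_closed())
```

The point of the sketch is only that the error surfaces on the server when it next writes, not at the moment the client leaves, which would fit an exception appearing after the user has already moved on.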

camrun91 commented 3 years ago

Hmm, @cabal95, I actually think that is exactly where ours are happening. However, I do not think we are seeing a progress bar. Let me double-check to make sure.

camrun91 commented 3 years ago

@cabal95 I just tested and it is for sure happening on the comm wizard when pulling in a large list. However, I am not getting a progress bar and I do not have to close the page. I can stay on the page and let it finish loading and I still get several of these errors.

JimMichael commented 3 years ago

This makes me wonder if https://github.com/SparkDevNetwork/Rock/commit/3580633d793274b0db91a7527a9ddbb37767b183 has ever been working?

cabal95 commented 3 years ago

@camrun91 Interesting. Can you try opening the Browser Developer Tools and then go to the Network tab/pane, then try the test again? I'd be curious to see what your browser is doing in regards to any /signalr/**** requests.

camrun91 commented 3 years ago

@cabal95 Well, it does not seem to happen if I literally just sit on the page. But if I hit Next, or go back to the group viewer (where I came from), it throws several of these errors. Here is a screenshot of the network tab. The request time on that top request keeps growing; I am just sitting on the page and it is at 3+ minutes now.

The group I am pulling over is really big. 4k+ people. (likely larger than others in our org are doing)

Screen Shot 2021-04-07 at 2 26 27 PM

cabal95 commented 3 years ago

FYI the "connect" request will stay open for a long time. It basically stays open for the entire life of the page. That is how it allows the server to send data to the client after the page has loaded.

leahjennings commented 3 years ago

Received this exception again this morning after creating an email communication sent to a communication list of 9,989 people. After I clicked "Send", I did get the progress bar, but I stayed on the page until I got the final ok message. I then clicked "View Communication". After I clicked that and ultimately left the creation page, I got the exception email.

leahjennings commented 3 years ago

I can pretty easily replicate this with any number of recipients just by creating a new communication, walking through the communication wizard until I get the green completion message and the link to "View Communication". Once I close that window, that's when I get the exception. I say pretty easily, because out of the 7 communications I tested with today, I got the exception email 5 times.

Those who don't have email exceptions turned on would never know this is happening.

I have two .har files (exported the network tab in the web inspector), one .har file is from the time I didn't get the exception. The other .har file is from when I did get the exception. I'm not attaching them here because I don't know if there's any potentially sensitive info contained in them (I've not worked with .har files before). However, if they are needed to troubleshoot further, I'd be happy to email them. Also note the exception doesn't occur until after you close the window, so the export that created the .har files occurred just before the window was closed.

leahjennings commented 3 years ago

I sent two emails Friday to some fairly large lists (about 1500 people each) and had the communication detail page up for each of the two communications. I locked my computer and left for the weekend. When I came in this morning and closed those two windows, I got the exception email 94 times with that error.

This isn't at all affecting functionality, the communication was processed and delivered as normal. It's just a false alarm in that admins are notified of an issue that doesn't seem to be an issue. It also doesn't show in the exceptions list within Rock.
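One way to treat this as the false alarm described above would be to recognize the specific combination of message and SignalR frames before a notification is sent. A rough Python sketch of that idea follows. It is not Rock's actual exception-handling code, and the function name and matching rule are assumptions made for illustration:

```python
# Strings taken from the exception reported in this issue.
DISCONNECT_MESSAGE = "The remote host closed the connection"
SIGNALR_FRAME = "Microsoft.AspNet.SignalR."

def is_signalr_client_disconnect(message: str, stack_trace: str) -> bool:
    """Return True when an exception looks like the disconnect reported here."""
    return DISCONNECT_MESSAGE in message and SIGNALR_FRAME in stack_trace

message = "The remote host closed the connection. The error code is 0x80070057."
stack_trace = (
    "at System.Web.Hosting.IIS7WorkerRequest.ExplicitFlush()\n"
    "at Microsoft.AspNet.SignalR.PersistentConnection.ProcessRequest(HostContext context)"
)
print(is_signalr_client_disconnect(message, stack_trace))
```

Requiring both the message and a SignalR frame keeps the rule narrow, so the same message raised from an unrelated code path would still be reported.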

camrun91 commented 3 years ago

I am still getting this on 12.6. @leahjennings, are you still seeing this as well?

leahjennings commented 2 years ago

@camrun91 sorry for the delay, we just updated from 12.5 to 12.7 this past week. On 12.7, I can confirm this is still happening. The exception doesn't show in the exception list within the UI; it's only visible if you have exceptions configured to be emailed to you.

smross commented 2 years ago

I can confirm this problem happens for us on a 12.2 production system, and it also happened on a 13.0 Alpha system.

It does not show in the exception log. It only shows in emailed exceptions.

This is also happening when someone is in the communication wizard.

If we can help replicate the issue, please let me know how.

leahjennings commented 2 years ago

We are running v13.4 and just had this happen again. Rock administrators received 89 "exception occurred" emails for a specific user stating "The remote host closed the connection. The error code is 0x80070057." The exception also did not get logged in the Exception List.

mikedotmundy commented 2 years ago

We are on v13.4 and I can confirm that we are still seeing this exception also.

dataCollegechurch commented 2 years ago

@leahjennings - is your impression that the communication entry wizard is still the source of these exceptions?

leahjennings commented 2 years ago

@dataCollegechurch yes, because we rarely use simple editor (it's disabled entirely except for when someone clicks an email from a person's profile). I've also seen the exception get generated after I've personally interacted with the Communication Entry Wizard.

dataCollegechurch commented 2 years ago

@ethan-sparkdevnetwork - can we reopen this ticket or should a new ticket be created? I am also still seeing this issue on 13.5.

dataCollegechurch commented 1 year ago

@ethan-sparkdevnetwork / @nairdo - just want to circle back and see if you had any thoughts on whether this issue can be reopened or a new issue should be created. I am seeing this issue regularly on 14.1.

mikedotmundy commented 1 year ago

We are also continuing to see this issue in v14.1. Here is a screenshot of the latest one:

Screenshot 2023-02-23 at 19 02 54

mikedotmundy commented 1 year ago

Continuing to see this issue in v15.1.