dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.35k stars 9.99k forks source link

IOException - .net core 2.2 app fails to start on Azure App Service #18578

Closed piotrzbyszynski closed 4 years ago

piotrzbyszynski commented 4 years ago

Hi,

after migrating my app from web api classic to web api core 2.2 my app fails to start on Azure App Service (locally it works just fine)

The exception message: An error occurred while writing to logger(s). (The handle is invalid) The handle is invalid

Failed method: System.ConsolePal+WindowsConsoleStream.Write

Stack Trace:

System.AggregateException:
   at Microsoft.Extensions.Logging.Logger.Log (Microsoft.Extensions.Logging, Version=2.2.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.Extensions.Logging.Logger`1.Microsoft.Extensions.Logging.ILogger.Log (Microsoft.Extensions.Logging.Abstractions, Version=2.2.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.AspNetCore.Hosting.Internal.HostingApplicationDiagnostics.LogRequestStarting (Microsoft.AspNetCore.Hosting, Version=2.2.7.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.AspNetCore.Hosting.Internal.HostingApplication.CreateContext (Microsoft.AspNetCore.Hosting, Version=2.2.7.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.AspNetCore.Server.IIS.Core.IISHttpContextOfT`1+<ProcessRequestAsync>d__2.MoveNext (Microsoft.AspNetCore.Server.IIS, Version=2.2.6.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
Inner exception System.IO.IOException handled at Microsoft.Extensions.Logging.Logger.Log:
   at System.ConsolePal+WindowsConsoleStream.Write (System.Console, Version=4.1.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.IO.StreamWriter.Flush (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.IO.StreamWriter.Write (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.IO.SyncTextWriter.Write (System.Console, Version=4.1.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at Microsoft.Extensions.Logging.Console.Internal.WindowsLogConsole.Write (Microsoft.Extensions.Logging.Console, Version=2.2.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.Extensions.Logging.Console.Internal.ConsoleLoggerProcessor.WriteMessage (Microsoft.Extensions.Logging.Console, Version=2.2.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.Extensions.Logging.Console.Internal.ConsoleLoggerProcessor.EnqueueMessage (Microsoft.Extensions.Logging.Console, Version=2.2.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.Extensions.Logging.Console.ConsoleLogger.WriteMessage (Microsoft.Extensions.Logging.Console, Version=2.2.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.Extensions.Logging.Console.ConsoleLogger.Log (Microsoft.Extensions.Logging.Console, Version=2.2.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.Extensions.Logging.Logger.Log (Microsoft.Extensions.Logging, Version=2.2.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)

Also, in Azure application event logs I can see following errors:

Could not load configuration. Exception message: Unexpected exception: Configuration load exception has occured
Process Id: 12560.
File Version: 12.2.19169.6. Description: IIS ASP.NET Core Module V2 Request Handler. Commit: e7f262e33108e92fc8805b925cc04b07d254118b

Application '/LM/W3SVC/562901051/ROOT' with physical root 'D:\home\site\wwwroot\' failed to load clr and managed application. Unexpected exception: HRESULT 0x8007023e returned at d:\b\w\da744fbcc13abce\src\servers\iis\aspnetcoremodulev2\inprocessrequesthandler\inprocessapplication.cpp:355 
Process Id: 12560.
File Version: 12.2.19169.6. Description: IIS ASP.NET Core Module V2 Request Handler. Commit: e7f262e33108e92fc8805b925cc04b07d254118b

Failed to start application '/LM/W3SVC/562901051/ROOT', ErrorCode '0x8007023e'.
Process Id: 12560.
File Version: 13.0.19218.0. Description: IIS ASP.NET Core Module V2. Commit: 4a42afc5aea63750638e118560d43db04bd9ccc2

Any help would by highly appreciated

Tratcher commented 4 years ago

@jkotalik a possible issue with the in-proc console output?

analogrelay commented 4 years ago

@pzbyszynski 2.2 is end-of-life and no longer supported. Does the issue reproduce on 3.0 or 3.1 (the releases which are still in support)

ghost commented 4 years ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. If it is closed, feel free to comment when you are able to provide the additional information and we will re-investigate.

See our Issue Management Policies for more information.

piotrzbyszynski commented 4 years ago

@anurse Here is what is happening:

I have completely no idea what is happening. I see no exceptions and virtually no traffic in application insights. At the same time, I can not access my app. It seems like no traffic is directed to the .net core app after a swap

I have also set up completely new environment with my .net core app behind a traffic manager. I can access this app via traffic manager and directly. The traffic manager reports endpoint as online. But as soon as I redirect my main domain to traffic manager the same story happens. I can not access my app (directly and via traffic manager) I see no traffic & exceptions in application insights, and traffic manager reports my site as degraded.

This is very frustrating, it makes our app inaccessible for our users during swaps, prevents us from releasing new versions and hurts our business. With 90K active users, it is a bit of a problem as you can imagine. So any help would be highly appreciated

piotrzbyszynski commented 4 years ago

Also, I have found the following messages in an application event log (maybe it will be helpful)

Error: CInstrumentationMethod::Initialize - failed to initialize instrumentation method PID: 9180 hr: 80131375 name: Production Breakpoint Instrumentation Method

Followed by info: Incoming-calls pipe shutdown.

analogrelay commented 4 years ago

@pzbyszynski Do you have the stdout log feature enabled in ASP.NET Core Module? Does the issue reproduce if you disable it? Or if you try the out-of-proc mode in ANCM? The stdout log isn't really designed as a logging system for use under high load in production systems (though I don't see why it shouldn't work).

  • I have swapped slots, so all the traffic was routed to .NET Core 3.0 app and the app died

So, to be clear, the issue only occurs when under load?

Also, I have found the following messages in an application event log (maybe it will be helpful)

What are these being logged by? What's the event source, etc.?

It seems like something is corrupting the Stdout File Handle, which is breaking System.Console. We do mess with that in the in-proc ANCM handler in order to capture the stdout log so there might be something going on (that's why I suggested disabling that). To move forward with debugging this, we'll need to get something more complicated like a process dump. Be aware that process dumps contain the entire memory content of your application so you shouldn't share production process dumps in public forums like this. Azure/Microsoft Support is better equipped to handle interactive dump debugging.

We can't repro this in our environments

  • I have upgraded my app to .NET Core 3.0

Also, note that 3.0 support is ending soon, I'd suggest migrating to 3.1 (which is a long-term support release). The migration path is very simple and should only require minor (if any) changes to your app.

piotrzbyszynski commented 4 years ago

@anurse

@pzbyszynski Do you have the stdout log feature enabled in ASP.NET Core Module? Does the issue reproduce if you disable it? Or if you try the out-of-proc mode in ANCM? The stdout log isn't really designed as a logging system for use under high load in production systems (though I don't see why it shouldn't work).

I have to investigate, but I doubt we use any of these. I will get back to you with this info

So, to be clear, the issue only occurs when under load?

I can not be sure, but this is exactly how it looks like

What are these being logged by? What's the event source, etc.?

I can access this logs via App Service -> Diagnose and solve problems -> Diagnostic tools -> Application event log on Azure Portal

CInstrumentationMethod::Initialize - failed to initialize instrumentation method PID: 9180 hr: 80131375 name: Production Breakpoint Instrumentation Method is logged by Instrumentation Engine

Incoming-calls pipe shutdown. is logged by Production Brakepoints

P.S. We will upgrade to .net core 3.1 in the nearest feature

piotrzbyszynski commented 4 years ago

@anurse I can confirm that we do not use stdout log

Here is web.config fragment:

<aspNetCore processPath="dotnet" arguments=".\PerfectGym.GoApi.dll" stdoutLogEnabled="false" stdoutLogFile=".\logs\stdout" hostingModel="InProcess" />
analogrelay commented 4 years ago

CInstrumentationMethod::Initialize - failed to initialize instrumentation method PID: 9180 hr: 80131375 name: Production Breakpoint Instrumentation Method is logged by Instrumentation Engine

Incoming-calls pipe shutdown. is logged by Production Brakepoints

Ok, I think these aren't related then.

jkotalik commented 4 years ago

@pzbyszynski let us know if upgrading to 3.1 fixes this issue!

ghost commented 4 years ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. If it is closed, feel free to comment when you are able to provide the additional information and we will re-investigate.

See our Issue Management Policies for more information.

piotrzbyszynski commented 4 years ago

@jkotalik I have upgraded to 3.1. I do not see the original error but I still can not get my app up and running on .NET Core & Azure. The app works under very light load (when tested on slot or when excluded from traffic manager), but as soon it starts receiving production traffic it dies.

I have turned on and off all the logging I know about (stdout, diagnostic logging, app service logs, application insights) and I do not see any meaningful errors. I did several memory dumps and profiler traces - nothing. I even set up a new app service plan in a different region, without any luck. I have engaged MS support, also without luck. After several seconds under production load app just dies, it just stops responding (like it stoped receiving traffic). It is super frustrating, and to be honest I have run out of ideas

analogrelay commented 4 years ago

@pzbyszynski Without seeing more of your application, we can't really help diagnose an issue like this (and in general Microsoft Product Support is the best way to get more in-depth help diagnosing performance issues in your app; feel free to let me know any issues you had when you contacted them)

koskila commented 4 years ago

Just chiming in - we're experiencing a similar error with an API application on .NET Core 3.1. First, when an app gets some traffic, we get an error along the lines of:

CInstrumentationMethod::Initialize - failed to initialize instrumentation method PID: 11280 hr: 80131375 name: Production Breakpoint Instrumentation Method

Source is "Instrumentation Engine"

After this, the app freezes, finally clients experience 502 Bad Gateway errors, and the app recycles and is back up again.

Not sure if it's the same issue exactly, but definitely sounds similar?

This has been reported to Microsoft Azure Support, and I'm waiting to hear back from them.

analogrelay commented 4 years ago

I believe those indicate an issue with the Azure monitoring components, so Azure Support is probably the right place to ask for assistance.

piotrzbyszynski commented 4 years ago

@koskila @anurse I made some progress on this issue. 2 things that helped for me:

ServicePointManager.DefaultConnectionLimit = 512;
System.Threading.ThreadPool.GetMaxThreads(out int _, out int completionThreads);
System.Threading.ThreadPool.SetMinThreads(512, completionThreads);

Based on this article (https://docs.microsoft.com/en-us/archive/blogs/vancem/diagnosing-net-core-threadpool-starvation-with-perfview-why-my-service-is-not-saturating-all-cores-or-seems-to-stall) I assumed that the problem is thread pool starvation.

I have checked my code, and in fact, I had few calls to async methods inside synchronous code, like so:

Task.Run(() => someMethodAsync()).Result;

In my mind, it should not cause thread blocking, but I have changed all my code to use proper async / await all the way down. It didn't help much.

Bottom line is, that my app that serves 90K users before migration run on .NET Classic on 3 x S2 Azure App Service plan with plenty of headroom, and average response time < 200ms. Now, the very same app runs on .NET Core on 6 x S2 App Service plan with average response time ~400ms. So now, I am trying to figure out how to use PerfView to track down performance issues.

@anurse @jkotalik if you have any suggestions or ideas just drop me a line, please

piotrzbyszynski commented 4 years ago

One more thing, from Diagnose and solve problems -> Availability and performance report, I can see lots of HTTP 400 (604) - Client Gone (The client that originated the request seems to have closed the connection before the response was received!) errors (reported by load balancer). What is interesting, if I scale App Service plan down, I get more of these and app availability drops, but processor load is basically always around 50-60% - that is also why I suspect thread pool starvation

piotrzbyszynski commented 4 years ago

I managed to solve my problem. As I suspected it was caused by thread pool exhaustion.

The solution is using await / async all the way down. In .NET classic, it was often not possible, but in .NET Core it is simply a must, so refactor your code when porting to .NET Core. If you don't you probably won't even notice under moderate load, but under heavy load, it will bite your head off...

analogrelay commented 4 years ago

The solution is using await / async all the way down. In .NET classic, it was often not possible, but in .NET Core it is simply a must, so refactor your code when porting to .NET Core.

Yep, this is something we cover in our Performance Guide which I'd suggest taking a look at. Blocking calls in an async path are definitely problematic.

ghost commented 4 years ago

Thank you for contacting us. Due to a lack of activity on this discussion issue we're closing it in an effort to keep our backlog clean. If you believe there is a concern related to the ASP.NET Core framework, which hasn't been addressed yet, please file a new issue.

This issue will be locked after 30 more days of inactivity. If you still wish to discuss this subject after then, please create a new issue!