aspnet / AspNetCoreModule

ASP.NET Core Module for IIS and IIS Express

ANCM port collision issues #64

Closed tmoeller328 closed 6 years ago

tmoeller328 commented 7 years ago

We have been getting random assortments of these types of messages in the windows event-viewer every morning at the SAME TIME:

FAILURE: Application 'MACHINE/WEBROOT/APPHOST/WMIINVSNAP' with physical root 'C:\inetpub\TGS\WMIInvSnap\' created process with commandline '"C:\inetpub\TGS\WMIInvSnap\SupplyChain.Services.WMIInvSnapshotWeb.exe" ' but failed to listen on the given port '26174'

and at the EXACT same time:

SUCCESS: Application 'MACHINE/WEBROOT/APPHOST/WMISALESORDER' started process '5052' successfully and is listening on port '26174'.

NOTE please that the port number is the SAME, but the application is different.
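A small helper can spot this pattern across many event-log entries. The following is only a sketch: the two regular expressions are modelled on the messages quoted above, and `find_collisions` is a hypothetical name, not part of any IIS or ANCM tooling. It reports every port that one application failed to listen on while a different application succeeded on it.

```python
import re
from collections import defaultdict

# Patterns modelled on the two event-log messages quoted in this thread.
FAILURE_RE = re.compile(
    r"Application '(?P<app>[^']+)' with physical root '[^']*' created process "
    r"with commandline '.*' but failed to listen on the given port '(?P<port>\d+)'"
)
SUCCESS_RE = re.compile(
    r"Application '(?P<app>[^']+)' started process '\d+' successfully "
    r"and is listening on port '(?P<port>\d+)'"
)


def find_collisions(messages):
    """Return {port: (failed_apps, successful_apps)} for ports that appear in
    a failure for one application and a success for a different one."""
    failed = defaultdict(set)
    succeeded = defaultdict(set)
    for message in messages:
        m = FAILURE_RE.search(message)
        if m:
            failed[int(m.group("port"))].add(m.group("app"))
            continue
        m = SUCCESS_RE.search(message)
        if m:
            succeeded[int(m.group("port"))].add(m.group("app"))
    collisions = {}
    for port, losers in failed.items():
        # Only count it as a collision if some *other* app won the port.
        winners = succeeded.get(port, set()) - losers
        if winners:
            collisions[port] = (sorted(losers), sorted(winners))
    return collisions
```

Feeding it the message texts exported from Event Viewer gives a quick count of how many start-ups collided on a given morning.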

Background/history: We have 9 pairs of dotnet core service-apps (deployed through IIS apppools as "No Managed Code") and batch-apps. All 9 batch apps are restarted at the same time every morning and their first communication to their companion service-app of course brings them out of their idle-state.

Obviously we could stagger the restart of the batch-apps to workaround ANCM's incorrect reuse of the same port number for restarts of idle web services that restart at the same time, but this situation could also randomly occur during the day if/when any 2 batch-apps were to get requests at the same time AND their companion web-services both happened to be idle.

Please fix this bug.

Thanks, Tom Moellering (tommoellering@gamestop.com)

FlorianRainer commented 7 years ago

I have noticed the same problem in another use case.

I am running 10+ websites; some are different applications, some are the same application with a different instance setup. However, that doesn't matter.

I use the IIS 8 process activation feature (PreloadEnabled) for all websites, and the application pool has StartMode AlwaysRunning. If I use the iisreset command to restart all websites, I get the same result.

I get the error message 'Application 'MACHINE/WEBROOT/APPHOST/xyz' with physical root 'C:\inetpub\xyz' created process with commandline '"C:\inetpub\xyz\xyz.exe" ' but failed to listen on the given port 'abcd''

This can be easily reproduced with the default ASP.NET MVC template and IIS (not Express).

I use AspNetCore 1.1.0 with .NET Framework 4.6.2, the AspNetCoreModule released with AspNetCore 1.1.0, and Kestrel 1.1.0.

But I had already noticed the same problem on RC2 and 1.0.0.

This error does not block the website or the w3wp process, but the ASP.NET Core exe (a subprocess of w3wp) will start and stop again and will not be preloaded. The first request will start a new process on another port and the request will be processed correctly. But the advantage of an application warm-up disappears.

I have tested with ~30 websites and only 4 started correctly; all the others hit this error.

Tratcher commented 7 years ago

@pan-wang does ANCM use a linear search for available ports? Would it be better if it used a random search?

shirhatti commented 7 years ago

We'll try to seed the random function for generating port numbers https://github.com/aspnet/AspNetCoreModule/blob/dev/src/AspNetCore/Inc/serverprocess.h#L190
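Why seeding matters can be illustrated with a short Python sketch. This is not the ANCM code (which is C++ and is not shown in this thread); it assumes that each worker process draws candidate ports from a pseudo-random generator that starts from the same seed, and the `MIN_PORT`/`MAX_PORT` bounds are placeholders chosen for illustration.

```python
import random
import secrets

# Assumed port range for illustration only; the real ANCM bounds may differ.
MIN_PORT = 1025
MAX_PORT = 48000


def first_ports(rng, count=3):
    """Return the first `count` candidate ports drawn from `rng`."""
    return [rng.randint(MIN_PORT, MAX_PORT) for _ in range(count)]


# Two "worker processes" whose generators start from the same fixed seed
# propose exactly the same ports, in the same order.
worker_a = random.Random(1)
worker_b = random.Random(1)
same_seed = (first_ports(worker_a), first_ports(worker_b))

# Seeding each generator from OS entropy makes identical sequences unlikely.
worker_c = random.Random(secrets.randbits(64))
worker_d = random.Random(secrets.randbits(64))
independent = (first_ports(worker_c), first_ports(worker_d))
```

Under that assumption, applications that start at the same moment all propose the same first port, which would match the simultaneous FAILURE and SUCCESS entries reported above.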

VaclavElias commented 7 years ago

Same issue here. We have 46 ASP.NET Core websites on our server and get the FAILURE messages mentioned above by @tmoeller328. Every day 1-2 domains become inaccessible and we need to restart them manually in IIS. Also, if the server or IIS is restarted, we get 5-10 of these errors as many websites try to start simultaneously... Hope it will be resolved soon!!! :)

shirhatti commented 7 years ago

@VaclavElias From my understanding, the issue is that there is just a lot of noise in the event log as multiple apps try to start on the same port and have to retry until they find an available port. I'm missing how this results in a couple of your sites being inaccessible?

shirhatti commented 7 years ago

That being said, we are aware of the issue and do plan to fix it in our next release.

Tratcher commented 7 years ago

@shirhatti there's only a limited number of retries.
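A bounded retry loop shows how a site can end up not started at all rather than merely logging noise. The sketch below is hypothetical: the function name, the callables, and the attempt limit are illustrative and not taken from ANCM.

```python
def start_with_retries(propose_port, try_start, max_attempts):
    """Try up to `max_attempts` proposed ports; return the port that worked,
    or None if every attempt collided."""
    for _ in range(max_attempts):
        port = propose_port()
        if try_start(port):
            return port
    return None


# Hypothetical scenario: ports 5000-5002 are already taken by other apps.
taken = {5000, 5001, 5002}
proposals = iter([5000, 5001, 5002, 5003])

result = start_with_retries(
    propose_port=lambda: next(proposals),
    try_start=lambda port: port not in taken,
    max_attempts=3,
)
# result is None: the retry budget ran out before a free port was proposed.
```

If many applications propose the same ports in the same order, each loser burns through its attempts on ports that the earlier starters have already claimed.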

FlorianRainer commented 7 years ago

I also have a lot of websites hosted on the same machine, but they have never remained unresponsive. I did notice that if you enable stdoutLogEnabled in web.config, there are 2 log files for one restart, which is probably related to this issue. @VaclavElias, if your application does some heavy work on startup, could it be creating a lock on some resource, or unwanted concurrency?

VaclavElias commented 7 years ago

@shirhatti, this is what I mean by inaccessible (image attached): when the FAILURE happens, the site stays down until I restart it (stop and start that particular site) through IIS. A restart always helps. I don't think it does any heavy work. It does connect to a remote appsettings.json, but if that is not available it just ignores it. Also, when IIS is restarted or the server is rebooted, it is probably compiling all the Views. The next test I will do is to precompile all Views so there is no "heavy" work :)

image

emirhosseini commented 7 years ago

@shirhatti Could this not be circumvented by setting an explicit port number for each application by setting the "server.urls" value in a hosting.json file wired up in program.cs? I'm having the same issue and was wondering whether that was the better solution rather than letting it pick a random port number.

FlorianRainer commented 7 years ago

For multiple instances of the same application it would be more configuration work, but it could be an option.

emirhosseini commented 7 years ago

@FlorianRainer In my case I have only one instance of the application running on a server under IIS on a 2 server setup. So I would think it would be fine in this case. But I see your point about multiple instances on the same server.

emirhosseini commented 7 years ago

It turns out when running the app under IIS the port settings are being ignored. When I run locally while running the exe it uses the port I told it to use. But when it runs under IIS it seems to use whatever port it wants to use. Does anyone know if this is how it's supposed to work? Why can't I set the port it opens up under when running under IIS?

Tratcher commented 7 years ago

@emirhosseini because IIS would have no idea what port to contact you on. In the current design IIS specifies the port and UseIISIntegration reads it from an environment variable.
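The handshake described here can be mimicked in a few lines. This is a Python analogue for illustration, not the actual UseIISIntegration code: the parent process puts the chosen port into the child's environment, and the child prefers that value over its own configuration. The `fallback` value and the function name are hypothetical.

```python
import os


def resolve_listen_port(environ=os.environ, fallback=5000):
    """Return the port handed down by the parent process, or `fallback` when
    the process was started standalone (for example from a console)."""
    raw = environ.get("ASPNETCORE_PORT")
    if raw is None:
        return fallback
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError(f"ASPNETCORE_PORT out of range: {port}")
    return port
```

This also mirrors what was observed earlier in the thread: a port configured by the application is honoured when the exe is run directly, but is overridden when the parent supplies one.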

emirhosseini commented 7 years ago

@Tratcher I did eventually find my way to the "ASPNETCORE_PORT" variable. I tried specifying this value but that is causing the app to crash on startup.

Is there no way to set the port instead of it picking a bad port number and cause port collisions?

Tratcher commented 7 years ago

No, ANCM must pick the port, and it's up to them to make it more reliable.

emirhosseini commented 7 years ago

So what is the purpose of setting the ASPNETCORE_PORT environment variable? I thought that was used by the UseIISIntegration call. Or does UseIISIntegration have nothing to do with ANCM?

So I have no choice but to continue to have intermittent production issues and 10+ second delays while ANCM poorly tries to select a port number that's already in use? This particular call I'm working on only has 10 seconds to respond so this is a huge issue for me. It's not acceptable to say "well it'll eventually find a port number to use".

tmoeller328 commented 7 years ago

My local poor-man’s workaround has been in-place for almost 5 months now and works quite well. From my original issue posting: https://github.com/aspnet/AspNetCoreModule/issues/64

> Obviously we could stagger the restart of the batch-apps to workaround ANCM's incorrect reuse of the same port number for restarts of idle web services that restart at the same time, but this situation could also randomly occur during the day if/when any 2 batch-apps were to get requests at the same time AND their companion web-services both happened to be idle.

p.s. another solution of course is to re-learn C++ and fix it ourselves. Or to use something else besides ANCM: http://stackoverflow.com/questions/42510623/hosting-asp-net-core-in-iis-without-kestrel

emirhosseini commented 7 years ago

@tmoeller328 Are you referring to staggering the restart? I'm already doing that and this is nowhere near the restart time of my apps. It's just in the middle of the day. It seems to randomly decide to kill the app (not at the IIS level) and start listening on a new port. If only I could tell it what port to start up on for each app every time, rather than it doing a horrible job of "randomly" (not so randomly) picking a port for me.

It just seems like none of this stuff is production ready. It seems like it should only be used for sandbox.

tmoeller328 commented 7 years ago

Hmm. My guess is that IIS is recycling your app during periods of inactivity (the default is usually 20 minutes?). You could try this:

https://serverfault.com/questions/333907/what-should-i-do-to-make-sure-that-iis-does-not-recycle-my-application

emirhosseini commented 7 years ago

@tmoeller328 Thanks but I've already set all the 'stay alive' settings for IIS. This is not an IIS issue...

shirhatti commented 7 years ago

@emirhosseini I understand your frustration around the fact ANCM was rather inefficient and attempted to start multiple applications with the same port. It is an omission we've acknowledged and I promise you we will fix it in the next release of ANCM.

> So what is the purpose of setting the ASPNETCORE_PORT environment variable? I thought that was used by the UseIISIntegration call. Or does UseIISIntegration have nothing to do with ANCM?

ASPNETCORE_PORT is set by the IIS worker process in the environment block of the child process it spawns, as a way of communicating which port the child should listen on. It isn't intended to be set by the user. Yes, it is used by the UseIISIntegration() call. It's an implementation detail you've stumbled upon; it wasn't designed as a way for a user to specify a port.

> So I have no choice but to continue to have intermittent production issues and 10+ second delays while ANCM poorly tries to select a port number that's already in use? This particular call I'm working on only has 10 seconds to respond so this is a huge issue for me. It's not acceptable to say "well it'll eventually find a port number to use".

Could you provide more information so we can help you here? I understand the current behavior is terrible when you cold-start all your ASP.NET Core apps simultaneously, but besides that I'm struggling to see how this behavior is affecting you.

> It's just in the middle of the day. It seems to randomly decide to kill the app (Not at IIS level) and start listening on a new port. If only I could tell it what port to start up for each app every time rather than it doing a horrible job of "randomly" not so randomly pick a port for me.

ANCM doesn't randomly kill the backend process. The backend process is gracefully shut down when the worker process attempts to go away. If you have verified that it doesn't have to do with your w3wp idle time-out settings or with your app crashing, then it's possible you've identified a bug. If you can provide us a repro, I'd appreciate that.

> Is there no way to set the port instead of it picking a bad port number and cause port collisions?

That sounds like a feature request, and I'd love to discuss it with the rest of the team. Is there any reason why you would want this once we fix the port collision issue? Ideally we'd like to avoid adding switches and knobs for you to configure if there's no tangible benefit.

emirhosseini commented 7 years ago

> ANCM doesn't randomly kill the backend process. The backend process is gracefully shutdown when the worker process attempts to go away. If you have verified that it doesn't have to do with your w3wp idle time-out settings or that your app is crashing, then it's possible you've identified a bug. If you can provide us a repro, I'd appreciate that.

I can see that it's not a w3wp issue because I see no event logs on the box that indicate any sort of IIS recycle event. The idle timeout is set to 0, the site is set to be always running, and the start times are staggered between the 2 apps. Yet I see logs in Event Viewer that show ANCM starting the app back up on a new port. Is there any other reason besides what you mentioned that would cause it to shut down? I read that a crash would cause it to restart, but I see no logs indicating a crash, so I can't tell what happened.

I don't know what kind of repro I would give you. This is a simple asp.net core app publishing to the full .net framework net461.

> That sounds like a feature request from you, and I'd love to give this a chat with rest of the team. Is there any reason why you would want to do that once we fix the port collision issue? Ideally we like to avoid having additional switches and knobs for you to configure with if there's no tangible benefit.

That depends on how the port collision issue will be fixed. If it'll still potentially take your code multiple attempts to find a good port to use then that's no good. If you're going to know somehow what ports are in use and not use those ports then I could see that working reliably. Otherwise if you're using a random number generator then that could still fail obviously. In which case you may as well just let us set the port!
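The odds that purely random picks still collide can be estimated with the birthday problem. The sketch below assumes independent, uniform picks; the port-range size used in the usage note is a placeholder, since the thread does not state the real range.

```python
def collision_probability(apps, port_count):
    """Probability that at least two of `apps` independent, uniformly random
    picks from `port_count` ports are equal (birthday problem)."""
    p_all_distinct = 1.0
    for i in range(apps):
        # The (i+1)-th app must avoid the i ports already chosen.
        p_all_distinct *= (port_count - i) / port_count
    return 1.0 - p_all_distinct
```

For example, `collision_probability(46, 47000)` (46 sites, as reported earlier in the thread, over an assumed 47,000 usable ports) comes out at roughly 2 percent per simultaneous cold start. That covers only collisions among the starting apps themselves, not ports already held by other software on the machine, so a random pick reduces collisions but cannot rule them out without a check or a retry.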

shirhatti commented 7 years ago

> That depends on how the port collision issue will be fixed. If it'll still potentially take your code multiple attempts to find a good port to use then that's no good. If you're going to know somehow what ports are in use and not use those ports then I could see that working reliably. Otherwise if you're using a random number generator then that could still fail obviously. In which case you may as well just let us set the port!

Besides noisy logs, is this an issue for you? There is no back-off time so if there is a port collision it should try again instantly and will probably succeed. I don't know why you are seeing 10 seconds for the process to start. @pan-wang Do you have any ideas what's going on here?

emirhosseini commented 7 years ago

Yes it is because I only have 10 seconds to reply before a timeout is forced on me by the caller. In the case that I was assigned to investigate there was a 6 second gap between the following 2 entries in the event logs! So not instantaneous. This is unacceptable for my scenario.

Application 'appname' with physical root 'path' created process with commandline '"executable" ' but failed to listen on the given port '3474'

Application 'appname' started process '4720' successfully and is listening on port '3491'.

Tratcher commented 7 years ago

How long does it take your app to start up in a non-collision scenario?

FlorianRainer commented 7 years ago

Maybe overlapped recycling does not work correctly when the collision happens? Normally even an ASP.NET Core app with a high startup time will not produce timeouts or errors, at least not in my test case. Maybe a slightly longer response time, but not much.

But maybe, if the collision happens, the old app does not wait for the new app to restart after the collision?!

emirhosseini commented 7 years ago

@Tratcher Any ideas as to how I would be able to tell? There's no log entry before it so I can't tell how long it actually takes.

emirhosseini commented 7 years ago

I just saw this log entry too. Not sure if it has to do with this issue.

Description: The process was terminated due to an unhandled exception.
Exception Info: Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvException
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv.tcp_getsockname(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvTcpHandle, Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.SockAddr ByRef, Int32 ByRef)
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvTcpHandle.GetSockIPEndPoint()
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.TcpListener.CreateListenSocket()
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.Listener.b__8_0(System.Object)

Exception Info: System.AggregateException
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean)
   at System.Threading.Tasks.Task.Wait(Int32, System.Threading.CancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Internal.KestrelEngine.CreateServer(Microsoft.AspNetCore.Server.Kestrel.ServerAddress)
   at Microsoft.AspNetCore.Server.Kestrel.KestrelServer.Start[Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context, Microsoft.AspNetCore.Hosting, Version=1.1.1.0, Culture=neutral, PublicKeyToken=adb9793829ddae60]

Exception Info: System.IO.IOException
   at Microsoft.AspNetCore.Server.Kestrel.KestrelServer.Start[Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context, Microsoft.AspNetCore.Hosting, Version=1.1.1.0, Culture=neutral, PublicKeyToken=adb9793829ddae60]
   at Microsoft.AspNetCore.Hosting.Internal.WebHost.Start()
   at Microsoft.AspNetCore.Hosting.WebHostExtensions.Run(Microsoft.AspNetCore.Hosting.IWebHost, System.Threading.CancellationToken, System.String)
   at Microsoft.AspNetCore.Hosting.WebHostExtensions.Run(Microsoft.AspNetCore.Hosting.IWebHost)
   at IBI.Api.SecurityService.Program.Main(System.String[])

borgdylan commented 7 years ago

Please provide a way to statically assign ports. On Azure App Service, collisions are making some sites point to the Kestrel process of another site, so the wrong site loads. This is more serious than noisy log files.

borgdylan commented 7 years ago

ping @shirhatti

FlorianRainer commented 7 years ago

As far as I know, ASP.NET Core 2.x will offer an option for "in-process hosting" again. If that does not use the TCP connection between ANCM and Kestrel, it would solve this issue, correct?

borgdylan commented 7 years ago

Yes, but it should allow hosting using full CLR not just CoreCLR.

borgdylan commented 7 years ago

Also, why not allow a user to statically assign ports? ANCM already has configuration points. So it should not be that hard. Also it is a great mitigation measure while the dynamic port code gets to a point where it is perfect.

Tratcher commented 7 years ago

@borgdylan that's even more fragile. The port would need to be in your web.config and deployed with your app. The actual ports in use vary from machine to machine, and over time, especially somewhere like azure. You'd have to have full control of your production environment to avoid conflicting with something else.

borgdylan commented 7 years ago

Could ANCM be made to try listening on the dynamic port before starting Kestrel, to make sure it is not in use by another Kestrel instance? That would be another solution. The current code is sometimes making one App Service site point to the Kestrel instance of another. Once a client doubted our choice of using the Azure cloud and it was hard to keep them convinced. A solution is sorely needed. We need a guarantee that one ANCM instance will not go through with using a port that another running instance has chosen.
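The "try listening first" idea can be sketched as a bind probe. This is a Python illustration of the general technique, not a description of what ANCM does; the function names are hypothetical. Note that a probe like this narrows the race but does not remove it, because another process can still take the port between the probe and the moment the child process binds.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": someone else holds the port.
            return False
    return True


def first_free_port(candidates, host="127.0.0.1"):
    """Return the first candidate port that passes the bind probe, else None."""
    for port in candidates:
        if port_is_free(port, host):
            return port
    return None
```

A caller would pass its list of candidate ports to `first_free_port` and hand the result to the child process, falling back to a retry if the child still fails to bind.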

Tratcher commented 7 years ago

Having ANCM check the port first is something we're already considering.

pan-wang commented 7 years ago

The ANCM code does check whether the targeted port is in use before creating the Kestrel process. It calls a Windows API to do the lookup. Unfortunately, this Windows API is not available in some environments. In that case, ANCM has to pick a port at random and assign it to Kestrel, and may hit a port collision.

borgdylan commented 7 years ago

Is the collision prevention check possible/being done on App Service?

muratg commented 6 years ago

Assigning to @shirhatti who will create an uber-issue related to port-collision issues and close all the filed bugs like this.

shirhatti commented 6 years ago

I'm closing this bug due to inactivity.

There is an unsupported way to force a static port assignment by setting the ASPNETCORE_PORT environment variable. Feel free to use this if you can guarantee that the ephemeral port you chose will always be available.

If you continue to see issues around port collisions, feel free to open a new bug.