Open sgraziosi opened 4 years ago
@sgraziosi Before going any further (i.e., asking the product unit), timestamp the "is {starting | working | stopping}" hosted service messages ... and implement IHostApplicationLifetime and timestamp those messages. I recall seeing a similar situation, and the sequence of log entries was unreliable for determining the order in which the events were occurring. Confirm with timestamps the sequence first, and note that the framework's own logging might still not appear in the exact order of actual events. I'll mark this issue on hold for a bit while you try that and report back on the outcome.
@guardrex, thanks for the super speedy answer!
I'll try my luck and point out that the order doesn't matter much: in the sense that if you invert any pair of messages, the diagnosis doesn't change. The patient still seems ill.
I can easily add a DateTime.Now.ToShortTimeString() +
at the start of the various Console.WriteLine(...)
messages, but I can guarantee that it's a waste of time, nothing substantial will change. I can be confident, because, especially in situations like this:
MyApp> Timed Background Service is working, with ID: 1221835851
MyApp> Timed Background Service is working, with ID: 1930822523
MyApp> Timed Background Service is working, with ID: 1221835851
MyApp> Timed Background Service is working, with ID: 1930822523
(when two instances coexist) I have consistently seen that the lines for each instance appear every 5 seconds (as expected). I spent so much time looking at this output and trying to figure out what's wrong, so please trust me on this.
Implementing IHostApplicationLifetime seems like a good idea and I'll give it a try.
Meanwhile, I think the biggest risk we have is that whoever might take this issue on may be unable to replicate the behaviour, so I'd recommend (if I may, and only if I may) to pass this issue on to try a straightforward replication. It's very possible that it will fail, in which case we'll have a bigger problem. I suspect replication might fail because I initially could not "reach" the AddHostedService()
method, which suggests something was odd from the very start.
Sometimes, we send these types of issues directly to the product unit's repo. I think this is still a content issue for the topic at this point. One of MS's 🥇 top engineers 🥇 worked on this topic with me, and you can rest assured that it's either correct (or mostly correct 🙈😄). It might need to address this situation on the actual timing of events.
The topic illustrates the use of a logger (not console writes), and that's what I referring to about my vague recollection on the sequence of indicated events in relation to actual events .
In any case ... even if I'm not recalling that correctly, let's ping @Tratcher 🥇 for advice. We might need to address this in the topic in some way to address your confusion if the sequence of messages can't be trusted. If he thinks that this should be addressed as an engineering issue, he'll move it over to their repo for more discussion/work.
Thanks again @guardrex - I'm going to try something with IHostApplicationLifetime
in the meantime. Also: yes, well spotted; the ILogger<TimedHostedService>
implementation on the page might indeed mean that messages/logs get delayed, I never spotted the timestamps going out of sequence, but I have seen delays (where you know something has been logged and need to wait for the logger to actually write to file). I can add that it's one reason why I went straight to the Console.
On the other hand, I can insist that the "ordering" isn't the issue here (not for me, might be a legitimate concern for you!). It's what gets reported that matters...
OK, here we go...
New code:
internal class TimedHostedService : IHostedService, IDisposable
{
private Timer _timer;
private readonly int ID = new Random().Next();
private readonly Microsoft.AspNetCore.Hosting.IApplicationLifetime _appLifetime;
public TimedHostedService(Microsoft.AspNetCore.Hosting.IApplicationLifetime appLifetime)
{
_appLifetime = appLifetime;
}
public Task StartAsync(CancellationToken cancellationToken)
{
Console.WriteLine(DateTime.Now.ToString("HH:mm:ss:ff") + " Timed Background Service is starting, with ID: " + ID );
_appLifetime.ApplicationStarted.Register(OnStarted);
_appLifetime.ApplicationStopping.Register(OnStopping);
_appLifetime.ApplicationStopped.Register(OnStopped);
_timer = new Timer(DoWork, null, TimeSpan.Zero, TimeSpan.FromSeconds(5));
return Task.CompletedTask;
}
private void DoWork(object state)
{
Console.WriteLine(DateTime.Now.ToString("HH:mm:ss:ff") + " Timed Background Service is working, with ID: " + ID);
}
public Task StopAsync(CancellationToken cancellationToken)
{
Console.WriteLine(DateTime.Now.ToString("HH:mm:ss:ff") + " Timed Background Service is stopping, with ID: " + ID);
_timer?.Change(Timeout.Infinite, 0);
return Task.CompletedTask;
}
public void Dispose()
{
_timer?.Dispose();
}
private void OnStarted()
{
Console.WriteLine(DateTime.Now.ToString("HH:mm:ss:ff") + " Timed Background Srv OnStarted(), with ID: " + ID);
}
private void OnStopping()
{
Console.WriteLine(DateTime.Now.ToString("HH:mm:ss:ff") + " Timed Background Srv OnStopping(), with ID: " + ID);
}
private void OnStopped()
{
Console.WriteLine(DateTime.Now.ToString("HH:mm:ss:ff") + " Timed Background Srv OnStopped(), with ID: " + ID);
}
}
New activity:
MyApp> 17:49:52:39 Timed Background Service is starting, with ID: 89572786
MyApp> 17:49:52:39 Timed Background Srv OnStarted(), with ID: 89572786
MyApp> 17:49:52:40 Timed Background Service is working, with ID: 89572786
MyApp> Hosting environment: Development
MyApp> Content root path: [path]\MyApp
MyApp> Now listening on: http://localhost:47213
MyApp> Application started. Press Ctrl+C to shut down.
MyApp> 17:49:57:41 Timed Background Service is working, with ID: 89572786
MyApp> 17:50:02:40 Timed Background Service is working, with ID: 89572786
[...]
MyApp> 17:50:42:40 Timed Background Service is working, with ID: 89572786
[Recycle 1]
MyApp> 17:50:42:96 Timed Background Service is starting, with ID: 807768962
MyApp> 17:50:42:97 Timed Background Srv OnStarted(), with ID: 807768962
MyApp> 17:50:42:97 Timed Background Service is working, with ID: 807768962
MyApp> Hosting environment: Development
MyApp> Content root path: [path]\MyApp
MyApp> Now listening on: http://localhost:18443
MyApp> Application started. Press Ctrl+C to shut down.
MyApp> 17:50:45:41 Timed Background Srv OnStopping(), with ID: 807768962
MyApp> 17:50:45:46 Timed Background Service is stopping, with ID: 807768962
MyApp> 17:50:45:46 Timed Background Srv OnStopped(), with ID: 807768962
MyApp> Application is shutting down...
MyApp> 17:50:47:40 Timed Background Service is working, with ID: 89572786
MyApp> 17:50:52:41 Timed Background Service is working, with ID: 89572786
MyApp> 17:50:57:40 Timed Background Service is working, with ID: 89572786
MyApp> 17:51:02:40 Timed Background Service is working, with ID: 89572786
MyApp> 17:51:07:40 Timed Background Service is working, with ID: 89572786
[Recycle 2]
MyApp> 17:51:11:19 Timed Background Service is starting, with ID: 131126396
MyApp> 17:51:11:19 Timed Background Srv OnStarted(), with ID: 131126396
MyApp> 17:51:11:19 Timed Background Service is working, with ID: 131126396
MyApp> Hosting environment: Development
MyApp> Content root path: [path]\MyApp
MyApp> Now listening on: http://localhost:20854
MyApp> Application started. Press Ctrl+C to shut down.
MyApp> 17:51:12:40 Timed Background Service is working, with ID: 89572786
MyApp> 17:51:16:21 Timed Background Service is working, with ID: 131126396
[...]
MyApp> 17:51:27:40 Timed Background Service is working, with ID: 89572786
MyApp> 17:51:31:20 Timed Background Service is working, with ID: 131126396
MyApp> 17:51:36:20 Timed Background Service is working, with ID: 131126396
[...]
MyApp> 17:52:06:20 Timed Background Service is working, with ID: 131126396
[Recycle 3]
MyApp> 17:52:07:81 Timed Background Service is starting, with ID: 1820389817
MyApp> 17:52:07:81 Timed Background Srv OnStarted(), with ID: 1820389817
MyApp> 17:52:07:81 Timed Background Service is working, with ID: 1820389817
MyApp> Hosting environment: Development
MyApp> Content root path: [path]\MyApp
MyApp> Now listening on: http://localhost:18676
MyApp> Application started. Press Ctrl+C to shut down.
MyApp> 17:52:11:20 Timed Background Service is working, with ID: 131126396
MyApp> 17:52:12:82 Timed Background Service is working, with ID: 1820389817
MyApp> 17:52:16:21 Timed Background Service is working, with ID: 131126396
MyApp> 17:52:17:82 Timed Background Service is working, with ID: 1820389817
[...]
MyApp> 17:53:27:82 Timed Background Service is working, with ID: 1820389817
MyApp> 17:53:31:19 Timed Background Service is working, with ID: 131126396
MyApp> 17:53:32:82 Timed Background Service is working, with ID: 1820389817
MyApp> 17:53:37:82 Timed Background Service is working, with ID: 1820389817
MyApp> 17:53:42:82 Timed Background Service is working, with ID: 1820389817
I've skipped with [...]
the repetitive bits, for convenience. I did not spot any inconsistency with the timing of events.
This bit is interesting:
MyApp> 17:50:45:41 Timed Background Srv OnStopping(), with ID: 807768962
MyApp> 17:50:45:46 Timed Background Service is stopping, with ID: 807768962
MyApp> 17:50:45:46 Timed Background Srv OnStopped(), with ID: 807768962
Instance 89572786 was expected to self destruct, instead 807768962 does, also executes "OnStopping()" and "OnStopped()". I would have expected 89572786 to execute these (and to self destruct). After that, it all goes downhill. OnStarted() happens when and where you'd expect, but OnStopping/ed never happen again...
Made me notice that also the "Application is shutting down..." message stops appearing (which isn't done by code I wrote).
I hope this helps! Will resume tomorrow with many thanks.
The absence of "Application is shutting down..." implies the application isn't shutting down gracefully, it's being forcefully terminated. That would prevent any graceful StopAsync logic from running.
@shirhatti @jkotalik are IIS recycles supposed to gracefully shut down or abort? (2.1 out-of-proc)
@Tratcher here is what I think, which comes from a summary "outsider" understanding:
MyApp> 17:49:52:40 Timed Background Service is working, with ID: 89572786
MyApp> Hosting environment: Development
MyApp> Content root path: [path]\MyApp
MyApp> Now listening on: http://localhost:47213
MyApp> Application started. Press Ctrl+C to shut down.
MyApp> 17:49:57:41 Timed Background Service is working, with ID: 89572786
MyApp> 17:50:02:40 Timed Background Service is working, with ID: 89572786
[...]
MyApp> 17:50:42:40 Timed Background Service is working, with ID: 89572786
[Recycle 1]
MyApp> 17:50:42:96 Timed Background Service is starting, with ID: 807768962
MyApp> 17:50:42:97 Timed Background Srv OnStarted(), with ID: 807768962
MyApp> 17:50:42:97 Timed Background Service is working, with ID: 807768962
MyApp> Hosting environment: Development
MyApp> Content root path: [path]\MyApp
MyApp> Now listening on: http://localhost:18443
MyApp> Application started. Press Ctrl+C to shut down.
MyApp> 17:50:45:41 Timed Background Srv OnStopping(), with ID: 807768962
MyApp> 17:50:45:46 Timed Background Service is stopping, with ID: 807768962
From 17:49:52:40 to 17:50:42:40 we have the app running normally, there is one instance of the app, and one corresponding instance of the timed service (89572786). Then a recycle happens. IIS does this by (a) creating a new instance of the App (generating a new timed service "called" 807768962 in this particular execution), which will receive all new requests. (b) asks the old instance to shut down gracefully and waits. (c) After a timeout, IIS kills the old instance, if it didn't self-destruct already. The trace shows what's wrong: the new timed service (807768962) receives the "please shut down gracefully" signal, even if it was (presumably) issued to the old instance of the App. [I could use the same "Random ID" trick to verify which instance of the app is shutting down, if it helps.] Thus, within the app whatever mechanism is responsible to notify all the IHostedServices that they should quit is finding those that belong to the wrong instance - this looks real bad and it's what convinced me to get in touch. In other words, it seems that MyApp does the wrong thing when (b) happens. After that, things get worse as (b) seem to stop working altogether (MyApp doesn't even receive or react to it) and the graceless kill in (c) becomes the norm.
Why this happens is a mystery I'm not equipped to solve. Two topmost hypotheses I have (in no order): H1: There is something wrong in how MyApp is implemented/configured, causing the between-instances crosstalk. If this is the case, I need guidance to learn how do I find what I did wrong. H2: There is a bug in how ASP.NET Core finds the IHostedServices to tell them to shut down gracefully. In which case you (/Microsoft) need(/s) to know, so to fix it.
Thus, for me, priority would be to distinguish between H1 and H2...
Transferring this to aspnetcore for investigation.
Hello all (@Tratcher, @BrennanConroy), Just a short one to say:
Update from me: While busy doing other plans, I've found that what I did to move from DotNet Core 2.0 to 2.1 did not suffice. I had changed the "Target Framework" from 2.0 to 2.1 in the Project properties, Application section (GUI). It turns out that this wasn't quite enough: suddenly I could not build in "release" mode, and got this error:
NETSDK1061: The project was restored using Microsoft.NETCore.App version 2.1.6, but with current settings, version 2.0.9 would be used instead. To resolve this issue, make sure the same settings are used for restore and for subsequent operations such as build or publish. Typically this issue can occur if the RuntimeIdentifier property is set during build or publish but not during restore. For more information, see https://aka.ms/dotnet-runtime-patch-selection.
Resolution that worked for me is: https://github.com/dotnet/sdk/issues/2682#issuecomment-458511067
And specifically, the manual addition of <RuntimeFrameworkVersion>2.1.5</RuntimeFrameworkVersion>
element in the .project file.
Thus, I inferred that perhaps this problem (present issue: #22831) might have been due to some framework mismatch, so I've re-visited the branch where I did the experiments reported here. Indeed, things changed, but sadly the overarching problem remains.
What changed: Before, it seemed that the graceful shutdown was happening only once, and subsequent recycle events would not reach the "graceful shutdown" methods. This appeared to be connected with the fact that the graceful restart was triggered on the wrong instance... After sorting out the dotnet core version, it seems that "graceful shutdowns" never get triggered in the wrong (new, just created) instance and appeared to work one time, fail once, work again, etc (with the code I've posted above). 50% success rate is much better than never; thus, I tried to change the timer service so that it would do something (minimally) useful to us. Sadly, this attempt failed and I couldn't even hit the 50% success rate. This to me indicates that indeed something odd is happening, at a level that is too deep for me to identify - sorry!
Thanks for contacting us.
We're moving this issue to the Next sprint planning
milestone for future evaluation / consideration. We will evaluate the request when we are planning the work for the next milestone. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.
We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process.
Hello, I'm trying to achieve something complex, which forced me to look under the hood, only to discover that things don't work as expected (or advertised).
My final aim is beyond the scope of this issue, but will describe it to give you some context. We have a two-clients app, which uses CSLA business objects. Old version is phasing out, based on Silverlight. Sometimes, we need to run procedures that can last long (minutes, even 30 or more). In the old version, we do this via
HostingEnvironment.QueueBackgroundWorkItem(cancellationToken => MethodThatCouldTakeLong(cancellationToken));
. If cancellation is happening (IIS recycling the App Pool) this gives the method a chance to terminate gracefully in mid-flight, which is good!The newer version of the App uses DotNet Core 2.1, MVC and an Angular client, in between, we use the same CSLA business object (we share the same code, with occasional changes when DotNet standard and Core differ, done via compiler directives).
`HostingEnvironment.QueueBackgroundWorkItem()
isn't available in DotNet Core, so I looked intoIHostedService
to achieve the same aim: let the long lasting method "learn" when the overall app is shutting down, so to cancel it gracefully (and hopefully, by using "compiler directives" as little as possible).In order to succeed, I need two things: [1] A system that allows me to run some code only when the app is trying to shut down (the StopAsync method). [2] A system to "reach" the business object so that it "learns" that [1] happened.
The
IHostedService
feature promises to give me [1], but I can't make it work. I actually implemented [2] in many ways, only to eventually discover that it wasn't working because [1] wasn't happening. Ignoring problem [2], here is what I see about [1], using a minimal implementation, based on theTimedHostedService
example.This is marginally different from the example code: I initialise with an (int)ID, to follow the evolution of concurrent instances and I log to Console, so that I can see in realtime what happens via the Output window in VS ("ASP.NET Core Web Server" channel). I register the TimedHostedService like this:
(Side note: it took me I don't know how long to make the compiler find the AddHostedService method, and I have no idea of how I've finally made it work.) I have configured VS to start the app using my local IIS, and the relevant app pool to recycle every 7 requests, so that I can experiment with the TimedHostedService behaviour (by asking 7 times to login with the wrong password from the client, so by generating exactly 7 requests, whenever I want and only when I want).
What I would expect, from a standstill (my comment in square brackets):
What I get is very different and, if I read it correctly, it means that this system isn't working and I can't use it. Perhaps I'm doing something wrong, but I have no idea of where to start looking, as this implementation is the simplest I could find.
Actual output (in square brackets are my comments):
What we're seeing here appears to show the following: 1st recycle - StopAsync is triggered for the wrong (newer) instance 1316790599, which stops gracefully, but should not have stopped. Instance 921976706 remains alive, but should have been stopped. 2nd recycle - 921976706 is still alive. 1886756708 starts, and then once gain StopAsync is triggered for the wrong (newer) instance 1886756708. Instance 921976706 remains unscathed. 3rd recycle - 1221835851 is born, 921976706 disappears without a sound. 4th recycle - having lost 921976706, only 1221835851 remains; 1930822523 enters the scene. For quite a while (17*5s) 1221835851 and 1930822523 coexist, until 1221835851 dies another sudden (ungraceful) death.
In my various attempts, I've seen the above patterns mix and match, with no logic I could find in their order (I'm pretty sure I did also see the right thing happening, but I'm not 100% sure). What seems to be regular is that after recycle 4, the same pattern seem to stabilise (coexistence and sudden death of the older instance).
My (informed by ignorance) explanation: this system isn't working and I can't use it for anything. Otherwise: do you have a possible explanation of what I'm doing wrong and/or how I could try to find out what's wrong? Thanks!
Document Details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.