Particular / NServiceBus

Build, version, and monitor better microservices with the most powerful service platform for .NET
https://particular.net/nservicebus/
Other
2.09k stars 648 forks source link

FatalExecutionEngineError debugging and FailFast #1137

Closed johnsimons closed 11 years ago

johnsimons commented 11 years ago

This issue was raised on the mailing list, see http://tech.groups.yahoo.com/group/nservicebus/message/18755

The problem is that if there is a configuration/startup exception in the NServiceBus host, the error/stacktrace of the root exception is not shown.

The code that is problematic is this (copied here for convenience):

try
{
    PerformConfiguration();

    bus = Configure.Instance.CreateBus();
    if (bus != null && !SettingsHolder.Get<bool>("Endpoint.SendOnly"))
    {
        bus.Start();
    }

    configManager.Startup();
    wcfManager.Startup();
}
catch (Exception ex)
{
    //we log the error here in order to avoid issues with non serializable exceptions
    //going across the appdomain back to topshelf
     LogManager.GetLogger(typeof(GenericHost)).Fatal("Exception when starting endpoint.", ex);
     Environment.FailFast("Exception when starting endpoint.", ex);
}

The problem is we assume that LogManager has already been configured but that may not be correct because for example:

public class LoggingConfiguration : IConfigureLoggingForProfile<Production>
{
    public void Configure(IConfigureThisEndpoint specifier)
    {
        SetLoggingLibrary.NLog();

        throw new Exception("Foo boo aaa...");

        NLog.Config.SimpleConfigurator.ConfigureForTargetLogging(new NLog.Targets.ColoredConsoleTarget());
    }
}

What would be the state of the logger?

To address this issue I recommend we do a Console.Out as well. So the code would end up like:

catch (Exception ex)
{
    //we log the error here in order to avoid issues with non serializable exceptions
    //going across the appdomain back to topshelf
    LogManager.GetLogger(typeof(GenericHost)).Fatal("Exception when starting endpoint.", ex);

    Console.Out.WriteLine("Exception when starting endpoint: {0}", ex);

    if (Environment.UserInteractive)
    {
        Thread.Sleep(10000); // so that user can see on their screen the problem
    }

    Environment.FailFast("Exception when starting endpoint.", ex);
}

The Environment.FailFast is still required because it logs to the Event Viewer as a backup in case neither of the above worked.

SimonCropp commented 11 years ago

can we just clear up one point to avoid confusion

The problem is that if there is a configuration/startup exception in the NServiceBus host, the error/stacktrace of the root exception is not shown.

the error+stacktrace of the root exception is not shown in the configured log because of an error in config. However the error+stacktrace does show up in the event log

jerrosenberg commented 11 years ago

@SimonCropp Apparently not when debugging in the IDE. I get no entry at all, let alone a stack trace.

When running without debugging, I do get 3 entries in the log, but the original stack is lost. Perhaps it's in the crash dump, but not going there :) Here are the entries:

  1. Error Application: NServiceBus.Host.exe Framework Version: v4.0.30319 Description: The application requested process termination through System.Environment.FailFast(string message). Message: Exception when starting endpoint. Stack: at System.Environment.FailFast(System.String, System.Exception) at NServiceBus.Hosting.GenericHost.Start() at Magnum.StateMachine.EventActionList1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].Execute(System.__Canon, Magnum.StateMachine.Event, System.Object) at Magnum.StateMachine.State1[[System.Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].RaiseEvent(System.Canon, Magnum.StateMachine.BasicEvent1<System.__Canon>, System.Object) at Magnum.StateMachine.StateMachine1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].RaiseEvent(Magnum.StateMachine.Event) at Topshelf.Internal.ServiceControllerProxy.Start() at Topshelf.Internal.ServiceCoordinator.Start() at Topshelf.Internal.Hosts.ConsoleHost.Run() at NServiceBus.Hosting.Windows.Program.Main(System.String[])
  2. Error Faulting application name: NServiceBus.Host.exe, version: 4.0.0.2, time stamp: 0x5177f8a4 Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000 Exception code: 0x80131623 Fault offset: 0x000007f842e1aa78 Faulting process id: 0x73c0 Faulting application start time: 0x01ce44f308f4be38 Faulting application path: C:\projects.net\Alerts\Mailer\bin\Debug\NServiceBus.Host.exe Faulting module path: unknown Report Id: 477ab846-b0e6-11e2-be80-6805ca08681d Faulting package full name: Faulting package-relative application ID:
  3. Information Fault bucket , type 0 Event Name: CLR20r3 Response: Not available Cab Id: 0      Problem signature: P1: nservicebus.host.exe P2: 4.0.0.2 P3: 5177f8a4 P4: ema.Alerts.Tracing P5: 1.0.0.0 P6: 517e98db P7: 1 P8: b P9: System.NullReferenceException P10:      Attached files:      These files may be available here: C:\ProgramData\Microsoft\Windows\WER\ReportArchive\AppCrash_nservicebus.host_72a4b9949c8a957786a2a501e8ce07654f3dd98_672b10db      Analysis symbol: Rechecking for solution: 0 Report Id: 477ab846-b0e6-11e2-be80-6805ca08681d Report Status: 2048 Hashed bucket:
johnsimons commented 11 years ago

@jerrosenberg you not alone there! I have exactly the same problem, no stacktrace in event log! And I'm 100% sure this worked before! I wonder if is another v4.5 bug (I mean feature!)

SimonCropp commented 11 years ago

@jerrosenberg i stand corrected. With the debugger i have noticed some qwerks in what gets loged in the eventlog

jerrosenberg commented 11 years ago

I think there's 2 issues here.

  1. Failures while initializing logging are confuzzled by the debugger.
  2. FailFast seems pretty evil :)

I think your solution to 1 is a good one.

Here's my opinion on why 2:

If you really want this just for the event log write, then just write to the event log, and shut down gracefully, or at least not SO hard. :)

If you're worried about the event log write failing, you could always try catch that and failfast if you can't write to the log! Then I think you would definitely meet the criteria in the msdn doc:

Use the FailFast method instead of the Exit method to terminate your application if the state of your application is damaged beyond repair...

I think this methodology should be applied across the board wherever FailFast is currently used.

But I've been wrong before..

My cents.

Btw, you may enjoy this comment found here:

However, in CLR v4, Environment.FailFast when a debugger is attached gives you an MDA saying you've hit a bug in the runtime or unsafe managed code, and this is most likely caused by heap corruption or a stack imbalance from COM Interop or P/Invoke. That extremely misleading error isn't right, and we can temporarily work around this by using Environment.Exit if a debugger is attached. The right fix is to plumb FailFast correctly through our native Watson code, adding in a TypeOfReportedError for fatal managed errors.

SimonCropp commented 11 years ago

note: perhaps try catch from the code that calls topshelf and re-throw a new aggregate exception to preserve stack trace

SimonCropp commented 11 years ago

@johnsimons @andreasohlund proposed fix here https://github.com/NServiceBus/NServiceBus/commit/d1c2203442e44da742f6e55480ef668319c1b85a

the Idea is to

thoughts?

johnsimons commented 11 years ago

How do u find such obscure Apis? It feels like voodoo!

The only thing I don't like is the public extension method on the Exception class that we do not own :(

On Tuesday, June 18, 2013, Simon Cropp wrote:

@johnsimons https://github.com/johnsimons @andreasohlundhttps://github.com/andreasohlundproposed fix here d1c2203https://github.com/NServiceBus/NServiceBus/commit/d1c2203442e44da742f6e55480ef668319c1b85a

the Idea is to

  • catch the topshelf exception
  • extract the inner exception
  • patch the stack trace of the inner exception
  • rethrow the inner exception
  • let the runtime handle it as an unhandled exception

thoughts?

— Reply to this email directly or view it on GitHubhttps://github.com/NServiceBus/NServiceBus/issues/1137#issuecomment-19605989 .

Regards John Simons NServiceBus

SimonCropp commented 11 years ago

Fair point. I will make it internal.

As for the voodoo... There are other ways of doing it but they involve reflection.

Also we can remove the voodoo when we move to 4.5 since there is a simplified public api in 4.5

johnsimons commented 11 years ago

So what does the exception look like? Can u show a before and after?

On Tuesday, June 18, 2013, Simon Cropp wrote:

Fair point. I will make it internal.

As for the voodoo... There are other ways of doing it but they involve reflection.

Also we can remove the voodoo when we move to 4.5 since there is a simplified public api in 4.5

— Reply to this email directly or view it on GitHubhttps://github.com/NServiceBus/NServiceBus/issues/1137#issuecomment-19607530 .

Regards John Simons NServiceBus

SimonCropp commented 11 years ago

Have a look at the unit test I added

andreasohlund commented 11 years ago

Looks good, add a comment /YT to get rid of it when we are 4.5 only

On Tue, Jun 18, 2013 at 2:32 PM, Simon Cropp notifications@github.comwrote:

Have a look at the unit test I added

— Reply to this email directly or view it on GitHubhttps://github.com/NServiceBus/NServiceBus/issues/1137#issuecomment-19607984 .

http://andreasohlund.net http://twitter.com/andreasohlund

johnsimons commented 11 years ago

Agree, I'll merge it in today

On Wednesday, June 19, 2013, Andreas Öhlund wrote:

Looks good, add a comment /YT to get rid of it when we are 4.5 only

On Tue, Jun 18, 2013 at 2:32 PM, Simon Cropp <notifications@github.com<javascript:_e({}, 'cvml', 'notifications@github.com');>>wrote:

Have a look at the unit test I added

— Reply to this email directly or view it on GitHub< https://github.com/NServiceBus/NServiceBus/issues/1137#issuecomment-19607984>

.

http://andreasohlund.net http://twitter.com/andreasohlund

— Reply to this email directly or view it on GitHubhttps://github.com/NServiceBus/NServiceBus/issues/1137#issuecomment-19638195 .

Regards John Simons NServiceBus