bitwiseworks / libc

LIBC Next (kLIBC fork)
9 stars 4 forks source link

Make LIBC raise a breakpoint exception instead of process dump #96

Closed dmik closed 3 years ago

dmik commented 3 years ago

By default, the LIBC panic routine (called when critical runtime assertions are failed) tries to create a process dump with DosDumpProcess and then terminates the process with exit code 127. This behavior can be altered with the LIBC_PANIC environment variable which can be set to [nodump][terse][quiet][breakpoint] (where nodump disables dump generation, terse makes it less verbose, quiet makes it completely quiet and breakpoint will execute the INT 3 x86 instruction to raise a breakpoint exception rather than terminate the process normally).

The default behavior of dump generation is not very useful for end users. Most of them won't have dumps enabled at all and even if they do enable them on purpose, it requires quite a bit of knowledge to interpret this dump (so not very useful again). Given that we have EXCEPTQ enabled in to virtually any app via LIBCx (-lcx), it looks much more useful to let the LIBC panic handler generate a breakpoint by default. This way, the user will get a .TRP file which just plain text and is much easier to read than the process dump and can be easily sent to the app developer. And even if LIBCx is not used, the user will get at least a POPUPLOG.OS2 entry. Currently, if LIBC_PANIC is not set and process dumps are disabled, the user will just get a message from LIBC to the console which isn't very useful and can be easily missed by the user so the reason of termination is often not known to them and no record of it is left anywhere.

Note that the LIBC panic routine is also what is called by default for many signals (such as SIGKILL or SIGABRT) if the app doesn't provide its own signal handler. Having an EXCEPTQ .TRP file in this case is also much more useful than the process dump.

There is also a standard C abort call (used in many apps to signal about failed assertions). This call raises SIGABRT which ends up in the LIBC panic handler as well. This behavior can also be overridden with an environment variable: set LIBC_BREAKPOINT_ABORT=1 will cause abort to generate an INT 3 breakpoint too instead of raising SIGABRT. Again, it is much more useful if .TRP is generated by abort rather than the process dump. However, we won't need to do anything special here if we change the default LIBC panic behavior to raise a breakpoint: since abort will end up in the LIBC panic handler as well by default, it will cause .TRP file creation too.

dmik commented 3 years ago

While doing this, I realized that enabling breakpoints by default is not a good idea because this would lead to both EXCEPTQ .TRP and POPUPLOG.OS2 entries if EXCEPTQ is installed which would be very annoying. The LIBC panic handler is called virtually in every trap-like situation in a LIBC-based app:

  1. When the application crashes due to memory access violation or FPU exception (via SIGSEGV and SIGFPU, respectively, if they these signals are not handled by the app in a POSIX way), or any other OS/2 system exception (all of them are mapped to POSIX signals).
  2. When the application raises a POSIX signal via raise (or indirectly via abort etc) and this signal is also not handled by the app itself.
  3. When LIBC detects some critical condition (LIBC panic) and cannot continue.

In the first case, a real OS/2 exception occurs first which is handled by the exception chain in a normal way. And if EXCEPTQ is installed, it will kick in BEFORE the LIBC panic handler because the LIBC exception handler (that raises POSIX signals in response to system exceptions which end up in the LIBC panic handler by default) will be below it on the exception handler chain. In the second and third cases, the LIBC panic handler is called as a normal function (i.e. not while handling an exception) so no EXCEPTQ participation is there yet. And while I can detect EXCEPTQ presence in cases 2 and 3 (and therefore deal with the breakpoint to avoid both .TRP and POPUPLOG.OS2 records), I cannot do that in case 1. I wonder if @StevenLevine knows a way to get the original exception chain while in the exception handler — in order to detect if EXCEPTQ has been already called or not. If I could now that I would be able to conditionally raise a breakpoint depending on whether EXCEPTQ already generated something or not.

For the reasons above I went a different way for now. Breakpoints are still disabled by default but I now trigger an EXCEPTQ debug exception if the EXCEPTQ handler is present on the exception chain. This way, we will get usual .TRP in case 1 (where EXCEPTQ kicks in first) and "debug" .TRP in cases 2 and 3. They look really neat, in case of signals (e.g. SIGABORT via abort), like that:


 Exception Report - created 2021/02/13 14:14:45
______________________________________________________________________

 LIBC: Killed by SIGABRT

 OS2/eCS Version:  2.45
 # of Processors:  4
 Physical Memory:  3260 mb
 Virt Addr Limit:  3072 mb
 Exceptq Version:  7.11.3-shl (Jul  5 2016)

______________________________________________________________________

 Exception 71785158 - Exceptq Debug Request
______________________________________________________________________

In case of LIBC panic, like that:

______________________________________________________________________

 Exception Report - created 2021/02/13 14:13:35
______________________________________________________________________

 LIBC PANIC!! fmutex deadlock: Recursive mutex!

0x00020050: Owner=0xea280001 Se

 OS2/eCS Version:  2.45
 # of Processors:  4
 Physical Memory:  3260 mb
 Virt Addr Limit:  3072 mb
 Exceptq Version:  7.11.3-shl (Jul  5 2016)

______________________________________________________________________

 Exception 71785158 - Exceptq Debug Request
______________________________________________________________________

And the normal stack trace follows. Much more useful than before (where no .TRP would be generated in cases 2 and 3 at all). However, in case of panic I faced an EXCEPTQ limitation on the length of the pszReportInfo string in SetExceptqOptions. It's jus 80 chars, so the panic message looks really truncated above, compare to the original:

LIBC PANIC!!
fmutex deadlock: Recursive mutex!
0x00020050: Owner=0xea280001 Self=0xea280001 fs=0x3 flags=0x0 hev=0x0001003b
            Desc="test.c"

Some vital information like the mutex description is missing from .TRP because I had to truncate it. So I call Steven for assistance here as well. I guess that calling SetExceptqOptions here (I need that not only for the report info message but also to enable the debug exception with D option in pszOptions) looks like an abuse. We have a plenty of fields in EXCEPTIONREPORTRECORD to control the EXCEPTQ debug exception parameters when raising it with DosRaiseException. E.g. the EXCEPTQ handler can use a custom flag in .fHandlerFlags to indicate that the report should be generated regardless of D option presence. And the message for the repot can be passed in .ExceptionAddress as a C string. This way, we won't need a call to SetExceptqOptions to cange options and the report info (which is meant static by default — e.g. for program version etc).

Also, there is a couple of bugs in pszReportInfo handling in EXCEPTQ. First, if the message is longer than 80 chars, it won't be truncated as the docs say, it will be just not printed. Also, if the message contains an \r\n sequence, it will lead to two EOLs which is not right (see above).

I also faced another EXCEPTQ bug while playing with it. There is a FORCEEXIT export that should cause a "forced trap". I would expect a .TRP generation there but it doesn't work this way. In fact, it causes the app to crash badly with an output like that (and no .TRP):

Exiting by exception
... 100 other copies of "Exiting by exception" skipped ...
Exiting by exception
SYS1808:
The process has stopped.  The software diagnostic
code (exception code) is  0005.
dmik commented 3 years ago

Going even further, I don't think that playing with exceptions is necessary here at all. Since LIBC already provides an exception handler which gets triggered at all needed times, the ONLY thing I need from EXCEPTQ in LIBC (and LIBCx) based apps is a simple export that will just generate a .TRP file. This export can look like that:

void APIENTRY GenerateTrapReport(EXCEPTIONREPORTRECORD* pExRepRec,
                                 EXCEPTIONREGISTRATIONRECORD* pExRegRec,
                                 CONTEXTRECORD* pCtxRec,
                                 const char* pszMessage);

Where the first three parameters match ones of a normal exception handler and the fourth parameter is a custom string of an arbitrary length put into .TRP after pszReportInfo (if set). I can make the first three parameters available in the LIBC panic handler (actually pCtxRec is already there) and just perform such a call from there. It also makes sense to accept NULL for the first three parameters in which case the report should be generated assuming GenerateTrapReport itself as a top of the call stack.

This way, we will NOT NEED to install EXCEPTQ exception handlers on every app's thread — neither in LIBC, nor in LIBCx or anywhere else. And yes, this also means that all LIBC (kLIBC or LIBCn) based applications will get .TRP report generaton FOR FREE — w/o a need to link them to LIBCx and w/o even a need to recompile them at all (provided that they use _beginthread to start threads, of course — but this is our current limitation as well).

I really hope that @StevenLevine can help with implementing the EXCEPTQ part of things here.

StevenLevine commented 3 years ago

Sorry for the slow reply. Lots of distractions here. Your commit pretty much matches what I was going to suggest.

There was never a requirement to use InstallExceptq() or your ScopedExceptqLoader class to install the exceptq exception handler. InstallExceptq() and ScopedExceptqLoader are convenient for applications that don't already implement a private exception handler. All that was ever required was for the application to arrange to call MYHANDLER from some exception handler. I have a number of applications that, like kLIBC, have private exception handlers and they call MYHANDER from the private exception handler.

Regarding your GenerateTrapReport request, let me think about how best to support this.

Currently, szMsg in your

b_panic.c:490 SetExceptqOptions("D", szMsg);

is ignored because the report info argument is ignored in this context.

SetExceptqOptions() states:

exceptq.c:404 /* info ('i') - app has provided an info string;

I'm not sure why Rich implemented this limitation. Seems to me that

SetExceptqOptions("DI", szMsg);

should not be prone to failing.

Also, Rich decided to ignore excessively long strings because:

/ If the string is too long, it may not be a string at all, so exit /

However, this is not why your report info did not display.

I will look at safely truncating the string because it will make usage errors easier to understand. For now the report info string is going to continue be limited to 80 characters because the buffer is static, for historical reasons.

In the fullness of time I plan to make exceptq fully SMP safe and the static buffers will go away.

BTW, I don't know about you, but I have always found kLIBC's default exception report less than useful. It omits some information that is in the standard popuplog. I would not complain if the default output of __libc_Back_panicV was easier to read/use.

dmik commented 3 years ago

Aha, so I can already just call MYHANDLER directly, good! I will try that. It should let me get rid of EXCEPTQ installations in LIBC/LIBCx. Regarding the version that does not require DosRaiseException, it could be enough I guess to just support NULL for EXCEPTIONREPORTRECORD and EXCEPTIONREGISTRATIONRECORD pointers in MY_HANDLER and take CONTEXTRECORD from DosQueryTHreadContext if it's NULL on input. And also treat the fourth PVOID argument as a message string to put in .TRP below the report line in case when the first two arguments are NULL — to stop abusing the short report info string for that at all. I guess it all is not too difficult to implement.

Re szMsg, in fact, it is not ignored as I see the string in the .TRP...

Regarding the kLIBC exception report — yes, my thoughts. That's why I finally decided to make it generate a .TRP.

BTW, I see that TT option makes the stack trace much less verbose but the .TRP still provides a lot of info. It would be useful to have an option to just put the stack trace and nothing more — usually that's all I need when debugging crashes. And it could help in tracing execution in general.

dmik commented 3 years ago

I've just realized that I can't use MYHANDLER w/o support for NULL for its arguments:__libc_Back_panic is often called in LIBC not from an exception handler but as a regular function (with pCtx set to NULL) so this data is not available there. But as I definitely (and desperately) need EXCEPTQ .TRP in those cases, I have to have the EXCEPTQ exception handler installed via LIBCx everywhere... (as in this case I can use DosRaiseException, i.e. what the current code does).

So getting rid of EXCEPTQ handler installation has to wait until @StevenLevine implements NULL support. I will create a new ticket and close this one.

StevenLevine commented 3 years ago

Regarding your statement that "szMsg, in fact, it is not ignored as I see the string in the .TRP", you are correct. The existing comments were confusing, at least to me. The comments have been updated to:

  /* info ('i') - app has provided an info string;
   * note:  this option only affects info string processing when
   * InitOptions() is called by InstallExceptq() and if we're
   * examining app-supplied options.
   * If InitOptions is called by SetExceptqOptions, the info string
   * will always be processed.
   */

which explains what really happens.

dmik commented 3 years ago

Yes, now the description seems to reflect what I see in case of SetExceptqOptions at least. Not sure about the rest as I haven't tried it.