bitwiseworks / libc

LIBC Next (kLIBC fork)
9 stars 4 forks source link

Improve EXCEPTQ trap report generation in __libc_Back_panic #98

Closed dmik closed 3 years ago

dmik commented 3 years ago

This is a spin off of https://github.com/bitwiseworks/libc/issues/96#issuecomment-778613043.

Currently, trap report generation requires an EXCEPTQ exception handler to be installed on all app's threads (which is normally done by linking to LIBCx which overrides main and _beginthread to do so). However, installing an exception handler for just that is not necessary per se and in fact is an abuse of the exception handler mechanism given that LIBC has its own exception handler already and that we also need those reports outside the context of exception handling at all (i.e. in a default handler of user-generated POSIX signals like SIGABRT or in own LIBC assertions where __libc_Back_panic is called with NULL as pCtx).

The EXCEPTQ DLL already exports MYHANDLER, a trap report generation workhorse. But that one needs real EXCEPTIONREPORTRECORD, EXCEPTIONREGISTRATIONRECORD and CONTEXTRECORD pointers which we don't have outside the exception handler context.

I've already made a feature request to @StevenLevine to support NULL for those parameters. So it's currently a showstopper for this ticket.

I also asked Steven to support passing an additional zero-terminated C string of an arbitrary length via a fourth parameter of MYHANDLER and put that string into the generated .TRP file for more informativeness. The idea is to put __libc_Back_panic's user message into this string as it usually contains a very useful information about the trap/assertion reason.

For now, .TRP are already generated in all mentioned situations (done within #96) but they require the EXCEPTQ handler to be installed and the user message length is limited to 80 chars because we abuse the EXCEPTQ "report info" string for that which is limited like that as it is intended for the app/library version information. Once Steven implements that, we will indeed put the LIBC version information in that string instead and the panic message will be passed as a fourth parameter of MYHANDLER.

StevenLevine commented 3 years ago

Some preliminary thoughts...

This is not quite true. Your code can build suitable versions of the required structures, which is relatively easy. For example, CONTEXTRECORD.ContextFlags controls what data is supplied in CONTEXTRECORD. If there is no useful data, you can skip filling in the CONTEXT_FLOATING_POINT or CONTEXT_INTEGER data. You are going to need to supply CONTEXT_CONTROL data or exceptq will not be able to generate stack traces an such.

The EXCEPTIONREGISTRATIONRECORD pointers can be NULL.

Your code needs to fill in the EXCEPTIONREPORTRECORD. You can use ExceptionInfo[] to pass additional info if that is helpful.

At this point, I am not convinced that adding an additional entry point to exceptq to implement this code is the best solution. The LIBC panic handler is also going to have better knowledge of what needs to be provided. To much existing code depends on the behavior of the existing MYHANDLER, so any new features would need to be via a MYHANDLER_EX entry point.

FWIW, you can pass a pointer to your special messages in one of the registers. This assumes that the register content is not otherwise needed for analysis.

dmik commented 3 years ago

Note that I didn't say I need a new export. I only need supporting NULL in the first three parameters and a string in the fourth parameter in the existing MYHANDLER export. I don't see how this could break backward compatibility provided that no existing code should ever call MYHANDLER as such. And it seems easier to me than building up EXCEPTIONREPORTRECORD and CONTEXTRECORD in each app that needs to call MYHANDLER directly and not from the exception handler. But OK, I may also do it myself from LIBC for now since I basically need it only in once place.

Regarding the message. Yes, when EXCEPTIONREPORTRECORD is constructed by the caller (and I actually have to do that for DosRaiseException already), passing the custom message string in ExceptionInfo sounds like the best option. But does MYHANDLER already support treating it as a message string in response to EXCEPTQ_DEBUG_EXCEPTION? If so, that would be perfect. If not yet, that would be perfect to have :)

I will recall what I need the string for. Right now, with abusing a 80 chars report info string, I have this in .TRP:

 LIBC PANIC!! fmutex deadlock: Recursive mutex!

0x00020050: Owner=0xea280001 Se

i.e. the LIBC panic message is truncated. And I want something like this:

LIBCn version 0.1.6 / LIBCx version 0.6.9

LIBC PANIC!!
fmutex deadlock: Recursive mutex!
0x00020050: Owner=0xea280001 Self=0xea280001 fs=0x3 flags=0x0 hev=0x0001003b
            Desc="test.c"

Note that the first line is that report info string set by the D option and the rest is what I suggest to pass either in ExceptionInfo or as a fourth argument.

The idea of abusing registers for the string I really don't like.

StevenLevine commented 3 years ago

Passing the string pointer in ExceptionInfo is definitely a better idea that abusing some poor register. Abusing a registers was just the best idea I had a the moment.

I'm not going to modify exceptq to abuse the existing MYHANDLER parameters. The existing code assumes that the parameters are passed as defined by kernel. Also, MYHANDLER is far from re-entrant. If you were to call MYHANDLER directly, there is always the possibility that the kernel will call it concurrently.

Since I don't know who might already be using the EXCEPTQ_DEBUG_EXCEPTION and how they might be using the ExceptionInfo array, I will define an EXCEPTQ_DEBUG_EXCEPTION_EX that will allow up to 4 optional pointers to be passed in the array.

ExceptionInfo[0] if not NULL, will point to a string that will print in addition to the existing ReportInfo string.

If you think it will be useful for exceptions you raise, the other ExceptqInfo array entries, if provided can point to memory areas that will be printed in a format similar to the other memory dumps. For example:

Memory addressed by ExceptionInfo[1] (00062368) for 184 bytes

This is a feature that I've often wished I had at times. The registers do not always point to all the relevant memory areas of interest. I typically fall back to a memory dump when this happens, but a .trp file is generally easier to use. It's not possible to implement this easily for exceptions generated by the kernel, but it is relatively easy to implement for EXCEPTQ_DEBUG_EXCEPTION_EX, if you think it will be useful. Let me know if you want this implemented.

Your code will fill the EXCEPTIONREPORTRECORD and call DosRaiseException.

dmik commented 3 years ago

Ok, with some assembly I managed the direct MYHANDLER call approach work and now I get a .TRP report in all cases w/o even needing to install EXCEPTQ handler per-thread! Good. When I call MYHANDLER from a LIBC exception handler (where we have all three exception handler arguments non-NULL), all looks exactly like it was with per-thread EXCEPTQ handler installation.

However, when I construct EXCEPTIONREPORTRECORD by hand, provide a partial CONTEXTRECORD with CONTEXT_CONTROL | CONTEXT_SEGMENTS (and a NULL EXCEPTIONREGISTRATIONRECORD), there are some glitches. This is how it looks now:

______________________________________________________________________

 Exception Report - created 2021/02/23 01:13:34
______________________________________________________________________

 LIBC: Killed by SIGABRT

 Hostname:         novator
 OS2/eCS Version:  2.45
 # of Processors:  4
 Physical Memory:  3260 mb
 Virt Addr Limit:  3072 mb
 Exceptq Version:  7.11.5-shl BETA8 (Jun  1 2020 18:37:02)

______________________________________________________________________

 Exception 71785158 - Exceptq Debug Request
______________________________________________________________________

 Process:  D:\CODING\TESTS\LIBC\FMUTEX\TEST.EXE (02/23/2021 01:10:29 5,756)
 PID:      D8BE (55486)
 TID:      01 (1)
 Priority: 200

 Filename: N/A
 Cause:    Invalid execution address 00000000

______________________________________________________________________

 Registers
______________________________________________________________________

 ESP : 0012FA40   EBP  : 0012FC08   EIP : 1D2FA080   EFLG : 00000246
 CS  : 005B       CSLIM: FFFFFFFF   SS  : 0053       SSLIM: FFFFFFFF

 EAX : not a valid address
 EBX : not a valid address
 ECX : not a valid address
 EDX : not a valid address
 ESI : not a valid address
 EDI : not a valid address

______________________________________________________________________

 Stack Info for Thread 01
______________________________________________________________________

   Size       Base        ESP         Max         Top
 00100000   00130000 -> 0012FA40 -> 0012D000 -> 00030000

______________________________________________________________________

 Call Stack
______________________________________________________________________

   EBP     Address    Module     Obj:Offset    Nearest Public Symbol
 --------  ---------  --------  -------------  -----------------------
 Trap  ->  1D2FA080   LIBCN0    0001:000FA080  b_panic.c#547 ___libc_Back_panicV + 15B4 0001:000F8ACC (b_panic.obj)
... 

First, There is no Filename and Cause is weird. This is because I pass NULL as XcptRepRec.ExceptionAddress. I can put the current EIP there and then it starts looking like this:

______________________________________________________________________

 Exception 71785158 - Exceptq Debug Request
______________________________________________________________________

 Process:  D:\CODING\TESTS\LIBC\FMUTEX\TEST.EXE (02/22/2021 22:46:22 4,733)
 PID:      CCDE (52446)
 TID:      01 (1)
 Priority: 200

 Filename: D:\CODING\LIBC\MASTER-BUILD\OPT\OMF\LIBCN0.DLL (02/22/2021 23:01:04 3,620,291)
 Address:  005B:1D2F9113 (0001:000F9113)
 Cause:    Program requested an Exceptq debug report

______________________________________________________________________

 Failing Instruction
______________________________________________________________________

 1D2F9107  MOV   WORD [EBP-0x54], FS           (8c65 ac)
 1D2F910A  MOV   WORD [EBP-0x50], ES           (8c45 b0)
 1D2F910D  MOV   WORD [EBP-0x4c], DS           (8c5d b4)
 1D2F9110  MOV   [EBP-0x30], EBP               (896d d0)
 1D2F9113 >MOV   DWORD [EBP-0x2c], 0x1d2f9113  (c745 d4 13912f1d)
 1D2F911A  MOV   WORD [EBP-0x28], CS           (8c4d d8)
 1D2F911D  PUSHF                               (9c)
 1D2F911E  POP   DWORD [EBP-0x24]              (8f45 dc)

______________________________________________________________________

I.e. it refers to LIBCN0.DLL now and provides disassembly. However, this disassembly is dumb — it is always my code composing CONTEXTRECORD from the current register values, pretty useless for debugging. I prefer to not have disassembly in such a case. I guess EXCEPTQ should just accept ExceptionAddress being NULL when ExceptionNum is EXCEPTQ_DEBUG_EXCEPTION or EXCEPTQ_DEBUG_EXCEPTION_EX and don't bitch about a missing Filename and an invalid execution address (i.e. just omit the block containing the Filename and Cause lines).

Second, there is a block of integer registers (carrying not a valid address since they are NULL in CONTEXTRECORD) even though CONTEXT_INTEGER is not passed in ContextFlags. I guess EXCEPTQ should skip this block if no CONTEXT_INTEGER is present.

dmik commented 3 years ago

@StevenLevine regarding your questions. Yes, the EXCEPTQ_DEBUG_EXCEPTION_EX solution looks totally fine for me.

ExceptionInfo[0] if not NULL, will point to a string that will print in addition to the existing ReportInfo string.

More than enough! Just don't put a length limit on it please, so that LIBC panic messages would fit.

If you think it will be useful for exceptions you raise, the other ExceptqInfo array entries, if provided can point to memory areas that will be printed in a format similar to the other memory dumps.

Yes, this really makes sense. I don't have a current use case for it but I surely might. LIBC/LIBCx has a lot of info one could have a sneak peek at in case of various fatalities. So please do that.

What I would also like to have (I already mentioned this somewhere) is a way to generate a very short trap report consisting basically of the call stack trace and nothing more (useful for tracing function entry points etc). Something like that:


______________________________________________________________________

 Exception Report - created 2021/02/23 01:15:51
______________________________________________________________________

 LIBC PANIC!! fmutex deadlock: Recursive mutex!

0x00020050: Owner=0xd8db0001 Se

 Hostname:         novator
 OS2/eCS Version:  2.45
 # of Processors:  4
 Physical Memory:  3260 mb
 Virt Addr Limit:  3072 mb
 Exceptq Version:  7.11.5-shl BETA8 (Jun  1 2020 18:37:02)

______________________________________________________________________

 Exception 71785158 - Exceptq Debug Request
______________________________________________________________________

 Process:  D:\CODING\TESTS\LIBC\FMUTEX\TEST.EXE (02/23/2021 01:15:24 7,073)
 PID:      D8DB (55515)
 TID:      01 (1)
 Priority: 200

______________________________________________________________________

 Call Stack
______________________________________________________________________

   EBP     Address    Module     Obj:Offset    Nearest Public Symbol
 --------  ---------  --------  -------------  -----------------------
 Trap  ->  1D2FA080   LIBCN0    0001:000FA080  b_panic.c#547 ___libc_Back_panicV + 15B4 0001:000F8ACC (b_panic.obj)
 0012FDD8  1D2FA60E   LIBCN0    0001:000FA60E  b_panic.c#151 ___libc_Back_panic + 26 0001:000FA5E8 (b_panic.obj)
 0012FDF8  1D20A92F   LIBCN0    0001:0000A92F  fmutex.c#381 ___fmutex_request_internal + 3DF 0001:0000A550 (D:\Coding\libc\master\src\emx\src\lib\process\fmutex.c)
 0012FE68  000100F6   TEST      0001:000000F6  fmutex.h#78 _main - 9E 0001:00000194 (test.c)
 0012FE98  0001025A   TEST      0001:0000025A  test.c#63 main + C6 0001:00000194 (test.c)
 0012FF60  00010027   TEST      0001:00000027  crt0.s#90 __text + 27 0001:00000000 (D:\Temp\cc4GANW3.s)
 0012FF84  1D1EF751   LIBCX0    0001:0000F751   ___init_app + 11 0001:0000F740 (main.obj)
 0012FFE0  1D2339CB   LIBCN0    0001:000339CB  appinit.s#16 ___init_app + B 0001:000339C0 (appinit.obj)

______________________________________________________________________

 End of Exception Report - report took 155 ms to generate
______________________________________________________________________

Given that EXCEPTQ is already perfectly capable of appending to existing .TRP files, this could create a nice tracing report even for very frequent entry points.

This may be configured using a new option letter for SetExceptqOptions (and EXCEPTQ env. var). For instance:

dmik commented 3 years ago

And also something I already requested several times: a single path for all .TRP files (set via e.g. set EXCEPTQ_PATH=C:\var\log\exceptq). This is not always useful for the developer but it is extremely useful for end user's systems. And for the developer system as well — for apps which the developer currently is not developing (and for the app being developed, the developer can always reset EXCEPTQ_PATH to have .TRP . in the current directory, as it is now).

Note that this will require changes to the .TRP file name. Currently it is PPPP_TT.TRP where PPPP is the hex PID and TT is the TID. While it should be something like EXENAME-TIMESTAMP-PPPP_TT.TRP where EXENAME is the name of the executable and TIMESTAMP is the value of QSV_TIME_LOW. This is to make sure that .TRP reports are not overwritten because of PID clashes after reboots or such (note that we don't need SV_TIME_HIGH there because it's very unlikely that logs will be stored for THAT long :)

This is how LIBCx log file names in C:/var/log/libcx look as an example (all threads go to a single file so there is no TID):

QTWEBENGINEPROCESS-602c4fb5-a60a.log
TEE-602c4fb6-a604.log

Also note that if you do that, EXCEPTQ should get the timestamp once for a given process (upon the first exception) and then use it for all further exceptions in that process on all of its threads. It is important both for proper grouping and for when multiple recoverable exceptions on the same thread occur (e.g. via EXCEPTQ_DEBUG_EXCEPTION) since they should end up in the same .TRP file.