Trapping SIGABRT, SIGBUS, and SIGILL is bogus

krader1961 commented 6 years ago

While fixing lint warnings I noticed that src/cmd/ksh93/sh/fault.c tries to handle signals which should be fatal. This came to my notice because of this block of code in sh_fault():

if (sig == SIGABRT || (abortsig(sig) && (ptr = malloc(1)))) {
    if (ptr) free(ptr);
    sh_done(shp, sig);
}

There is no justifiable reason to intercept those signals in a ksh script and allow them to be treated as "soft" errors that interrupt whatever ksh is doing rather than kill the process. What is even more dubious is predicating handling the signals other than SIGABRT on whether or not malloc(1) succeeds. Either the ksh process is able to deal with treating the signal as a "soft" error or it isn't. And that could be due to a huge number of variables other than whether malloc(1) succeeds.

Note that I deliberately excluded SIGSEGV from the subject line of this issue even though abortsig() returns true for that signal. We want to handle that signal by dumping a minimal function backtrace before dying. That mechanism is already in place.
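For readers unfamiliar with the "dump a backtrace, then die" pattern mentioned above, here is a minimal C sketch of the general idea. It is not the code ksh uses. The function names are hypothetical, and `backtrace()`/`backtrace_symbols_fd()` from `<execinfo.h>` are a glibc/BSD/macOS extension rather than POSIX, so treat this as an illustration under those assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <execinfo.h>   /* backtrace(): an extension, assumed available here */
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_FRAMES 32

/* Handler: write raw frame addresses/symbols to stderr, then let the signal kill us. */
static void on_segv(int sig) {
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    /* SA_RESETHAND already restored the default action, and the signal is
     * blocked while we run, so this re-raise takes effect on return and
     * terminates the process with the original signal. */
    raise(sig);
}

/* Hypothetical helper: install the one-shot SIGSEGV handler. Returns 0 on success. */
int install_segv_backtrace(void) {
    void *warmup[1];
    struct sigaction sa;

    /* The first backtrace() call may load support code, so do it here
     * rather than inside the signal handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Demo: a child installs the handler and receives SIGSEGV.
 * Returns 1 if the child was still terminated by SIGSEGV, 0 if not, -1 on error. */
int child_dies_from_segv(void) {
    int status = 0;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (install_segv_backtrace() != 0) _exit(2);
        raise(SIGSEGV);
        _exit(0); /* not reached if the signal terminates the child */
    }
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

The point of the design is that the handler only reports; it never tries to resume normal execution, so the process still dies from the original signal.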

krader1961 commented 6 years ago

Some people might argue it should be possible to trap these signals because someone may use the kill command or API to send them as a way to communicate something to the process. Yes, that is technically true. But absolutely no one should expect that to work. There are dedicated signals for that purpose (SIGUSR1, SIGUSR2) and many other signals (e.g., SIGHUP) that can be utilized. The reason signals like SIGABRT can be trapped is to give the process maximal flexibility to report and/or clean up before exiting, which does not apply to a shell script. Yes, you can argue that removing temp files, for example, should be possible when such signals are received. I call bullshit on such arguments.
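As an illustration of using a dedicated signal for communication, the following C sketch installs a SIGUSR1 handler and has the process signal itself. The helper name `notify_self_with_usr1` is hypothetical and only exists for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Flag set by the handler. volatile sig_atomic_t is the type the C standard
 * guarantees can be written safely from a signal handler. */
static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int sig) {
    (void)sig;
    got_usr1 = 1;
}

/* Hypothetical helper: install a SIGUSR1 handler, send the signal to this
 * process, and report whether the handler ran.
 * Returns 1 if it ran, 0 if it did not, -1 on error. */
int notify_self_with_usr1(void) {
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

    got_usr1 = 0;
    /* raise() does not return until the handler has finished. */
    if (raise(SIGUSR1) != 0) return -1;
    return got_usr1 ? 1 : 0;
}
```

Nothing about the process state is suspect when SIGUSR1 arrives, which is what makes it a reasonable notification channel.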

mikeserv commented 6 years ago

in my opinion, the script should optionally generate files if given the chance. for example, given that ksh93 can -stream its source script and lseek same (though i forget the #< redirect operator specifically), it is entirely likely that the script running at signal time is both not the script invoked or the script at exit. for this reason, the shell programmer should have the option to read out execution level input if signalled, whatever the signal may be.

also bear in mind that the ksh93 shell combines many other standard utilities besides, and supplies means of runtime builtin extensions via linked libraries... there are plenty of cases for any of these to require room to ABRT.

krader1961 commented 6 years ago

I can't make sense of your comment, @mikeserv. Do you understand the intended purpose of SIGABRT? That signal is generated in response to calling abort() (or assert() which calls it) and is meant to signal to the kernel that the process should be terminated and a core dump generated if possible. The code only calls abort() when the state of the process is so screwed up that continuing to run would be dangerous. There is absolutely nothing a ksh script can do that would be safe or likely to be helpful if it did trap 'do_something' ABRT. This has nothing to do with a bug in the ksh script being run.
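A small C sketch of the abort() behavior described above: even when a SIGABRT handler is installed and returns normally, abort() still terminates the process with SIGABRT. The function names are hypothetical; the demo runs abort() in a forked child so the caller can inspect how it died.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Handler that reports and then returns, i.e. tries to treat SIGABRT as "soft". */
static void on_abrt(int sig) {
    static const char msg[] = "caught SIGABRT, returning from handler\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    (void)sig;
}

/* Returns 1 if the child was terminated by SIGABRT despite the handler,
 * 0 if it ended some other way, -1 on error. */
int child_dies_from_abort_despite_handler(void) {
    int status = 0;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_abrt;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGABRT, &sa, NULL) != 0) _exit(2);
        abort();   /* the handler runs and returns, then abort() still kills us */
        _exit(0);  /* not reached */
    }
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

This only covers SIGABRT raised through abort(). A SIGABRT sent from outside with kill() is an ordinary catchable signal, which is the case the rest of the thread argues about.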

mikeserv commented 6 years ago

ksh93 is designed to operate, as needed, at a system kernel level. the programming interface should give the programmer base operating system independence, which is also the intended application of the posix standard. sfio, ast, 3dfs, and ksh together package a kernel capable virtual machine that depends almost not at all on underlying hardware/software restrictions. what you do with your fork is your business, i guess, but thats the point, yo.

krader1961 commented 6 years ago

Unassigning myself since I think this needs several months of discussion before changing the behavior. Having said that I still can't make sense of @mikeserv's comments. Obviously English isn't their native language but I have no idea what phrases like "at a system kernel level" mean. And they certainly don't understand what "virtual machine" means. But that doesn't mean there are no good arguments for continuing to allow ksh scripts to trap SIGABRT, SIGILL and SIGBUS. I just can't imagine what those arguments might be.

McDutchie commented 6 years ago

Every single Bourne-derived shell in existence currently allows trapping SIGABRT, SIGBUS and SIGILL.

POSIX specifies that

"Setting a trap for SIGKILL or SIGSTOP produces undefined results."

…but the results of trapping all other POSIX-defined signals are defined, so if POSIX compliance is the policy, they must be trappable.

I would also think that it is not up to the authors of the shell to decide which use cases are justified and which aren't. It's a central part of the Unix philosophy to allow the programmer the freedom to shoot themself in the foot.

krader1961 commented 6 years ago

@McDutchie I don't understand your comment. The POSIX spec you linked to explicitly says that trapping some signals produces undefined results. The fact that trapping other signals like SIGTERM or SIGUSR1 has well defined behavior and is useful for shell scripts to do does not imply that the shell should let the user do something nonsensical like trapping SIGABRT. Have you ever seen a script trap the signals in question? To what purpose? If ksh has called abort() (the usual means by which SIGABRT gets delivered) what exactly is a ksh script going to do? My proposed change simply maximizes the likelihood of getting useful information: for example, a core dump in the case of SIGABRT, or a stack backtrace and core dump in the case of SIGSEGV. It also makes it harder for the script that triggered those signals to do additional damage by continuing execution when the state of the ksh process is invalid.

McDutchie commented 6 years ago

"The POSIX spec you linked to explicitly says that trapping some signals produces undefined results."

No. It specifically says that trapping SIGKILL or SIGSTOP produces undefined results. It does not say that trapping any other signals produces undefined results.
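The same split is visible one level down, in the C signal API, and can be checked with a short sketch. This is an analogue at the sigaction() level, not a demonstration of the shell's trap builtin, and the helper name `can_catch` is hypothetical: sigaction() refuses to install a handler for SIGKILL or SIGSTOP but accepts one for SIGABRT, SIGBUS, and SIGILL.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static void noop_handler(int sig) { (void)sig; }

/* Hypothetical helper: returns 1 if a handler can be installed for sig,
 * 0 if the system refuses. The previous disposition is restored afterwards. */
int can_catch(int sig) {
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);

    /* For SIGKILL and SIGSTOP this fails (errno is set to EINVAL). */
    if (sigaction(sig, &sa, &old) != 0) return 0;

    sigaction(sig, &old, NULL); /* put things back the way they were */
    return 1;
}
```

Calling `can_catch(SIGABRT)`, `can_catch(SIGBUS)`, and `can_catch(SIGILL)` succeeds, while `can_catch(SIGKILL)` and `can_catch(SIGSTOP)` are refused.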
