Catching *nix signals (segfault in particular)?

TheDan64 commented 6 years ago

As part of supporting multiple versions of LLVM in inkwell, I ran into a case where perfectly valid input to a certain function for nonexistent data (similar to looking up a key that doesn't exist in a hash table) would cause a segfault in a small subset of LLVM versions. This is presumably an LLVM bug, since it works fine in previous versions.

Thinking about this, I'm wondering: how terrible of an idea is it to set up a signal handler (i.e. via the signal crate) to catch that segfault and return the error case that I would have normally returned in LLVM versions that don't segfault?

I'm thinking at best the segfault could just be a null pointer dereference, but in the worst case it could have done something crazy like corrupted the whole stack...

Is this a crazy idea? What if I first look at the LLVM source code (maybe there's a fix I can look at in a newer version) and am able to verify that the segfault is relatively harmless in each offending version (i.e. a null pointer dereference, or reading from, but not writing to, invalid memory)?

Michael-F-Bryan commented 6 years ago

I believe the common view is that you shouldn't try to "catch" a segfault or mask it in any way. The way I've heard it described, a segfault is just another way of finding out there's a programming bug somewhere, so the correct thing to do is crash loudly so someone can fix the bug.

That said, because you want to still support the segfaulting versions you'll probably want to do something so users don't encounter segfaults during normal use. I can think of a couple ways to deal with the issue while still supporting the offending versions, listed from least to most hacky:

1. Use #[cfg] so whenever you try to invoke the function from an offending version it'll always return an error (or panic) saying "this function is unavailable due to a bug in this version of LLVM".
2. Try to detect input which may result in segfaults before the LLVM function is invoked and return an error.
3. Write a catch_segfault() wrapper function that'll invoke a closure and try to catch any segfaults, transforming them into errors. I'm not sure how you'd do this and it feels super hacky.
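For illustration, the first option might look roughly like the sketch below. The feature name llvm-lookup-segfaults, the LookupError type, and the HashMap standing in for the LLVM call are all hypothetical; this is not inkwell's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum LookupError {
    NotFound,
    UnavailableOnThisLlvm,
}

// "llvm-lookup-segfaults" is a made-up Cargo feature that would be enabled
// only when building against an LLVM version known to crash.
#[cfg(feature = "llvm-lookup-segfaults")]
pub fn lookup(_table: &HashMap<String, u32>, _name: &str) -> Result<u32, LookupError> {
    // Never reach the buggy call on offending versions.
    Err(LookupError::UnavailableOnThisLlvm)
}

#[cfg(not(feature = "llvm-lookup-segfaults"))]
pub fn lookup(table: &HashMap<String, u32>, name: &str) -> Result<u32, LookupError> {
    // Stand-in for the real LLVM lookup on versions where it behaves.
    table.get(name).copied().ok_or(LookupError::NotFound)
}
```

The cost of this approach is that callers on an offending version lose the function entirely, including for inputs that would have worked.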

EDIT: I did a little googling and came across this C++ thread. I think one of their comments summarizes things quite well.

You can't catch segfaults. Segfaults lead to undefined behavior - period (err, actually segfaults are the result of operations also leading to undefined behavior. Anyways, if you got a segfault, you also got undefined behavior invoked, so it doesn't really matter...). And the OS takes control from your program ASAP, which actually is a Good Thing.

TheDan64 commented 6 years ago

Thanks for looking into it! I've thought about those ideas too, and I think the problems are as follows:

cfg wrapper - the function in question is really useful, and I feel the library is far less useful without it on the offending versions (again, it still works when the input refers to something that exists). The offending versions are also the later ones, not the older ones, so the function is more likely to be missed than not.
checking input - the input is just a string, so it's hard if not impossible to validate (again, like a key in a hash table that may or may not be there, where the check for absence is itself what segfaults).

re: C++ thread: The way I understood it, "You can't catch segfaults" refers to the try/catch mechanics of C++. You can't catch it that way because it's not a C++ exception. But you can totally set up a signal handler for SIGSEGV and "catch" it that way, which is what the signal library does but in Rust. The point about segfaults always being UB is probably a good one, though.

I guess there's just no good solution...

Michael-F-Bryan commented 6 years ago

But you can totally set up a signal handler for SIGSEGV and "catch" it that way, which is what the signal library does but in Rust.

Signal handlers are just callbacks that get fired when that particular signal is received, so how do you plan to resume code flow? As far as I can tell, you'd need to stash away a return pointer and then jump to it in the signal handler if a segfault was encountered. I guess you could always execute the thing in another thread and use a "segfault encountered" flag to return the error, but I'm not sure if that'll work because LLVM isn't Send or Sync.
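To make the "just a callback" point concrete, here is a minimal sketch that installs a SIGSEGV handler through the C library's signal() and triggers it with a synthetic raise(). The names on_segv, SEGV_SEEN and handler_fires_on_raise are made up for this example, the C functions are declared by hand instead of going through a crate, and the signal number 11 is an assumption that holds on Linux and macOS.

```rust
use std::os::raw::c_int;
use std::sync::atomic::{AtomicBool, Ordering};

// Assumption: SIGSEGV is 11 (true on Linux and macOS; check your platform).
const SIGSEGV: c_int = 11;
// Assumption: SIG_DFL (the default disposition) is 0 on the same platforms.
const SIG_DFL: usize = 0;

// Flag set by the handler. A lock-free atomic store is one of the few
// things that is safe to do inside a signal handler.
static SEGV_SEEN: AtomicBool = AtomicBool::new(false);

unsafe extern "C" {
    // C library functions, declared by hand to keep the sketch dependency-free.
    fn signal(signum: c_int, handler: usize) -> usize;
    fn raise(signum: c_int) -> c_int;
}

// The callback: it only records that the signal arrived.
extern "C" fn on_segv(_signum: c_int) {
    SEGV_SEEN.store(true, Ordering::SeqCst);
}

/// Installs the handler, sends this thread a synthetic SIGSEGV with raise(),
/// restores the default disposition, and reports whether the handler ran.
pub fn handler_fires_on_raise() -> bool {
    SEGV_SEEN.store(false, Ordering::SeqCst);
    unsafe {
        signal(SIGSEGV, on_segv as *const () as usize);
        raise(SIGSEGV);
        // Put the default action back so a real fault still crashes.
        signal(SIGSEGV, SIG_DFL);
    }
    SEGV_SEEN.load(Ordering::SeqCst)
}
```

This only works because raise() delivers a synthetic signal, so the handler can simply return. For a SIGSEGV generated by a real invalid memory access, POSIX leaves the behaviour undefined if the handler returns normally, which is exactly the "how do you resume code flow?" problem.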

TheDan64 commented 6 years ago

So I was able to work around this issue for the LLVM versions in question by first calling a similar function that takes the exact same input parameters, returning early if it returns an Err and discarding the result if it returns Ok. I was lucky that the other function existed.
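The shape of that workaround, as a rough sketch: safe_probe and fragile_lookup are hypothetical stand-ins (the thread does not name the real LLVM functions), and a panic models the segfault.

```rust
// Names this toy "module" knows about; stands in for LLVM's internal state.
const KNOWN: &[(&str, u32)] = &[("main", 1), ("helper", 2)];

/// Hypothetical sibling function: same input, fails cleanly on unknown names.
fn safe_probe(name: &str) -> Result<(), String> {
    if KNOWN.iter().any(|(n, _)| *n == name) {
        Ok(())
    } else {
        Err(format!("no such item: {name}"))
    }
}

/// Hypothetical stand-in for the buggy call. The panic models the segfault
/// that the real function would hit on a missing name.
fn fragile_lookup(name: &str) -> u32 {
    KNOWN
        .iter()
        .find(|(n, _)| *n == name)
        .map(|(_, v)| *v)
        .expect("simulated segfault: name not found")
}

/// Probe first and return early on Err; the probe's Ok value is discarded.
pub fn lookup(name: &str) -> Result<u32, String> {
    safe_probe(name)?;
    Ok(fragile_lookup(name))
}
```

The fragile call is only ever reached with input the probe has already accepted, so the crashing path is never exercised.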

Michael-F-Bryan / rust-ffi-guide

Catching *nix signals (segfault in particular)? #63