Open roblabla opened 5 years ago
This happens because the caller address is outside the text region we find. The ld find_cfi method calculates the text region by looking at the first PT_LOAD. Unfortunately, in this binary's case, the first PT_LOAD is not the executable, but a read-only bit. So the text region we find is actually not the .text, but some other segment.
I believe we should use the flags to find the executable section.
Fixed the above problem with #23. Unfortunately, it's not enough. I end up executing an ud2, causing an illegal instruction exception. Freaky. Here's the backtrace from gdb:
(gdb) bt
#0 _Unwind_RaiseException (exception=0x555555894db0) at unwind/src/libunwind_shim.rs:111
#1 0x000055555579fa59 in rust_panic () at src/libstd/panicking.rs:527
#2 0x000055555579fa32 in std::panicking::rust_panic_with_hook () at src/libstd/panicking.rs:498
#3 0x000055555575b6e8 in std::panicking::begin_panic (msg=..., file_line_col=0x555555862a10)
at /rustc/b2b7a063af39455d7362524da3123c34c3f4842e/src/libstd/panicking.rs:412
#4 0x00005555555a246d in demo::bar () at unwind/examples/demo.rs:13
#5 0x00005555555a24b3 in demo::foo () at unwind/examples/demo.rs:18
#6 0x00005555555a24ef in demo::main () at unwind/examples/demo.rs:24
And the asm:
0x55555575be10 <_Unwind_RaiseException> sub rsp,0x48
0x55555575be14 <_Unwind_RaiseException+4> mov QWORD PTR [rsp+0x8],rdi
0x55555575be19 <_Unwind_RaiseException+9> mov rdi,QWORD PTR [rsp+0x8]
0x55555575be1e <_Unwind_RaiseException+14> mov QWORD PTR [rdi+0x10],0x0
0x55555575be26 <_Unwind_RaiseException+22> lea rdi,[rip+0x1c0a3] # 0x555555777ed0 <<unwind::DwarfUnwinder as core::default::Default>::default>
0x55555575be2d <_Unwind_RaiseException+29> lea rax,[rsp+0x10]
0x55555575be32 <_Unwind_RaiseException+34> mov QWORD PTR [rsp],rdi
0x55555575be36 <_Unwind_RaiseException+38> mov rdi,rax
0x55555575be39 <_Unwind_RaiseException+41> mov rax,QWORD PTR [rsp]
0x55555575be3d <_Unwind_RaiseException+45> call rax
0x55555575be3f <_Unwind_RaiseException+47> jmp 0x55555575be45 <_Unwind_RaiseException+53>
0x55555575be41 <_Unwind_RaiseException+49> ud2
0x55555575be43 <_Unwind_RaiseException+51> ud2
0x55555575be45 <_Unwind_RaiseException+53> lea rax,[rsp+0x8]
0x55555575be4a <_Unwind_RaiseException+58> mov QWORD PTR [rsp+0x30],rax
The binary is at https://dl.roblab.la/demo , and verbose logs in the details below. I'm going to sleep on this for now, hoping for a stroke of genius.
looks like the line it's dying on is https://github.com/gimli-rs/unwind-rs/blob/master/unwind/src/libunwind_shim.rs#L148 if that helps at all.
Alright so after tracing execution many times in GDB, it seems like a problem un _Unwind_Resume. The unwind_lander somehow ends up restoring the stack right there: https://github.com/gimli-rs/unwind-rs/blob/master/unwind/src/libunwind_shim.rs#L114.
Which obviously causes all sorts of issues. It's supposed to be skipping over this stack, so I guess we're doing something wrong in the code that's skipping over the frames in the resume.
FWIW I bisected this using rustup:
Good: 1.32.0-nightly (4a45578bc 2018-12-07) Bad: 1.32.0-nightly (f4a421ee3 2018-12-13)
Sadly, there are no nightly builds from Dec 08 to Dec 12
I had a very quick skim over the 131 commits between those two but nothing obvious stuck out.
The unwinding all looks fine to me, but, from looking at the MIR/LLVM-IR, rust is generating an abort for the unwind cleanup in _Unwind_RaiseException
. Adding #[unwind(allowed)]
to _Unwind_RaiseException
gets it working. I don't understand this enough yet to know if that's correct, and I haven't found out which rust commit caused this
I don't understand how cleanup of DwarfUnwinder
is meant to work. It seems to me that we leak a DwarfUnwinder
every time _Unwind_Resume
is called, because we skip past the stack frame containing it while looking for private_contptr
. Of course, we can't clean it up as part of the unwinding, because the unwinding needs it. Maybe we shouldn't be allocating DwarfUnwinder
on the stack?
This is related to _Unwind_RaiseException
in that _Unwind_RaiseException
doesn't manage to achieve any unwinding... it walks through the frames until it finds itself, discovers it needs to cleanup DwarfUnwinder
, stops unwinding and does so, then calls _Unwind_Resume
.
(This is based on my memory from a couple of hours ago, I haven't double checked the details.)
Oh of course! This is https://github.com/rust-lang/rust/pull/55982 ! Adding the allow-unwind attribute to all those extern functions is certainly the correct fix.
From what I remember the current implementation just hopes that the unwinding code never reaches high enough on the stack to actually touch the unwinder object. Of course this is bad and should be replaced with a proper solution eventually.
We only need to add the allow-unwind attribute to _Unwind_RaiseException
, since it is the only one that we need to allow unwinding for (and we'll need to be careful we support this correctly, currently we don't). None of the rest should ever need unwinding. If they do, it's a bug and the correct behaviour is to abort.
Another option would be to get the context immediately in _Unwind_RaiseException
before calling anything else, and skip over everything up to and including it. This would make its operation more like _Unwind_Resume
.
None of the rest should ever need unwinding. If they do, it's a bug and the correct behaviour is to abort.
Agreed.
and skip over everything up to and including it
The problem I see with skipping frames is inlining. I would much rather have a single clean cut as we do right now, where everything below that is getting unwound and everything above is not.
I would much rather have a single clean cut as we do right now, where everything below that is getting unwound and everything above is not.
Right. I was misunderstanding things a bit. But still, I think we could change where that single clean cut is, so that it is below the cleanup that aborts. And we could do that by getting the context immediately in _Unwind_RaiseException
(which would now need to be an asm function).
Probably should leave this open, because it's going to break on stable rust soon.
I might be missing something here, but why not just modify unwind_tracer
to return an option of registers to jump to? Because I just remembered that the reason DwarfUnwinder::trace
takes a closure is because everything outside of the closure is affected by unwinding while everything inside is not. In fact, the stack problem I described earlier should not even exist, I think I just forgot that I fixed this already. The only problematic part I see right now is that stack objects of unwind_tracer
would be leaked.
why not just modify
unwind_tracer
to return an option of registers to jump to?
Yep I've been looking into doing that.
everything outside of the closure is affected by unwinding while everything inside is not
This is true for _Unwind_RaiseException
but not for _Unwind_Resume
, since resume skips some outside the closure.
In fact, the stack problem I described earlier should not even exist, I think I just forgot that I fixed this already.
Not sure I follow. The DwarfUnwinder
is outside the closure, so it is affected by unwinding. So what I want to do is move DwarfUnwinder
inside the closure.
Not sure I follow. The
DwarfUnwinder
is outside the closure, so it is affected by unwinding. So what I want to do is moveDwarfUnwinder
inside the closure.
I don't see how this is possible, considering that the closure is passed to a function on the unwinder object.
My point is that because we construct an entirely new unwinder in _Unwind_Resume
(as we probably should (?)) it is actually correct to destroy the unwinder object once we have identified a landing pad to jump to. Although - now that I think about it - shouldn't the unwinder's drop function in this case be called by an actual landing pad? Which would then loop infinitely (assuming we stop leaking the unwinder in _Unwind_Resume
)? My head hurts.
Basically I'm unsure if the way I implemented _Unwind_Resume
is entirely correct. My assumptions right now are that for any destructors to run, the personality function gives us a landing pad that we jump to, which then invokes _Unwind_Resume
after doing its cleanup. But I assume we are not supposed to unwind the actual landing pad. Which makes _Unwind_Resume
quite unique/awkward in how it operates - basically we have to assume that the stack below _Unwind_Resume
is garbage and continue at a known-good point - so exactly what we are doing right now.
But if any of these assumptions are incorrect perhaps we should go for a totally different solution. If possible, I would like to keep _Unwind_RaiseException
and _Unwind_Resume
as similar as possible, it would be great if we could just get rid of the skip-hack for _Unwind_Resume
.
At this point I feel like at least for Rust we can just Box::leak
the DwarfUnwinder
to avoid dealing with destructors in this code entirely. Last time I checked Rust had a straight abort for OOM and we also get the minor benefit of reusing the unwinder in _Unwind_Resume
instead of creating a new one.
Although - now that I think about it - shouldn't the unwinder's drop function in this case be called by an actual landing pad? Which would then loop infinitely (assuming we stop leaking the unwinder in
_Unwind_Resume
)?
Yes it is, and this is what causes the problem in this issue. Without #[unwind(allowed)]
, that landing pad will abort. And with #[unwind(allowed)]
, this means _Unwind_RaiseException
accomplishes nothing other than setting private_contptr
and calling _Unwind_Resume
via the landing pad. If _Unwind_Resume
didn't skip frames, then yes this would loop infinitely.
When running the demo example on my machine (64-bit linux) using a new-ish nightly (rustc 1.33.0-nightly (b2b7a063a 2019-01-01)), unwind-rs crashes.
Did something break?