Closed straight-shoota closed 2 years ago
I've been able to boil the error down to the following reduced code (with --prelude=empty):
# mini prelude
lib LibC
  fun _CxxThrowException(ex : Void*, throw_info : Void*) : NoReturn
end

@[Primitive(:throw_info)]
def throw_info : Void*
end

def raise(exception : Exception) : NoReturn
  LibC._CxxThrowException(pointerof(exception).as(Void*), throw_info)
end

# error reproduction
def getaddrinfo
  yield
ensure
  nil
end

class XTCPSocket
  def initialize
    getaddrinfo do
      raise Exception.new
    end
  end
end

begin
  XTCPSocket.new
rescue
end
The begin ... rescue is not strictly necessary for reproduction, but it makes sure the exit status is good when the bug is not hit.
So the cause seems to be a combination of calling a method with yield and ensure from a constructor.
As noted before, it only reproduces on native builds, not when cross-compiling. This is the diff between the LLVM IR for the native and cross builds:
diff --git 1/test.empty-native.ll 2/test.empty-cross.ll
index 7fc3b1734..c4e7e83c4 100755
--- 1/test.empty-native.ll
+++ 2/test.empty-cross.ll
@@ -35,6 +35,9 @@ target triple = "x86_64-pc-windows-msvc"
define %XTCPSocket* @__crystal_main(i32 %argc, i8** %argv) personality i32 (...)* @__CxxFrameHandler3 !dbg !4 {
alloca:
%0 = alloca %Exception*, !dbg !9
+ br label %entry
+
+entry: ; preds = %alloca
store i32 %argc, i32* @ARGC_UNSAFE
store i8** %argv, i8*** @ARGV_UNSAFE
store %Nil zeroinitializer, %Nil* @"Crystal::BUILD_COMMIT"
@@ -49,10 +52,10 @@ alloca:
%1 = invoke %XTCPSocket* @.2A.XTCPSocket.3A..3A.new.3A.XTCPSocket()
to label %invoke_out unwind label %rescue, !dbg !9
-rescue: ; preds = %alloca
+rescue: ; preds = %entry
%2 = catchswitch within none [label %catch_body] unwind to caller, !dbg !9
-invoke_out: ; preds = %alloca
+invoke_out: ; preds = %entry
br label %exit, !dbg !9
exit: ; preds = %this_rescue_target, %invoke_out
@@ -61,9 +64,17 @@ exit: ; preds = %this_rescue_target,
catch_body: ; preds = %rescue
%4 = catchpad within %2 [{ i8**, i8*, [6 x i8] }* @"\01??_R0PEAX@8", i32 0, %Exception** %0], !dbg !9
+ %5 = load %Exception*, %Exception** %0, !dbg !9
+ br label %this_rescue, !dbg !9
+
+this_rescue: ; preds = %catch_body
catchret from %4 to label %this_rescue_target, !dbg !9
-this_rescue_target: ; preds = %catch_body
+next_rescue: ; No predecessors!
+ call void @_CxxThrowException(i8* null, i8* null) [ "funclet"(token %4) ], !dbg !9
+ unreachable, !dbg !9
+
+this_rescue_target: ; preds = %this_rescue
br label %exit, !dbg !9
}
@@ -106,18 +117,22 @@ declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)
define internal void @.2A.XTCPSocket.23.initialize.3A.NoReturn(%XTCPSocket* %self) #2 personality i32 (...)* @__CxxFrameHandler3 !dbg !17 {
alloca:
%0 = alloca %Exception*, !dbg !18
+ br label %entry
+
+entry: ; preds = %alloca
%1 = call %Exception* @.2A.Exception.40.Reference.3A..3A.new.3A.Exception(), !dbg !19
invoke void @.2A.raise.3C.Exception.3E..3A.NoReturn(%Exception* %1)
to label %invoke_out unwind label %rescue, !dbg !18
-rescue: ; preds = %alloca
+rescue: ; preds = %entry
%2 = catchswitch within none [label %catch_body] unwind to caller, !dbg !18
-invoke_out: ; preds = %alloca
+invoke_out: ; preds = %entry
unreachable, !dbg !18
catch_body: ; preds = %rescue
%3 = catchpad within %2 [{ i8**, i8*, [6 x i8] }* @"\01??_R0PEAX@8", i32 0, %Exception** %0], !dbg !18
+ %4 = load %Exception*, %Exception** %0, !dbg !18
call void @_CxxThrowException(i8* null, i8* null) [ "funclet"(token %3) ], !dbg !18
unreachable, !dbg !18
}
@@ -156,14 +171,10 @@ entry: ; preds = %alloca
; Function Attrs: noreturn
declare void @_CxxThrowException(i8*, i8*) #3
-; Function Attrs: nounwind
-declare void @llvm.stackprotector(i8*, i8**) #4
-
attributes #0 = { uwtable }
attributes #1 = { argmemonly nounwind willreturn }
attributes #2 = { noreturn uwtable }
attributes #3 = { noreturn }
-attributes #4 = { nounwind }
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3}
I'm still investigating this issue, but so far I've found that --single-module or any of the --emit options makes the error disappear. This also applies to the spec in the OP (after removing the workaround, of course). All of the above is using the native compiler.
I've tried several variations, but this is the minimal code I'm currently working with. I've extended the mini prelude to be able to use puts for printing debugging messages.
# mini prelude
lib LibC
  fun _CxxThrowException(ex : Void*, throw_info : Void*) : NoReturn
  fun puts(s : UInt8*)
end

@[Primitive(:throw_info)]
def throw_info : Void*
end

def raise(exception : Exception) : NoReturn
  LibC._CxxThrowException(pointerof(exception).as(Void*), throw_info)
end

class String
  def to_unsafe : UInt8*
    pointerof(@c)
  end
end

# error reproduction
class One
  def initialize
    raise Exception.new
  ensure
    LibC.puts("ensured")
  end
end

begin
  One.new
rescue
  LibC.puts("caught!")
end
The puts can be commented out and replaced with a nil expression; it doesn't make a difference.
The LLVM IR outputs from the single-module and multi-module compilations look almost equal. I couldn't find any important difference. In fact, I've compiled the .exes for both, and they have the exact same size.
I've just started using a disassembler (IDA Free version) to inspect the binaries. So far, the only difference seems to be the layout of the assembler code in memory. I've also seen in some cases (while trying variations of the code) that the unhandled C++ exception error reported by Visual Studio changes to an invalid or misaligned exception frame error (I don't remember the exact wording now).
Quick update: not much progress unfortunately, but running both versions in the debugger, the buggy multi-module one and the working single-module one, shows that in the bugged version the catch/ensure block section of code in One::initialize is not called when handling the exception when it first occurs. I don't know why yet, but in the working binary, the exception is handled by the ensure block, then rethrown, then rethrown again (not sure from where) and finally handled in the rescue block in __crystal_main. I'm leaning towards some bug compiling/linking the exception information structures, but I don't fully understand how exception handling works on Windows yet.
Ok, this seems to be an LLVM bug. The good news is that it's fixed in version 11. I don't quite understand what exactly is causing the behavior, but the problem is that the MS C++ runtime fails to unwind the frame state and reports:
Unhandled exception at 0x00007FFC08092346 (ntdll.dll) in quux.exe: 0xC0000028: An invalid or unaligned stack was encountered during an unwind operation.
Using the previous example, when the program is compiled in multiple units, the One class

class One
  def initialize
    raise Exception.new
  ensure
    LibC.puts("ensured")
  end
end
is compiled into the following LLVM IR code, sans type definitions and declarations for clarity:
; Function Attrs: uwtable
define %One* @.2A.One.3A..3A.new.3A.One() #0 {
alloca:
  %_ = alloca %One*
  br label %entry

entry:                                            ; preds = %alloca
  %0 = call i8* @malloc(i64 ptrtoint (i32* getelementptr (i32, i32* null, i32 1) to i64))
  %1 = bitcast i8* %0 to %One*
  %2 = bitcast %One* %1 to i8*
  call void @llvm.memset.p0i8.i64(i8* align 4 %2, i8 0, i64 ptrtoint (i32* getelementptr (i32, i32* null, i32 1) to i64), i1 false)
  %3 = getelementptr inbounds %One, %One* %1, i32 0, i32 0
  store i32 6, i32* %3
  store %One* %1, %One** %_
  %4 = load %One*, %One** %_
  call void @.2A.One.23.initialize.3A.Nil(%One* %4)
  %5 = load %One*, %One** %_
  ret %One* %5
}

; Function Attrs: uwtable
define void @.2A.One.23.initialize.3A.Nil(%One* %self) #0 personality i32 (...)* @__CxxFrameHandler3 {
alloca:
  %0 = alloca %Exception*
  br label %entry

entry:                                            ; preds = %alloca
  %1 = call %Exception* @.2A.Exception.40.Reference.3A..3A.new.3A.Exception()
  invoke void @.2A.raise.3C.Exception.3E..3A.NoReturn(%Exception* %1)
          to label %invoke_out unwind label %rescue

rescue:                                           ; preds = %entry
  %2 = catchswitch within none [label %catch_body] unwind to caller

invoke_out:                                       ; preds = %entry
  unreachable

catch_body:                                       ; preds = %rescue
  %3 = catchpad within %2 [{ i8**, i8*, [6 x i8] }* @"\01??_R0PEAX@8", i32 0, %Exception** %0]
  %4 = load %Exception*, %Exception** %0
  %5 = call i8* @.2A.String.23.to_unsafe.3A.Pointer.28.UInt8.29.(%String* bitcast ({ i32, i32, i32, [8 x i8] }* @"'ensured'" to %String*)) [ "funclet"(token %3) ]
  call void @puts(i8* %5) [ "funclet"(token %3) ]
  call void @_CxxThrowException(i8* null, i8* null) [ "funclet"(token %3) ]
  unreachable
}
The problem is in the initialize code, around the invoke:
entry:                                            ; preds = %alloca
  %1 = call %Exception* @.2A.Exception.40.Reference.3A..3A.new.3A.Exception()
  invoke void @.2A.raise.3C.Exception.3E..3A.NoReturn(%Exception* %1)
          to label %invoke_out unwind label %rescue

rescue:                                           ; preds = %entry
  %2 = catchswitch within none [label %catch_body] unwind to caller

invoke_out:                                       ; preds = %entry
  unreachable
The difference in the generated assembly between LLVM 10 and LLVM 11 is:
--- o-ne.s.10 2021-10-04 20:07:07.060845300 -0300
+++ o-ne.s.11 2021-10-04 20:07:21.088512100 -0300
@@ -72,10 +72,11 @@
movq %rax, %rcx
callq .2A.raise.3C.Exception.3E..3A.NoReturn
.Ltmp1:
jmp .LBB1_1
.LBB1_1: # %invoke_out
+ int3
.seh_handlerdata
.long ($cppxdata$.2A.One.23.initialize.3A.Nil)@IMGREL
.text
.seh_endproc
.def "?catch$2@?0?.2A.One.23.initialize.3A.Nil@4HA";
i.e. one extra int3. In fact, when debugging the LLVM 10 version, which normally crashes, placing a breakpoint (which AFAIK introduces an int3 at the location) on the jmp instruction makes the code work!
I am completely lost as to why the extra opcode/byte fixes the issue. A binary diff between the .obj files confirms that's the only difference. Probably relevant is the fact that the associated unwind info in the .pdata section also extends by one byte in the LLVM 11 version. It's also a mystery why this works when compiling as a single module. In LLVM 10, the assembler code in single-module mode does include the extra int3.
I'm running out of ideas, so I'm putting this on the back-burner for a while. In the meantime, maybe we can upgrade to LLVM 11 or 12 with the patches for #10359 applied?
Well, it seems I just needed to dump my brain in the previous comment to get to the bottom of this. 😃 I found a couple of LLVM commits involved, but the end of the thread seems to be https://github.com/llvm/llvm-project/commit/597718aae017a870e99cdb37b3bc10d8dfa58a25. From the commits referenced there, https://github.com/llvm/llvm-project/commit/5ff5ddd0adc89f8827b345577bbb3e7eb74fc644 provides the most information, pointing to the X86AvoidTrailingCall.cpp file, which reads:
// The Windows x64 unwinder decodes the instruction stream during unwinding.
// The unwinder decodes forward from the current PC to detect epilogue code
// patterns.
//
// First, this means that there must be an instruction after every
// call instruction for the unwinder to decode. LLVM must maintain the invariant
// that the last instruction of a function or funclet is not a call, or the
// unwinder may decode into the next function. Similarly, a call may not
// immediately precede an epilogue code pattern. As of this writing, the
// SEH_Epilogue pseudo instruction takes care of that.
//
// Second, all non-tail call jump targets must be within the *half-open*
// interval of the bounds of the function. The unwinder distinguishes between
// internal jump instructions and tail calls in an epilogue sequence by checking
// the jump target against the function bounds from the .pdata section. This
// means that the last regular MBB of an LLVM function must not be empty if
// there are regular jumps targeting it.
//
// This pass upholds these invariants by ensuring that blocks at the end of a
// function or funclet are a) not empty and b) do not end in a CALL instruction.
This means that the last regular MBB of an LLVM function must not be empty if there are regular jumps targeting it.
So, yeah. LLVM produces an empty basic block for the "normal" exit of the invoke because it is unreachable, but that confuses the unwinder. 🤷 I have no idea why this doesn't happen when compiling a single module.
A workaround may be possible, but to be honest it looks like upgrading to a newer LLVM would be best.
Thank you very much @ggiraldez for digging into this.
Knowing that it's an LLVM bug and fixed in recent releases is really helpful. That spares us from having to introduce a workaround.
As a somewhat surprising coincidence, LLVM 13 was released just yesterday. 🎉 That release should include all the patches we need for #10359 and this issue. We should be able to update LLVM for the binary distribution and link the upcoming Crystal 1.2.0 against LLVM 13.
We can probably start using LLVM 13 in win32 CI right now.
This error first appeared in the branch for adding socket support to win32: https://github.com/crystal-lang/crystal/pull/10784#issuecomment-858541057

The process simply exits during execution of the raises when connection is refused example.

This error only appears on native win32 builds and not when cross-compiling. You need a crystal.exe to reproduce the error. It also does not reproduce when --emit llvm-ir is passed to the compiler, so we can't simply compare the LLVM IR with and without the workaround.

The reproducing source code is available here: https://github.com/straight-shoota/crystal/tree/error/win32-socket. It is a simple call to TCPSocket.new("127.0.0.1", 1234567) (address and port don't matter; they should just be unused).

A workaround that makes the code execute correctly is adding an empty static array to the stack before raising in Addrinfo.resolve.

I have not yet been able to significantly minify and isolate a reproduction. It appears, however, that the constructor for TCPSocket is a contributing factor: https://github.com/crystal-lang/crystal/blob/55ebfbfe75e93ffdc9373acf649a473b00b24526/src/socket/tcp_socket.cr#L27-L35

Rewriting that initializer into a class method gets rid of the error. So it might be related to multiple initialization of the same instance (through super calls), but that's just a hunch.