Codegen error on win32 - Githubissues

This error first appeared in the branch for adding socket support to win32: https://github.com/crystal-lang/crystal/pull/10784#issuecomment-858541057

PS D:\crystal\crystal> .\bin\crystal.exe spec .\spec\std\socket\tcp_socket_spec.cr -v -e "raises when connection is refused"
TCPSocket
  #connect
    using IPv4
      raises when connection is refused

The process simply exits during execution of the raises when connection is refused example.

This error only appears on native win32 builds and not when cross-compiling. You need a crystal.exe to reproduce the error. It also does not reproduce when --emit llvm-ir is passed to the compiler. So we can't simply compare the LLVM IR with and without the workaround.

The reproducing source code is available here: https://github.com/straight-shoota/crystal/tree/error/win32-socket and a simple call to TCPSocket.new("127.0.0.1", 1234567) (address and port don't matter, it should just be unused).

A workaround to make the code execute correctly is adding an empty static array to the stack before raising in Addrinfo.resolve:

--- c/src/socket/addrinfo.cr
+++ w/src/socket/addrinfo.cr
@@ -66,6 +66,8 @@ class Socket
               if value.is_a?(Socket::ConnectError)
                 raise Socket::ConnectError.from_os_error("Error connecting to '#{domain}:#{service}'", value.os_error)
               else
+                StaticArray(UInt8, 0).new(0_u8)
+
                 raise value
               end
             end

I have not yet been able to significantly minify and isolate a reproduction. It appears however that the constructor for TCPSocket is a contributing factor:

https://github.com/crystal-lang/crystal/blob/55ebfbfe75e93ffdc9373acf649a473b00b24526/src/socket/tcp_socket.cr#L27-L35

Rewriting that initializer into a class method gets rid of the error. So it might be related to multiple initialization on the same instance (through super calls), but that's just a hunch.

  def self.new2(host, port, dns_timeout = nil, connect_timeout = nil)
    Addrinfo.tcp(host, port, timeout: dns_timeout) do |addrinfo|
      socket = new(addrinfo.family, addrinfo.type, addrinfo.protocol)
      socket.connect(addrinfo, timeout: connect_timeout) do |error|
        socket.close
        error
      end
      socket
    end
  end

I've been able to boil the error down to the following reduced code (with --prelude=empty):

# mini prelude
lib LibC
  fun _CxxThrowException(ex : Void*, throw_info : Void*) : NoReturn
end

@[Primitive(:throw_info)]
def throw_info : Void*
end

def raise(exception : Exception) : NoReturn
  LibC._CxxThrowException(pointerof(exception).as(Void*), throw_info)
end

# error reproduction
def getaddrinfo
  yield
ensure
  nil
end

class XTCPSocket
  def initialize
    getaddrinfo do
      raise Exception.new
    end
  end
end

begin
  XTCPSocket.new
rescue
end

The begin ... rescue is not strictly necessary for reproduction but it makes sure the exit status is good when the bug is not hit.

So the cause seems to be a combination of calling a method with yield and ensure from a constructor.

As noted before, it only reproduces on native builds, not when cross compiling. This is the diff between the LLVM IR for native and cross build:

diff --git 1/test.empty-native.ll 2/test.empty-cross.ll
index 7fc3b1734..c4e7e83c4 100755
--- 1/test.empty-native.ll
+++ 2/test.empty-cross.ll
@@ -35,6 +35,9 @@ target triple = "x86_64-pc-windows-msvc"
 define %XTCPSocket* @__crystal_main(i32 %argc, i8** %argv) personality i32 (...)* @__CxxFrameHandler3 !dbg !4 {
 alloca:
   %0 = alloca %Exception*, !dbg !9
+  br label %entry
+
+entry:                                            ; preds = %alloca
   store i32 %argc, i32* @ARGC_UNSAFE
   store i8** %argv, i8*** @ARGV_UNSAFE
   store %Nil zeroinitializer, %Nil* @"Crystal::BUILD_COMMIT"
@@ -49,10 +52,10 @@ alloca:
   %1 = invoke %XTCPSocket* @.2A.XTCPSocket.3A..3A.new.3A.XTCPSocket()
           to label %invoke_out unwind label %rescue, !dbg !9

-rescue:                                           ; preds = %alloca
+rescue:                                           ; preds = %entry
   %2 = catchswitch within none [label %catch_body] unwind to caller, !dbg !9

-invoke_out:                                       ; preds = %alloca
+invoke_out:                                       ; preds = %entry
   br label %exit, !dbg !9

 exit:                                             ; preds = %this_rescue_target, %invoke_out
@@ -61,9 +64,17 @@ exit:                                             ; preds = %this_rescue_target,

 catch_body:                                       ; preds = %rescue
   %4 = catchpad within %2 [{ i8**, i8*, [6 x i8] }* @"\01??_R0PEAX@8", i32 0, %Exception** %0], !dbg !9
+  %5 = load %Exception*, %Exception** %0, !dbg !9
+  br label %this_rescue, !dbg !9
+
+this_rescue:                                      ; preds = %catch_body
   catchret from %4 to label %this_rescue_target, !dbg !9

-this_rescue_target:                               ; preds = %catch_body
+next_rescue:                                      ; No predecessors!
+  call void @_CxxThrowException(i8* null, i8* null) [ "funclet"(token %4) ], !dbg !9
+  unreachable, !dbg !9
+
+this_rescue_target:                               ; preds = %this_rescue
   br label %exit, !dbg !9
 }

@@ -106,18 +117,22 @@ declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)
 define internal void @.2A.XTCPSocket.23.initialize.3A.NoReturn(%XTCPSocket* %self) #2 personality i32 (...)* @__CxxFrameHandler3 !dbg !17 {
 alloca:
   %0 = alloca %Exception*, !dbg !18
+  br label %entry
+
+entry:                                            ; preds = %alloca
   %1 = call %Exception* @.2A.Exception.40.Reference.3A..3A.new.3A.Exception(), !dbg !19
   invoke void @.2A.raise.3C.Exception.3E..3A.NoReturn(%Exception* %1)
           to label %invoke_out unwind label %rescue, !dbg !18

-rescue:                                           ; preds = %alloca
+rescue:                                           ; preds = %entry
   %2 = catchswitch within none [label %catch_body] unwind to caller, !dbg !18

-invoke_out:                                       ; preds = %alloca
+invoke_out:                                       ; preds = %entry
   unreachable, !dbg !18

 catch_body:                                       ; preds = %rescue
   %3 = catchpad within %2 [{ i8**, i8*, [6 x i8] }* @"\01??_R0PEAX@8", i32 0, %Exception** %0], !dbg !18
+  %4 = load %Exception*, %Exception** %0, !dbg !18
   call void @_CxxThrowException(i8* null, i8* null) [ "funclet"(token %3) ], !dbg !18
   unreachable, !dbg !18
 }
@@ -156,14 +171,10 @@ entry:                                            ; preds = %alloca
 ; Function Attrs: noreturn
 declare void @_CxxThrowException(i8*, i8*) #3

-; Function Attrs: nounwind
-declare void @llvm.stackprotector(i8*, i8**) #4
-
 attributes #0 = { uwtable }
 attributes #1 = { argmemonly nounwind willreturn }
 attributes #2 = { noreturn uwtable }
 attributes #3 = { noreturn }
-attributes #4 = { nounwind }

 !llvm.dbg.cu = !{!0}
 !llvm.module.flags = !{!3}

I'm still investigating this issue, but so far I've found that:

When the bug happens the program crashes with an unhandled exception. If you have a Just-In-Time debugger installed (eg. Visual Studio) and configured, Windows will prompt to start a debugging session. Otherwise I've seen the crash in the System Event Viewer, but I think that can be disabled.
So far I've not been able to reproduce the problem without involving a class constructor, but I'm not sure why yet.
I reduced the code a tiny bit further from the last comment from @straight-shoota. Raising with an ensure block in the constructor reproduces the bug.
The bug disappears when compiling in a single module! Either passing a --single-module or any of the --emit options will do that. This also applies to the spec in the OP (after removing the workaround of course).

All of the above is using the native compiler.

I've tried several variations, but this is the minimal code I'm currently working with. I've extended the mini prelude to be able to use puts for printing debugging messages.

# mini prelude
lib LibC
  fun _CxxThrowException(ex : Void*, throw_info : Void*) : NoReturn
  fun puts(s: UInt8*)
end

@[Primitive(:throw_info)]
def throw_info : Void*
end

def raise(exception : Exception) : NoReturn
  LibC._CxxThrowException(pointerof(exception).as(Void*), throw_info)
end

class String
  def to_unsafe : UInt8*
    pointerof(@c)
  end
end

# error reproduction
class One
  def initialize
    raise Exception.new
  ensure
    LibC.puts("ensured")
  end
end

begin
  One.new
rescue
  LibC.puts("caught!")
end

The puts can be commented out and replaced with a nil expression, it doesn't make a difference.

The LLVM IR outputs from the single module and the multi module compilations looks almost equal. I couldn't find any important difference. In fact, I've compiled the .exes for both, and they have the exact same size.

I've just started using a disassembler (IDA Free version) to inspect the binaries. So far, the only differences seem to be the layout of the assembler code in memory. I've also seen in some cases (while trying variations of code) the unhandled C++ exception error reported by Visual Studio changes to invalid or misaligned exception frame (don't remember the exact error now).

Quick update: not much progress unfortunately, but running both versions in the debugger, buggy multi-module and working single-module, shows that in the bugged version the catch/ensure block section of code in One::initialize is not called when handling the exception when it first occurs. I don't know why yet, but in the working binary, the exception is handled by the ensure block, then rethrown, then rethrown again (not sure from where) and finally handled in the rescue block in __crystal_main. I'm leaning towards some bug compiling/linking the exception information structures, but I don't fully understand how exception handling works on Windows yet.

Ok, this seems to be an LLVM bug. The good news is that it's fixed in version 11. I don't quite understand what exactly is causing the behavior, but the problem is that the MS C++ runtime fails to unwind the frame state and reports:

Unhandled exception at 0x00007FFC08092346 (ntdll.dll) in quux.exe: 0xC0000028: An invalid or unaligned stack was encountered during an unwind operation.

Using the previous example, when the program is compiled in multiple units, the One class

class One
  def initialize
    raise Exception.new
  ensure
    LibC.puts("ensured")
  end
end

is compiled into the following LLVM IR code, sans type definitions and declarations for clarity:

; Function Attrs: uwtable
define %One* @.2A.One.3A..3A.new.3A.One() #0 {
alloca:
  %_ = alloca %One*
  br label %entry

entry:                                            ; preds = %alloca
  %0 = call i8* @malloc(i64 ptrtoint (i32* getelementptr (i32, i32* null, i32 1) to i64))
  %1 = bitcast i8* %0 to %One*
  %2 = bitcast %One* %1 to i8*
  call void @llvm.memset.p0i8.i64(i8* align 4 %2, i8 0, i64 ptrtoint (i32* getelementptr (i32, i32* null, i32 1) to i64), i1 false)
  %3 = getelementptr inbounds %One, %One* %1, i32 0, i32 0
  store i32 6, i32* %3
  store %One* %1, %One** %_
  %4 = load %One*, %One** %_
  call void @.2A.One.23.initialize.3A.Nil(%One* %4)
  %5 = load %One*, %One** %_
  ret %One* %5
}

; Function Attrs: uwtable
define void @.2A.One.23.initialize.3A.Nil(%One* %self) #0 personality i32 (...)* @__CxxFrameHandler3 {
alloca:
  %0 = alloca %Exception*
  br label %entry

entry:                                            ; preds = %alloca
  %1 = call %Exception* @.2A.Exception.40.Reference.3A..3A.new.3A.Exception()
  invoke void @.2A.raise.3C.Exception.3E..3A.NoReturn(%Exception* %1)
          to label %invoke_out unwind label %rescue

rescue:                                           ; preds = %entry
  %2 = catchswitch within none [label %catch_body] unwind to caller

invoke_out:                                       ; preds = %entry
  unreachable

catch_body:                                       ; preds = %rescue
  %3 = catchpad within %2 [{ i8**, i8*, [6 x i8] }* @"\01??_R0PEAX@8", i32 0, %Exception** %0]
  %4 = load %Exception*, %Exception** %0
  %5 = call i8* @.2A.String.23.to_unsafe.3A.Pointer.28.UInt8.29.(%String* bitcast ({ i32, i32, i32, [8 x i8] }* @"'ensured'" to %String*)) [ "funclet"(token %3) ]
  call void @puts(i8* %5) [ "funclet"(token %3) ]
  call void @_CxxThrowException(i8* null, i8* null) [ "funclet"(token %3) ]
  unreachable
}

The problem is in the initialize code, around the invoke:

entry:                                            ; preds = %alloca
  %1 = call %Exception* @.2A.Exception.40.Reference.3A..3A.new.3A.Exception()
  invoke void @.2A.raise.3C.Exception.3E..3A.NoReturn(%Exception* %1)
          to label %invoke_out unwind label %rescue

rescue:                                           ; preds = %entry
  %2 = catchswitch within none [label %catch_body] unwind to caller

invoke_out:                                       ; preds = %entry
  unreachable

The difference between LLVM 10 and LLVM 11 in the assembler generated is:

--- o-ne.s.10   2021-10-04 20:07:07.060845300 -0300                                                                                                 
+++ o-ne.s.11   2021-10-04 20:07:21.088512100 -0300                                                                                                 
@@ -72,10 +72,11 @@                                                                                                                                 
        movq    %rax, %rcx                                                                                                                          
        callq   .2A.raise.3C.Exception.3E..3A.NoReturn                                                                                              
 .Ltmp1:                                                                                                                                            
        jmp     .LBB1_1                                                                                                                             
 .LBB1_1:                                # %invoke_out                                                                                              
+       int3                                                                                                                                        
        .seh_handlerdata                                                                                                                            
        .long   ($cppxdata$.2A.One.23.initialize.3A.Nil)@IMGREL                                                                                     
        .text                                                                                                                                       
        .seh_endproc                                                                                                                                
        .def     "?catch$2@?0?.2A.One.23.initialize.3A.Nil@4HA";

ie. one extra int3. In fact, when debugging the LLVM 10 version which normally crashes, placing a breakpoint (which AFAIK introduces an int3 at the location) on the jmp instruction makes the code work!

I am completely lost at why the extra opcode/byte fixes the issue. A binary diff between the .obj files confirms that's the only difference. Probably relevant is the fact that the associated unwind info in the .pdata section also extends by one byte in the LLVM11 version. It's also a mystery why this works when compiling as a single module. In LLVM 10 the assembler code in single module does include the extra int3.

I'm running out of ideas, so I'm putting this on the back-burner for a while. In the meantime, maybe we can upgrade to LLVM 11 or 12 with the patches for #10359 applied?

Well, it seems I just needed to dump my brain in the previous comment to get to the bottom of this. 😃 I found a couple of LLVM commits involved, but the end of the thread seems to be https://github.com/llvm/llvm-project/commit/597718aae017a870e99cdb37b3bc10d8dfa58a25 . From the commits referenced there, https://github.com/llvm/llvm-project/commit/5ff5ddd0adc89f8827b345577bbb3e7eb74fc644 provides the most information, pointing to the X86AvoidTrailingCall.cpp file which reads:

// The Windows x64 unwinder decodes the instruction stream during unwinding.
// The unwinder decodes forward from the current PC to detect epilogue code
// patterns.
//
// First, this means that there must be an instruction after every
// call instruction for the unwinder to decode. LLVM must maintain the invariant
// that the last instruction of a function or funclet is not a call, or the
// unwinder may decode into the next function. Similarly, a call may not
// immediately precede an epilogue code pattern. As of this writing, the
// SEH_Epilogue pseudo instruction takes care of that.
//
// Second, all non-tail call jump targets must be within the *half-open*
// interval of the bounds of the function. The unwinder distinguishes between
// internal jump instructions and tail calls in an epilogue sequence by checking
// the jump target against the function bounds from the .pdata section. This
// means that the last regular MBB of an LLVM function must not be empty if
// there are regular jumps targeting it.
//
// This pass upholds these invariants by ensuring that blocks at the end of a
// function or funclet are a) not empty and b) do not end in a CALL instruction.

This means that the last regular MBB of an LLVM function must not be empty if there are regular jumps targeting it.

So, yeah. LLVM produces an empty basic block for the "normal" exit of the invoke because it is unreachable, but that confuses the unwinder 🤷 I have no idea why this doesn't happen when compiling a single module.

A workaround may be possible, but to be honest it looks like upgrading to a newer LLVM would be best.

Thank you very much @ggiraldez for digging into this.

Knowing that it's an LLVM bug and fixed in recent releases is really helpful. That redeems us from having to introduce a workaround.

As a somewhat surprising coincidence, LLVM 13 has just been released yesterday. 🎉 That release should include all the patches we need for #10359 and this issue. We should be able to update LLVM for the binary distribution and already link the upcoming Crystal 1.2.0 with LLVM 13.

We can probably start using LLVM 13 in win32 CI right now.

crystal-lang / crystal

Codegen error on win32 #11047