crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Bug: Invalid memory access on LLVMContextDispose #4973

Closed asterite closed 7 years ago

asterite commented 7 years ago

I was able to "reduce" the random crash that sometimes happens on Travis and CI.

To reproduce, compile this file:

require "compiler/crystal/**"

def compile
  compiler = Crystal::Compiler.new
  compiler.debug = Crystal::Debug::All
  compiler.compile([
    Crystal::Compiler::Source.new("lala.cr", ""),
  ], "lala")
end

1000.times do |i|
  puts "Start: #{i}"

  compile

  puts "Done compiling: #{i}"

  10.times { GC.collect }

  puts "End: #{i}"
end

Then run it, making sure to set the CRYSTAL_PATH env var to the "src" folder of the compiler, otherwise the prelude won't be found.

The crash happens randomly, so if it hasn't crashed by the time it prints "Start: 20" you can press Ctrl+C and run it again. This is the crash I sometimes get:

Invalid memory access (signal 11) at address 0xe0
[0x1062fd23b] *CallStack::print_backtrace:Int32 +107
[0x1062d73ac] __crystal_sigfault_handler +60
[0x107d37713] sigfault_handler +35
[0x7fff91199b3a] _sigtramp +26
[0x107c7e860] _ZNK4llvm12DenseMapBaseINS_13SmallDenseMapIPvNSt3__14pairINS_12PointerUnionIPNS_15MetadataAsValueEPNS_8MetadataEEEyEELj4ENS_12DenseMapInfoIS2_EENS_6detail12DenseMapPairIS2_SB_EEEES2_SB_SD_SG_E15LookupBucketForIS2_EEbRKT_RPKSG_ +6
[0x107c770fc] _ZN4llvm12DenseMapBaseINS_13SmallDenseMapIPvNSt3__14pairINS_12PointerUnionIPNS_15MetadataAsValueEPNS_8MetadataEEEyEELj4ENS_12DenseMapInfoIS2_EENS_6detail12DenseMapPairIS2_SB_EEEES2_SB_SD_SG_E5eraseERKS2_ +18
[0x107c76f49] _ZN4llvm16MetadataTracking7untrackEPvRNS_8MetadataE +49
[0x107c7977d] _ZN4llvm9MDOperand5resetEPNS_8MetadataES2_ +35
[0x107c7ac98] _ZN4llvm6MDNode17dropAllReferencesEv +102
[0x107c673f7] _ZN4llvm15LLVMContextImplD2Ev +2715
[0x107c655a8] _ZN4llvm11LLVMContextD2Ev +22
[0x107c2b996] LLVMContextDispose +22
[0x106fb986c] *LLVM::Context#finalize:Nil +140
[0x1062e0fe1] ~proc6Proc(Pointer(Void), Pointer(Void), Nil)@src/gc/boehm.cr:142 +17
[0x1095938e0] GC_invoke_finalizers +172
[0x109593a2f] GC_notify_or_invoke_finalizers +174
[0x10958f943] GC_try_to_collect_general +203
[0x10958f973] GC_gcollect +11
[0x1063331d9] *GC::collect:Nil +9
[0x1062bed38] __crystal_main +2936
[0x1062d7218] main +40

Sometimes the trace is a bit different.

This happens when passing Crystal::Debug::All for the debug info. It doesn't happen with Crystal::Debug::None. It seems to be related to disposing an LLVM MDNode twice, or something like that.

I have no idea why it happens. Our code makes sure that an LLVM::Context is disposed only once. Maybe our custom bindings to the LLVM C++ debug API have bugs, I don't know. I also tried to trace this in LLVM's source code, but it's huge and it's written in C++.
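For readers unfamiliar with the pattern, the "disposed only once" guarantee mentioned above is usually a simple flag check in the wrapper. The following is only a minimal sketch of that idea: `NativeHandle` and `dispose_count` are hypothetical names made up for illustration, and this is not the compiler's actual `LLVM::Context` code.

```crystal
# Hypothetical stand-in for a wrapper around a native handle.
# This is NOT the real LLVM::Context; it only illustrates the guard.
class NativeHandle
  # Counts how many times the (pretend) native dispose was reached.
  getter dispose_count = 0

  @disposed = false

  def dispose
    # Guard: a second call returns early instead of freeing again.
    return if @disposed
    @disposed = true
    @dispose_count += 1 # stands in for the single call into the C API
  end

  # The GC calls finalize; it goes through the same guarded path.
  def finalize
    dispose
  end
end

handle = NativeHandle.new
handle.dispose
handle.dispose # no-op thanks to the guard
puts handle.dispose_count
```

A guard like this only protects the wrapper's own handle. It says nothing about objects owned by the native side, which is why a double free inside LLVM would still be possible.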

The "good" thing is that this is very unlikely to happen when compiling one file. But there's a chance it happens in tests, and then we have to restart CI from time to time and hope it passes.

Any help finding the cause of this would be greatly appreciated! :-)

ysbaddaden commented 7 years ago

I can't reproduce, so I can't tell for sure, but:

The DIBuilder is attached to a Module or Context, isn't it? Could it be that we dispose of the module/context before the DIBuilder, while the DIBuilder's dispose method still needs to access it?
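To make the ordering concern concrete, here is a small sketch of a dependent object whose dispose step needs its owner to still be alive. `FakeContext` and `FakeBuilder` are hypothetical stand-ins invented for this illustration, not the real LLVM bindings, and the sketch assumes explicit disposal rather than GC finalizers.

```crystal
# Hypothetical stand-ins; not the real LLVM::Context or DIBuilder bindings.
class FakeContext
  getter? disposed = false
  getter log = [] of String

  def dispose
    return if @disposed
    @disposed = true
    @log << "context disposed"
  end
end

class FakeBuilder
  getter? disposed = false

  def initialize(@context : FakeContext)
  end

  # Disposing the builder touches the context, so the context must
  # still be alive at this point (the situation hypothesized above).
  def dispose
    return if @disposed
    raise "context already disposed" if @context.disposed?
    @disposed = true
    @context.log << "builder disposed"
  end
end

context = FakeContext.new
builder = FakeBuilder.new(context)

builder.dispose # dependent object first
context.dispose # owner last
puts context.log.join(", ")
```

Reversing the last two dispose calls raises in this sketch; in native code the equivalent mistake would be a use-after-free rather than a clean exception.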

asterite commented 7 years ago

I thought so, yes, but we never dispose DIBuilders ¯_(ツ)_/¯