Open airhorns opened 11 years ago
Thanks for the informative description; the pastes are fine.
The weird part is the GC itself dying. I'll try to reproduce this error; it probably has something to do with mongrel and its signal handling. As a workaround, you can try forking the process before dumping, something like
fork { HeapDump.dump; exit }
so that at least the original worker is intact even if the dump fails.
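The workaround can be sketched a bit more fully as follows. This is only an illustration, assuming a Unix-like platform where `fork` is available; `dump_in_child` is a hypothetical helper name, not part of heap_dump. It runs the risky block in a child process and reports how the child ended, so a crash during the dump never reaches the worker.

```ruby
# Hypothetical helper (not part of heap_dump): run a risky block in a forked
# child so that a crash inside it cannot take down the calling worker.
# Returns the child's Process::Status.
def dump_in_child
  pid = fork do
    yield
    # exit! skips at_exit handlers inherited from the parent (for example
    # web server shutdown hooks), which should not run in the child.
    exit!(0)
  end
  # Blocks until the child finishes. Process.detach(pid) could be used
  # instead if the caller must not wait.
  _, status = Process.wait2(pid)
  status
end

# Intended use inside the worker (requires the heap_dump gem):
#   status = dump_in_child { HeapDump.dump }
#   warn "heap dump failed: #{status.inspect}" unless status.success?
```

Waiting on the child makes the outcome visible (`status.signaled?` is true when the child was killed by a signal such as a segfault), at the cost of blocking the request for the duration of the dump.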
Yep, that's mongrel messing up the stack somehow. I reproduced a similar fault with this minimal rack app:
#!/usr/bin/env ruby
require 'rubygems'
require 'bundler/setup'
require 'rack'
require 'heap_dump'
class App
  def call(env)
    HeapDump.dump
    [200, {"Content-Type" => "text/html"}, ["Some reply"]]
  end
end
Rack::Handler::Mongrel.run App.new, :Port => 9292
Gemfile:
source :rubygems
gem 'rack'
gem 'mongrel', "1.2.0.pre2"
gem 'heap_dump'
A real browser is not necessary; curl localhost:9292 is sufficient to trigger a boom :)
Forking helps. Replacing mongrel with thin (probably a better idea) also helps. But I'm still looking for the exact cause.
@Vasfed awesome, switching to WEBrick locally on my hacked up version of heap dump allows the dump to succeed! We still see the original segfault on WEBrick as well as Unicorn, however. Is that expected?
And you are right about the real browser: what I was trying to say is that things like integration tests, which just use the rack app in ruby land, work just fine, which as you saw points a big fat finger at the webserver :) Thanks so much for your help!
@hornairs you're welcome. It's not supposed to segfault. My current guess is that this is related to threads, since unlike the original GC, heap_dump does not lock everything up during the dump. I will look into it tomorrow.
Hi @Vasfed, thanks so much for making this tool. We're getting major segfaults however. Here are the facts:
Here's the trace for it:
Interestingly, we tried just commenting out that piece of code:
and still got another segfault in the dump_thread section. It gives the same ruby land trace, but GDB breaks on this error:
Finally, if we disable that second piece of code:
we get a heap dump on the first request and a segfault on the second request in the GC.start called by heap_dump, whose GDB stack trace looks like this:
Sorry for all the monstrous pastes. Any ideas? I have a pretty limited knowledge of MRI internals, but it would seem that either we have a weird C extension putting bad values on the stack which heap_dump dies on, or heap_dump isn't getting the right values of stack_start and stack_end somehow, and in the process is messing something up such that the GC dies. Can I give you any more information? Thanks for any pointers you can give us!