Seg fault in stack analysis phase of dump (1.9.3-p0, 1.9.3-p194)

airhorns commented 11 years ago

Hi @Vasfed, thanks so much for making this tool. We're getting major segfaults however. Here are the facts:

the segfault only occurs when we hit our Rails application in development or production mode with a real browser, but not in test mode, even for integration tests
the segfault does not occur in the console just booting our application and dumping
according to my admittedly poor GDB foo the segfault occurs in the machine context dumping phase of the dump, see the second paste below
the segfault happens on ruby versions 1.9.3 with falcon's performance patches (which makes sense), but still happens on v1.9.3-p194 and v1.9.3-p0.

Here's the trace for it:

oration ~/S/shopify  (performance*) ➜  bx rails s
WARNING: Nokogiri was built against LibXML version 2.7.3, but has dynamically loaded 2.7.8
DEPRECATION WARNING: RAILS_ROOT is deprecated. Please use ::Rails.root.to_s. (called from <top (required)> at /Users/hornairs/Sites/shopify/config/application.rb:14)
=> Booting Mongrel
=> Rails 3.0.20 application starting in development on http://0.0.0.0:3000
=> Call with -d to detach
=> Ctrl-C to shutdown server
Deactivating asset logging. To see asset requests in log, start with LOG_ASSETS=true env variable.
/Users/hornairs/Code/heap_dump/lib/heap_dump.rb:10: [BUG] Bus Error
ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin12.2.0]

-- Control frame information -----------------------------------------------
c:0024 p:---- s:0131 b:0131 l:000130 d:000130 CFUNC  :dump_ext
c:0023 p:0041 s:0127 b:0127 l:000126 d:000126 METHOD /Users/hornairs/Code/heap_dump/lib/heap_dump.rb:10
c:0022 p:0113 s:0122 b:0122 l:000121 d:000121 METHOD /Users/hornairs/Sites/shopify/app/middleware/middleware_time.rb:52
c:0021 p:0231 s:0118 b:0118 l:001fc8 d:001fc8 METHOD /Users/hornairs/Sites/shopify/app/middleware/middleware_time.rb:26
c:0020 p:0017 s:0109 b:0109 l:002100 d:002100 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/bundler/gems/airbrake-4f0417e699cd/lib/airbrake/rack.rb:42
c:0019 p:0015 s:0103 b:0103 l:001db8 d:001db8 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/bundler/gems/airbrake-4f0417e699cd/lib/airbrake/user_informer.rb:1
c:0018 p:0032 s:0094 b:0094 l:001f10 d:001f10 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/application.rb:168
c:0017 p:0021 s:0090 b:0090 l:0013a8 d:0013a8 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/application.rb:77
c:0016 p:---- s:0085 b:0085 l:000084 d:000084 FINISH
c:0015 p:0179 s:0083 b:0083 l:0014f8 d:000b80 BLOCK  /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/urlmap.rb:47
c:0014 p:---- s:0076 b:0076 l:000075 d:000075 FINISH
c:0013 p:---- s:0074 b:0074 l:000073 d:000073 CFUNC  :each
c:0012 p:0058 s:0071 b:0071 l:0014f8 d:0014f8 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/urlmap.rb:41
c:0011 p:0015 s:0062 b:0062 l:000938 d:000938 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/rack/log_tailer.rb:14
c:0010 p:0015 s:0057 b:0057 l:000a10 d:000a10 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/content_length.rb:13
c:0009 p:0015 s:0049 b:0049 l:000b28 d:000b28 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/chunked.rb:15
c:0008 p:0226 s:0042 b:0042 l:0022b0 d:0022b0 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/handler/mongrel.rb:67
c:0007 p:0017 s:0032 b:0032 l:001808 d:001870 BLOCK  /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:165
c:0006 p:---- s:0029 b:0029 l:000028 d:000028 FINISH
c:0005 p:---- s:0027 b:0027 l:000026 d:000026 CFUNC  :each
c:0004 p:0417 s:0024 b:0024 l:001808 d:001808 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:164
c:0003 p:0012 s:0007 b:0007 l:001418 d:001410 BLOCK  /Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:291
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:---- s:0002 b:0002 l:000001 d:000001 TOP   

-- Ruby level backtrace information ----------------------------------------
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:291:in `block (2 levels) in run'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:164:in `process_client'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:164:in `each'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:165:in `block in process_client'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/handler/mongrel.rb:67:in `process'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/chunked.rb:15:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/content_length.rb:13:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/rack/log_tailer.rb:14:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/urlmap.rb:41:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/urlmap.rb:41:in `each'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/urlmap.rb:47:in `block in call'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/application.rb:77:in `method_missing'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/application.rb:168:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/bundler/gems/airbrake-4f0417e699cd/lib/airbrake/user_informer.rb:12:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/bundler/gems/airbrake-4f0417e699cd/lib/airbrake/rack.rb:42:in `call'
/Users/hornairs/Sites/shopify/app/middleware/middleware_time.rb:26:in `call'
/Users/hornairs/Sites/shopify/app/middleware/middleware_time.rb:52:in `dump_heap'
/Users/hornairs/Code/heap_dump/lib/heap_dump.rb:10:in `dump'
/Users/hornairs/Code/heap_dump/lib/heap_dump.rb:10:in `dump_ext'

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x0000000101e46000
[Switching to process 54658 thread 0x2a03]
dump_machine_context (ctx=0x101d023d0) at heap_dump.c:1534
1534          VALUE v = *(x++);
(gdb) p &x
Address requested for identifier "x" which is in register $r13

Interestingly we tried just commenting that piece of code:

diff --git a/ext/heap_dump/heap_dump.c b/ext/heap_dump/heap_dump.c
index 81dd872..aa5f425 100644
--- a/ext/heap_dump/heap_dump.c
+++ b/ext/heap_dump/heap_dump.c
@@ -1464,10 +1464,10 @@ static void dump_thread(const rb_thread_t* th, walk_ctx_t *ctx){
       VALUE *stack_start, *stack_end;
       GET_STACK_BOUNDS(stack_start, stack_end, 0);
       // /sizeof(VALUE)?
-      yg_cstring("mach_stack");
-      yg_array();
-      dump_locations(stack_start, (stack_end-stack_start), ctx);
-      yg_array_end();
+      /*yg_cstring("mach_stack");*/
+      /*yg_array();*/
+      /*dump_locations(stack_start, (stack_end-stack_start), ctx);*/
+      /*yg_array_end();*/

       yg_cstring("mach_regs");
       yg_array();

and still got another segfault in the dump_thread section. It gives the same ruby land trace but GDB breaks on this error:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x000000010a094000
[Switching to process 70749 thread 0x2a03]
0x000000010911d2e0 in dump_locations (p=0x109f502e0, n=17591274555040, ctx=0x109f503e0) at ../../../../ext/heap_dump/heap_dump.c:679
679       VALUE v = *x;
(gdb) where
#0  0x000000010911d2e0 in dump_locations (p=0x109f502e0, n=17591274555040, ctx=0x109f503e0) at ../../../../ext/heap_dump/heap_dump.c:679
#1  0x00000001091203b2 in dump_thread (th=0x7fafd9c03f30, ctx=0x109f503e0) at ../../../../ext/heap_dump/heap_dump.c:1469
#2  0x000000010911df98 in dump_data_if_known (obj=140393253877360, ctx=0x109f503e0) at ../../../../ext/heap_dump/heap_dump.c:900
#3  0x000000010911f165 in walk_live_object (obj=140393253877360, ctx=0x109f503e0) at ../../../../ext/heap_dump/heap_dump.c:1155
#4  0x000000010911e1d9 in objspace_walker (vstart=0x7fafda06a608, vend=0x7fafda06e5f0, stride=40, data=0x109f503e0) at ../../../../ext/heap_dump/heap_dump.c:1180
#5  0x00000001085d91ff in objspace_each_objects ()
#6  0x00000001085c776f in rb_ensure ()
#7  0x00000001085dc7cc in rb_objspace_each_objects ()
#8  0x0000000109120a2b in heapdump_dump (filename=0x7fafbb7823e0 "/Users/hornairs/Sites/shopify/heap_dump_20130211184328") at ../../../../ext/heap_dump/heap_dump.c:1673
#9  0x0000000109120af9 in rb_heapdump_dump (self=140393280935600, filename=140392802128800) at ../../../../ext/heap_dump/heap_dump.c:1687
#10 0x0000000108709f17 in vm_call_method ()
#11 0x00000001086f74ae in vm_exec_core ()
#12 0x00000001086fca13 in vm_exec ()
#13 0x00000001086fe4ba in vm_call0 ()
#14 0x00000001086ffe77 in rb_call0 ()
#15 0x00000001087007f4 in vm_method_missing ()
#16 0x000000010870a397 in vm_call_method ()
#17 0x00000001086f74ae in vm_exec_core ()
#18 0x00000001086fca13 in vm_exec ()
#19 0x000000010870c490 in rb_yield ()
#20 0x000000010858e36e in rb_ary_each ()
#21 0x0000000108709f17 in vm_call_method ()
#22 0x00000001086f74ae in vm_exec_core ()
#23 0x00000001086fca13 in vm_exec ()
#24 0x000000010870c490 in rb_yield ()
#25 0x000000010858e36e in rb_ary_each ()
#26 0x0000000108709f17 in vm_call_method ()
#27 0x00000001086f74ae in vm_exec_core ()
#28 0x00000001086fca13 in vm_exec ()
#29 0x00000001086fdd7d in rb_vm_invoke_proc ()
#30 0x00000001087150b9 in thread_start_func_2 ()
#31 0x00000001087151e6 in thread_start_func_1 ()
#32 0x00007fff9353e742 in _pthread_start ()
#33 0x00007fff9352b181 in thread_start ()

Finally, if we disable that second piece of code:

diff --git a/ext/heap_dump/heap_dump.c b/ext/heap_dump/heap_dump.c
index 81dd872..aa5f425 100644
--- a/ext/heap_dump/heap_dump.c
+++ b/ext/heap_dump/heap_dump.c
@@ -1464,10 +1464,10 @@ static void dump_thread(const rb_thread_t* th, walk_ctx_t *ctx){
       VALUE *stack_start, *stack_end;
       GET_STACK_BOUNDS(stack_start, stack_end, 0);
       // /sizeof(VALUE)?
-      yg_cstring("mach_stack");
-      yg_array();
-      dump_locations(stack_start, (stack_end-stack_start), ctx);
-      yg_array_end();
+      /*yg_cstring("mach_stack");*/
+      /*yg_array();*/
+      /*dump_locations(stack_start, (stack_end-stack_start), ctx);*/
+      /*yg_array_end();*/

       yg_cstring("mach_regs");
       yg_array();
@@ -1523,23 +1523,6 @@ static void dump_machine_context(walk_ctx_t *ctx){
       yg_id(v);
   }
   yajl_gen_array_close(ctx->yajl);
-
-  yg_cstring("stack");
-  yajl_gen_array_open(ctx->yajl);
-  //rb_gc_mark_locations(stack_start, stack_end);
-  if(stack_start < stack_end){
-    n = stack_end - stack_start;
-    x = stack_start;
-    while (n--) {
-      VALUE v = *(x++);
-      //FIXME: other objspace (not default one?)
-      if(is_in_heap((void*)v, NULL)) {
-        yg_id(v);
-      }
-    }
-  }
-
-  yajl_gen_array_close(ctx->yajl);
 }

 #ifdef HAVE_RB_CLASS_TBL

we get a heap dump on the first request and a segfault on the second request in the GC.start called by heap_dump:

Dumping heap to /Users/hornairs/Sites/shopify/heap_dump_20130211192503
/Users/hornairs/Code/heap_dump/lib/heap_dump.rb:9: [BUG] Segmentation fault
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12.2.0]

-- Control frame information -----------------------------------------------
c:0024 p:---- s:0130 b:0130 l:000129 d:000129 CFUNC  :start
c:0023 p:0029 s:0127 b:0127 l:000126 d:000126 METHOD /Users/hornairs/Code/heap_dump/lib/heap_dump.rb:9
c:0022 p:0113 s:0122 b:0122 l:000121 d:000121 METHOD /Users/hornairs/Sites/shopify/app/middleware/middleware_time.rb:52
c:0021 p:0215 s:0118 b:0118 l:000c18 d:000c18 METHOD /Users/hornairs/Sites/shopify/app/middleware/middleware_time.rb:26
c:0020 p:0017 s:0109 b:0109 l:000d60 d:000d60 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/bundler/gems/airbrake-4f0417e699cd/lib/airbrake/rack.rb:42
c:0019 p:0015 s:0103 b:0103 l:001f88 d:001f88 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/bundler/gems/airbrake-4f0417e699cd/lib/airbrake/user_informer.rb
c:0018 p:0032 s:0094 b:0094 l:001500 d:001500 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/application.rb:168
c:0017 p:0021 s:0090 b:0090 l:0018a8 d:0018a8 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/application.rb:77
c:0016 p:---- s:0085 b:0085 l:000084 d:000084 FINISH
c:0015 p:0179 s:0083 b:0083 l:0022e8 d:000000 BLOCK  /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/urlmap.rb:47
c:0014 p:---- s:0076 b:0076 l:000075 d:000075 FINISH
c:0013 p:---- s:0074 b:0074 l:000073 d:000073 CFUNC  :each
c:0012 p:0058 s:0071 b:0071 l:0022e8 d:0022e8 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/urlmap.rb:41
c:0011 p:0015 s:0062 b:0062 l:000338 d:000338 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/rack/log_tailer.rb:14
c:0010 p:0015 s:0057 b:0057 l:0016e0 d:0016e0 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/content_length.rb:13
c:0009 p:0015 s:0049 b:0049 l:0001c8 d:0001c8 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/chunked.rb:15
c:0008 p:0226 s:0042 b:0042 l:0000f0 d:0000f0 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/handler/mongrel.rb:67
c:0007 p:0017 s:0032 b:0032 l:001748 d:0008e0 BLOCK  /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:165
c:0006 p:---- s:0029 b:0029 l:000028 d:000028 FINISH
c:0005 p:---- s:0027 b:0027 l:000026 d:000026 CFUNC  :each
c:0004 p:0417 s:0024 b:0024 l:001748 d:001748 METHOD /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:164
c:0003 p:0012 s:0007 b:0007 l:000c68 d:0002a0 BLOCK  /Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:291
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:---- s:0002 b:0002 l:000001 d:000001 TOP   

-- Ruby level backtrace information ----------------------------------------
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:291:in `block (2 levels) in run'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:164:in `process_client'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:164:in `each'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/mongrel-1.2.0.pre2/lib/mongrel.rb:165:in `block in process_client'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/handler/mongrel.rb:67:in `process'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/chunked.rb:15:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/content_length.rb:13:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/rack/log_tailer.rb:14:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/urlmap.rb:41:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/urlmap.rb:41:in `each'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/rack-1.2.8/lib/rack/urlmap.rb:47:in `block in call'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/application.rb:77:in `method_missing'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/railties-3.0.20/lib/rails/application.rb:168:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/bundler/gems/airbrake-4f0417e699cd/lib/airbrake/user_informer.rb:12:in `call'
/Users/hornairs/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/bundler/gems/airbrake-4f0417e699cd/lib/airbrake/rack.rb:42:in `call'
/Users/hornairs/Sites/shopify/app/middleware/middleware_time.rb:26:in `call'
/Users/hornairs/Sites/shopify/app/middleware/middleware_time.rb:52:in `dump_heap'
/Users/hornairs/Code/heap_dump/lib/heap_dump.rb:9:in `dump'
/Users/hornairs/Code/heap_dump/lib/heap_dump.rb:9:in `start'

who's GDB stack trace looks like this:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000102cd12e0
[Switching to process 74927 thread 0x2b03]
0x0000000101343cd0 in rb_gc_mark_machine_stack ()
(gdb) where
#0  0x0000000101343cd0 in rb_gc_mark_machine_stack ()
#1  0x0000000101452e50 in rb_thread_mark ()
#2  0x0000000101343154 in gc_mark_children ()
#3  0x0000000101452b99 in vm_mark_each_thread_func ()
#4  0x00000001013f8ddf in st_foreach ()
#5  0x000000010145445d in rb_vm_mark ()
#6  0x0000000101343154 in gc_mark_children ()
#7  0x0000000101345263 in gc_marks ()
#8  0x000000010134536d in garbage_collect ()
#9  0x0000000101345e02 in rb_gc_start ()
#10 0x000000010146ef17 in vm_call_method ()
#11 0x000000010145c4ae in vm_exec_core ()
#12 0x0000000101461a13 in vm_exec ()
#13 0x00000001014634ba in vm_call0 ()
#14 0x0000000101464e77 in rb_call0 ()
#15 0x00000001014657f4 in vm_method_missing ()
#16 0x000000010146f397 in vm_call_method ()
#17 0x000000010145c4ae in vm_exec_core ()
#18 0x0000000101461a13 in vm_exec ()
#19 0x0000000101471490 in rb_yield ()
#20 0x00000001012f336e in rb_ary_each ()
#21 0x000000010146ef17 in vm_call_method ()
#22 0x000000010145c4ae in vm_exec_core ()
#23 0x0000000101461a13 in vm_exec ()
#24 0x0000000101471490 in rb_yield ()
#25 0x00000001012f336e in rb_ary_each ()
#26 0x000000010146ef17 in vm_call_method ()
#27 0x000000010145c4ae in vm_exec_core ()
#28 0x0000000101461a13 in vm_exec ()
#29 0x0000000101462d7d in rb_vm_invoke_proc ()
#30 0x000000010147a0b9 in thread_start_func_2 ()
#31 0x000000010147a1e6 in thread_start_func_1 ()
#32 0x00007fff9353e742 in _pthread_start ()
#33 0x00007fff9352b181 in thread_start ()

Sorry for all the monstrous pastes. Any ideas? I have a pretty limited knowledge of MRI internals but it would seem like either we have a weird C extension putting bad values on the stack which heap_dump dies on, or heap_dump isn't getting the right values of stack_start and stack_end some how, and in the process messing something up such that the GC dies.

Can I give you any more information? Thanks for any pointers you can give us!

Vasfed commented 11 years ago

Thanks for informative description, pastes are ok.

Weird thing is dying of GC itself, i'll try to reproduce this error, probably this has something to do with mongrel and some signal handling etc. As a workaround you can try forking process before dumping, something like

fork { HeapDump.dump; exit }

so that at least original worker is intact even if dump fails.

Vasfed commented 11 years ago

Yep, that's mongrel messing up the stack somehow, reproduced similar fault with this minimal rack-app:

#!/usr/bin/env ruby

require 'rubygems'
require 'bundler/setup'
require 'rack'
require 'heap_dump'

class App
  def call(env)
    HeapDump.dump
    [200, {"Content-Type" => "text/html"}, ["Some reply"]]
  end
end

Rack::Handler::Mongrel.run App.new, :Port => 9292

Gemfile:

source :rubygems
gem 'rack'
gem 'mongrel', "1.2.0.pre2"
gem 'heap_dump'

Real browser is not necessary, curl localhost:9292 is sufficient to trigger a boom :)

Forking helps. Replacing mongrel with thin(probably a better idea) also helps. But i'm looking for exact cause

airhorns commented 11 years ago

@Vasfed awesome, switching to WEBrick locally on my hacked up version of heap dump allows the dump to succeed! We still see the original segfault on WEBrick as well as Unicorn however, is that expected?

airhorns commented 11 years ago

And by real browser you are right what I was trying to say is that things like integration tests which just use the rack app in ruby land work just fine, which as you saw points a big fat finger at the webserver :) Thanks so much for your help!

Vasfed commented 11 years ago

@hornairs you're welcome. It's not supposed to segfault, current guess is that this may be something related to threads, as unlike original GC heapdump does not lock everything up during dump I will look into it tomorrow

Vasfed / heap_dump

Seg fault in stack analysis phase of dump (1.9.3-p0, 1.9.3-p194) #8