Open bbbco opened 11 years ago
Can you make a repo that we can clone then run some command and get it to segfault?
It's a complicated script. I'll try to debug it to figure out how it is breaking and see what I can do to make a reproducible test.
Yes, I fully understand that. I have months-old scripts in some projects that reliably reproduce a bug, but I haven't had time to pare them down, so I haven't filed an upstream issue. Let's not let this one slip away: Ruby shouldn't segfault.
@banister informs me that these things can be very Ruby-patchlevel sensitive.
Can you try it on something other than p385?
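Since the behaviour seems to depend on the exact patchlevel, it may help to paste the interpreter details with each result. A minimal sketch using Ruby's built-in constants; the `version_line` variable name is only illustrative:

```ruby
# Build a one-line summary of the running interpreter from built-in constants.
version_line = "ruby #{RUBY_VERSION}-p#{RUBY_PATCHLEVEL} (#{RUBY_PLATFORM})"
puts version_line

# RUBY_DESCRIPTION is the same text that `ruby -v` prints.
puts RUBY_DESCRIPTION
```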
Ok, here is what I have so far.
I can make it seg fault consistently. I have figured out which line it's barfing at, and realized that my code is incorrect (I was essentially saying if possibly_undefined_method instead of if respond_to?("possibly_undefined_method")). Fixing the line causes the script to no longer seg fault. Also, this line is executed within three Ruby blocks.
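For anyone following along, the difference between the two forms can be sketched roughly like this. `ExamplePage` and `some_defined_method` are hypothetical stand-ins, not the real page object from the script, and this only illustrates the coding mistake, not the segfault itself:

```ruby
# Hypothetical stand-in for the page object; names are illustrative only.
class ExamplePage
  def some_defined_method
    "defined"
  end
end

page = ExamplePage.new

# Buggy form: the method is *called* in order to test for it, so an
# undefined method raises NoMethodError instead of evaluating to false.
buggy_result =
  begin
    page.possibly_undefined_method ? :present : :absent
  rescue NoMethodError => e
    e.class
  end

# Fixed form: respond_to? only asks whether the method exists.
fixed_result = page.respond_to?("possibly_undefined_method") ? :present : :absent

puts buggy_result   # NoMethodError
puts fixed_result   # absent
```

The rescue is only there so the sketch runs to completion; in the real script the NoMethodError is what pry-rescue intercepts.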
I went ahead and copied the bad script out so I could continue to test this seg fault issue. I added a binding.pry on the line before the bad line, and tried to execute the line from pry. Instead of seg faulting I got the expected error:
NoMethodError: undefined method `cr_subnav' for #<ReverbNation::Pages::ControlRoomPage:0x00000005ff97b8>
from /home/bgoad/qa/selenium/rb-restructure/tests/helpers/controlroom_helper2.rb:48:in `block (3 levels) in iterate_over_all_menu_sections'
I went ahead and let pry continue to execute the script on its own, and it errored correctly again (without seg faulting).
Next, I commented out the binding.pry and ran the script again. Once it got to the bad line, it seg faulted.
So then I tried something bizarre. I just replaced binding.pry with a simple puts command on the line immediately before the bad line. After running the script again, it did not seg fault. Instead it popped me into pry with the expected NoMethodError.
Next, I'll try your suggestion @rking and use a different Ruby version and let you know what I find out.
Pry-rescue calls Binding.callers (from the binding_of_caller gem) whenever an exception is raised. Binding.callers is really buggy on -p385 (https://github.com/banister/binding_of_caller/issues/14); -p374 was ok, so if downgrading is an option temporarily that would be good.
I'll continue seeing if I can debug the binding_of_caller bug, but it's non-trivial.
Still seg faults and core dumps using ruby-1.9.3-p374
Ok, good to know :/
Does not seg fault for ruby-1.9.3-p125
Blerg. The other way to stop segfaults is GC.disable, but obviously that's not scalable... @banister do you have any ideas?
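As a stopgap only, the GC.disable workaround could be limited to the code that is known to crash. A minimal sketch, assuming the crash is confined to one section of the script; `without_gc` is a hypothetical helper name, not part of pry-rescue or binding_of_caller:

```ruby
# Runs the given block with the garbage collector disabled, then restores
# the previous GC state. `without_gc` is a hypothetical helper name.
def without_gc
  already_disabled = GC.disable # returns true if GC was already disabled
  yield
ensure
  # Only re-enable GC if this helper was the one that turned it off.
  GC.enable unless already_disabled
end

result = without_gc { 1 + 1 }
puts result
```

Memory allocated inside the block is not reclaimed until GC is re-enabled, which is why this does not scale to long-running scripts.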
I previously "fixed" a segfault by allowing a small memory leak: https://github.com/banister/binding_of_caller/blob/master/ext/binding_of_caller/binding_of_caller.c#L59
You might get away with something similar, not sure.
When I last was looking the segfault was actually happening inside rb_vm_make_env_object. I presume it was trying to reference something that had previously been deallocated, but no idea what.
Ok, I've continued to get Ruby segfaults when running my scripts. The pattern is the same each time: my code tries to reference an object that does not exist (my bad coding!), pry-rescue tries to open a REPL, and it segfaults, apparently because it can't attach to local variables. Also, I have largely seen this happening when running my scripts with the headless browser gem. Each time, it's the same type of seg fault backtrace.
Unfortunately, I have not yet been able to reproduce a simple test case without my scripts' overhead to share with you all. I will keep trying and hopefully be able to come up with something.
Good that you're stalking this down.
If you can capture this bug, that will be a good community contribution.
I get this segfault pretty consistently with a Selenium script I have written. It looks like pry pukes when it tries to render a local variable (judging from the code at the first lines it complains about).