Closed eregon closed 6 months ago
So https://github.com/Shopify/liquid-c/actions/runs/8235861511/job/22520963167?pr=217 failed after:
...
TokenizerTest#test_tokenize_for_liquid_tag = 0.00 s = .
TokenizerTest#test_non_utf8_compatible_source = 0.00 s = .
Error: The operation was canceled.
Unfortunately we don't see the started & hanging test due to I guess some flushing issue of GitHub Actions on cancellation (TruffleRuby does not buffer writes so that cannot be the issue).
Running locally on truffleruby-dev, the test that hangs seems to be various tests of VariableTest
like:
VariableTest#test_write_fixnum =
VariableTest#test_filter_with_variable_arg_after_const_arg =
VariableTest#test_filter_with_const_arg =
I have tried TEST_OPTS="-v -s0" bundle exec ruby -Ilib -Itest test/unit/variable_test.rb
and that reproduces, to get a stacktrace.
But unfortunately that shows a bunch of threads because apparently minitest/parallel.rb
gets used. Found MT_CPU=0
/MT_CPU=1
to disable that.
Got a more useful stacktrace now, will take a look later.
Attaching with gdb --pid $PID
I see:
86 } while (new_size > capacity);
(gdb) bt
#0 write_fixnum (output=841328705388478728, fixnum=5) at ../../../../ext/liquid_c/liquid_vm.c:86
#1 0x00007f061cc2396e in write_obj (output=841328705388478728, obj=5) at ../../../../ext/liquid_c/liquid_vm.c:115
#2 0x00007f061cc239e5 in write_obj (output=841328705388478728, obj=841328705388478808) at ../../../../ext/liquid_c/liquid_vm.c:127
#3 0x00007f061cc2477a in vm_render_until_error (uncast_args=140734667024768) at ../../../../ext/liquid_c/liquid_vm.c:426
#4 0x00007f061f018f52 in rb_tr_setjmp_wrapper_pointer1_to_pointer (func=0x7f061cc23ed3 <vm_render_until_error>, arg1=140734667024768) at trampoline.c:225
#5 0x00007f062f777056 in ffi_call_unix64 () at ../libffi-3.4.4/src/x86/unix64.S:104
#6 0x00007f062f7758ec in ffi_call_int (cif=cif@entry=0x55e5558ef160, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>, closure=closure@entry=0x0) at ../libffi-3.4.4/src/x86/ffi64.c:673
#7 0x00007f062f776463 in ffi_call (cif=0x55e5558ef160, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>) at ../libffi-3.4.4/src/x86/ffi64.c:710
#8 0x00007f0629865235 in ?? () from /home/eregon/.rubies/truffleruby-dev/bin/../lib/librubyvm.so
#9 0x00000000558e1f00 in ?? ()
#10 0x000055e5558ef160 in ?? ()
#11 0x00007f061f018e96 in rb_tr_setjmp_wrapper_pointer2_int_pointer2_to_pointer (func=0x7b431aaf8, yielded_arg=139664266727062, callback_arg=139662636253920, argc=0, argv=0x7f05b431ab40, blockarg=139662343549288) at trampoline.c:215
#12 0x00007f05bdd28360 in ?? ()
#13 0x000055e55591d4c0 in ?? ()
#14 0x00007f0638590960 in ?? () from /lib64/libc.so.6
#15 0x000055e5558e1fd8 in ?? ()
#16 0x00007f0629865e3b in ?? () from /home/eregon/.rubies/truffleruby-dev/bin/../lib/librubyvm.so
So it sounds like an unbounded loop while in the C extension.
(gdb) p new_size
$1 = 1
(gdb) p capacity
$2 = 0
My guess is https://github.com/oracle/truffleruby/commit/b7cb26e092e61f52c17e137ccaf64136c7ce7e5c is what broke this.
OTOH it should never have worked in the first place.
The question is whether rb_str_capacity(str)
can return 0 or not.
If the String in question is e.g. ""
allocated in Ruby then its capacity is truly 0 from truffleruby's POV, one will need to reallocate to append/write anything to it.
The question is whether rb_str_capacity(str) can return 0 or not.
I agree, this is a bug in liquid-c. This should be pretty simple to fix, can you send a PR?
Confirmed that
describe "rb_str_capacity" do
it "returns 0 for empty String" do
@s.rb_str_capacity("").should == 0
end
it "returns 1 for 'a'" do
@s.rb_str_capacity("a").should == 1
end
end
passes on truffleruby but "fails" on CRuby which returns 15
for both cases (due to embedded String).
@peterzhu2118 I added a commit to this PR, could you review it? I think it's good to include the timeout and verbose tests anyway.
Thank you for debugging and the fix!