Shopify / liquid-c

Liquid performance extension in C.
MIT License
119 stars 24 forks source link

Add a timeout in CI and enable verbose tests for truffleruby #217

Closed eregon closed 6 months ago

eregon commented 6 months ago
eregon commented 6 months ago

So https://github.com/Shopify/liquid-c/actions/runs/8235861511/job/22520963167?pr=217 failed after:

...
TokenizerTest#test_tokenize_for_liquid_tag = 0.00 s = .
TokenizerTest#test_non_utf8_compatible_source = 0.00 s = .
Error: The operation was canceled.

Unfortunately we don't see the started & hanging test due to I guess some flushing issue of GitHub Actions on cancellation (TruffleRuby does not buffer writes so that cannot be the issue).

eregon commented 6 months ago

Running locally on truffleruby-dev, the test that hangs seems to be various tests of VariableTest like:

VariableTest#test_write_fixnum =
VariableTest#test_filter_with_variable_arg_after_const_arg =
VariableTest#test_filter_with_const_arg =
eregon commented 6 months ago

I have tried TEST_OPTS="-v -s0" bundle exec ruby -Ilib -Itest test/unit/variable_test.rb and that reproduces, to get a stacktrace. But unfortunately that shows a bunch of threads because apparently minitest/parallel.rb gets used. Found MT_CPU=0/MT_CPU=1 to disable that. Got a more useful stacktrace now, will take a look later.

eregon commented 6 months ago

Attaching with gdb --pid $PID I see:

86          } while (new_size > capacity);
(gdb) bt
#0  write_fixnum (output=841328705388478728, fixnum=5) at ../../../../ext/liquid_c/liquid_vm.c:86
#1  0x00007f061cc2396e in write_obj (output=841328705388478728, obj=5) at ../../../../ext/liquid_c/liquid_vm.c:115
#2  0x00007f061cc239e5 in write_obj (output=841328705388478728, obj=841328705388478808) at ../../../../ext/liquid_c/liquid_vm.c:127
#3  0x00007f061cc2477a in vm_render_until_error (uncast_args=140734667024768) at ../../../../ext/liquid_c/liquid_vm.c:426
#4  0x00007f061f018f52 in rb_tr_setjmp_wrapper_pointer1_to_pointer (func=0x7f061cc23ed3 <vm_render_until_error>, arg1=140734667024768) at trampoline.c:225
#5  0x00007f062f777056 in ffi_call_unix64 () at ../libffi-3.4.4/src/x86/unix64.S:104
#6  0x00007f062f7758ec in ffi_call_int (cif=cif@entry=0x55e5558ef160, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>, closure=closure@entry=0x0) at ../libffi-3.4.4/src/x86/ffi64.c:673
#7  0x00007f062f776463 in ffi_call (cif=0x55e5558ef160, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>) at ../libffi-3.4.4/src/x86/ffi64.c:710
#8  0x00007f0629865235 in ?? () from /home/eregon/.rubies/truffleruby-dev/bin/../lib/librubyvm.so
#9  0x00000000558e1f00 in ?? ()
#10 0x000055e5558ef160 in ?? ()
#11 0x00007f061f018e96 in rb_tr_setjmp_wrapper_pointer2_int_pointer2_to_pointer (func=0x7b431aaf8, yielded_arg=139664266727062, callback_arg=139662636253920, argc=0, argv=0x7f05b431ab40, blockarg=139662343549288) at trampoline.c:215
#12 0x00007f05bdd28360 in ?? ()
#13 0x000055e55591d4c0 in ?? ()
#14 0x00007f0638590960 in ?? () from /lib64/libc.so.6
#15 0x000055e5558e1fd8 in ?? ()
#16 0x00007f0629865e3b in ?? () from /home/eregon/.rubies/truffleruby-dev/bin/../lib/librubyvm.so

So it sounds like an unbounded loop while in the C extension.

(gdb) p new_size 
$1 = 1
(gdb) p capacity
$2 = 0
eregon commented 6 months ago

My guess is https://github.com/oracle/truffleruby/commit/b7cb26e092e61f52c17e137ccaf64136c7ce7e5c is what broke this. OTOH it should never have worked in the first place. The question is whether rb_str_capacity(str) can return 0 or not. If the String in question is e.g. "" allocated in Ruby then its capacity is truly 0 from truffleruby's POV, one will need to reallocate to append/write anything to it.

peterzhu2118 commented 6 months ago

The question is whether rb_str_capacity(str) can return 0 or not.

I agree, this is a bug in liquid-c. This should be pretty simple to fix, can you send a PR?

eregon commented 6 months ago

Confirmed that

  describe "rb_str_capacity" do
    it "returns 0 for empty String" do
      @s.rb_str_capacity("").should == 0
    end

    it "returns 1 for 'a'" do
      @s.rb_str_capacity("a").should == 1
    end
  end

passes on truffleruby but "fails" on CRuby which returns 15 for both cases (due to embedded String).

eregon commented 6 months ago

@peterzhu2118 I added a commit to this PR, could you review it? I think it's good to include the timeout and verbose tests anyway.

peterzhu2118 commented 6 months ago

Thank you for debugging and the fix!