Shopify / yjit-bench

Set of benchmarks for the YJIT CRuby JIT compiler and other Ruby implementations.
MIT License

Add pure-Ruby string XOR microbenchmark #279

Status: Closed (maximecb closed 9 months ago)

maximecb commented 9 months ago

Current performance on my MacBook M1:

interp: ruby 3.4.0dev (2024-01-24T17:40:30Z master 578ff32611) [arm64-darwin23]
yjit: ruby 3.4.0dev (2024-01-24T17:40:30Z master 578ff32611) +YJIT [arm64-darwin23]

--------  -----------  ----------  ---------  ----------  ------------  -----------
bench     interp (ms)  stddev (%)  yjit (ms)  stddev (%)  yjit 1st itr  interp/yjit
ruby-xor  482.7        0.6         118.7      0.9         4.02          4.07       
--------  -----------  ----------  ---------  ----------  ------------  -----------
Legend:
- yjit 1st itr: ratio of interp/yjit time for the first benchmarking iteration.
- interp/yjit: ratio of interp/yjit time. Higher is better for yjit. Above 1 represents a speedup.
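As the legend says, the interp/yjit column is the interpreter time divided by the YJIT time. A minimal sketch that reproduces the figure from the table, assuming the column is rounded to two decimals (the `speedup` helper is a hypothetical name, not part of the benchmark harness):

```ruby
# Hypothetical helper: ratio of interpreter time to YJIT time,
# rounded to two decimals like the table column appears to be.
def speedup(interp_ms, yjit_ms)
  (interp_ms / yjit_ms).round(2)
end

puts speedup(482.7, 118.7) # prints 4.07, matching the interp/yjit column
```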

It would be nice if we could get this to run 10x faster than the interpreter. I plan to add more specialized C function codegen. The improved register allocator will probably help a lot too.

Relevant stats:

num_send_cfunc:           26,692,908 (99.6%)
num_send_cfunc_inline:    13,595,926 (50.9%)
...
ratio_in_yjit:                 99.8%
avg_len_in_yjit:              1915.5
Top-1 most frequent exit ops (100.0% of exits):
    branchif:          1 (100.0%)
Top-9 most frequent C calls (49.1% of C calls):
             Integer#^:  6,498,050 (24.3%)
        String#setbyte:  6,498,050 (24.3%)
            String#dup:     99,971 ( 0.4%)
           Symbol#to_s:        445 ( 0.0%)
    Symbol#start_with?:        410 ( 0.0%)
          Symbol#match:         33 ( 0.0%)
    Class#alias_method:         21 ( 0.0%)
      Module#const_set:          1 ( 0.0%)
      Module#const_get:          1 ( 0.0%)
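A quick arithmetic check on the counts above, as a sketch: the nine listed C calls add up to exactly the cfunc sends that were not inlined, which suggests (this is an interpretation of the stats output, not something it states) that the "most frequent C calls" list only counts non-inlined calls.

```ruby
# Counts copied from the stats output above.
total_cfunc = 26_692_908
inlined     = 13_595_926
top_c_calls = [6_498_050, 6_498_050, 99_971, 445, 410, 33, 21, 1, 1]

not_inlined = total_cfunc - inlined
matches     = (not_inlined == top_c_calls.sum)
inlined_pct = (100.0 * inlined / total_cfunc).round(1)

# Share of the non-inlined calls that are Integer#^ or String#setbyte.
xor_setbyte_pct = (100.0 * top_c_calls.first(2).sum / not_inlined).round(1)

puts matches         # prints true
puts inlined_pct     # prints 50.9
puts xor_setbyte_pct # prints 99.2
```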

So we're basically just missing inline codegen for Integer#^ (xor) and String#setbyte to inline close to 100% of the C calls in this benchmark. I plan to look at these in the coming days :)
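The benchmark source is not shown in this thread. As a rough sketch of what a pure-Ruby string XOR loop could look like, consistent with the `String#dup`, `Integer#^`, and `String#setbyte` calls in the stats (the method name `xor_bytes` and the exact loop shape are assumptions, not the actual benchmark code):

```ruby
# Hypothetical sketch, not the actual benchmarks/ruby-xor.rb source.
# Returns a copy of `a` whose first min(a.bytesize, b.bytesize) bytes are
# XORed with the corresponding bytes of `b`; any remaining bytes of `a`
# are copied unchanged.
def xor_bytes(a, b)
  out = a.dup
  len = [a.bytesize, b.bytesize].min
  i = 0
  while i < len
    # One String#getbyte per operand, one Integer#^, one String#setbyte.
    out.setbyte(i, a.getbyte(i) ^ b.getbyte(i))
    i += 1
  end
  out
end

key    = "\x01\x02\x03\x04\x05"
masked = xor_bytes("hello", key)
puts xor_bytes(masked, key) # prints "hello", since XOR is its own inverse
```

Each loop iteration here makes one `Integer#^` call and one `String#setbyte` call, which would match the equal counts for those two methods in the stats. `String#getbyte` does not appear in the list, presumably because it is already inlined.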

maximecb commented 9 months ago

@jhawthorn 🤠

eregon commented 9 months ago

FWIW, I was curious about the performance of this benchmark on TruffleRuby. Here is what I got locally:

$ ruby -Iharness-warmup benchmarks/ruby-xor.rb
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
iter # 12: 594ms, mad=0.0010/0.0012, median=594ms

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [x86_64-linux]
iter # 27: 188ms, mad=0.0004/0.0027, median=188ms

truffleruby 23.1.1, like ruby 3.2.2, Oracle GraalVM JVM [x86_64-linux]
iter # 90: 52ms, mad=0.0024/0.0090, median=52ms
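To put these medians on the same footing as the interp/yjit ratio earlier in the thread, here is a small sketch that divides the interpreter median by each implementation's median. These runs are on x86_64-linux with Ruby 3.3.0, so the ratios are not directly comparable to the M1 numbers above.

```ruby
# Median iteration times (ms) copied from the output above.
medians = { "interp" => 594, "yjit" => 188, "truffleruby" => 52 }

base = medians["interp"]
medians.each do |name, ms|
  puts "#{name}: #{(base.to_f / ms).round(2)}x"
end
# prints:
# interp: 1.0x
# yjit: 3.16x
# truffleruby: 11.42x
```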