Shopify / yjit-bench

Set of benchmarks for the YJIT CRuby JIT compiler and other Ruby implementations.
MIT License

Consider Pure Ruby zlib for benchmarks? #273

Closed tenderlove closed 6 months ago

tenderlove commented 10 months ago

Here's a link to the project (pr-zlib). I think this might make a good target for binary file manipulation benchmarks.

I'll work on writing a benchmark and send a PR.

maximecb commented 10 months ago

This does look promising. I'm curious to see how we perform and take a look at the stats!

eregon commented 10 months ago

IIRC we tried pr-zlib in TruffleRuby a long time ago, and it was a lot slower than the C extension.

https://github.com/djberg96/pr-zlib/blob/main/lib/pr/rbzlib.rb looks like a fairly direct translation from C to Ruby, so I guess it's obvious but this is not typical Ruby code. It's probably not particularly optimized either. I think that code is not really representative of Ruby code in general and optimizing it probably has little effect on production/real-world Ruby code (because it's unlikely to be used instead of the C extension). It's just my opinion, please do as you wish.

It might be quite interesting to run it, compare and get some stats though.

maximecb commented 10 months ago

Fair enough. The fact that the gem has so few downloads also makes it hard to justify specifically optimizing this code.

@eregon if you can think of other Pure-Ruby gems that would make nice benchmarks, we're open to suggestions.

I've also been meaning to ask you: in terms of binary file I/O, reading/writing different integer types, what is your preferred method to do that in Ruby, and is this something you've optimized in TruffleRuby?

eregon commented 10 months ago

> I've also been meaning to ask you: in terms of binary file I/O, reading/writing different integer types, what is your preferred method to do that in Ruby, and is this something you've optimized in TruffleRuby?

I think Array#pack & String#unpack are the typical way to deal with binary data in Ruby. These are optimized as their own mini-language in TruffleRuby: specifically, we create small ASTs for the format strings and partially evaluate them. Chris talked about this in this video. BTW TruffleRuby does the same for Kernel#sprintf, which is in that regard very similar to Array#pack (but producing a textual instead of a binary representation).
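For readers less familiar with this style, here is a minimal sketch of reading and writing different integer types with Array#pack and String#unpack. The record layout and the helper names (`record_format`, `encode_record`, `decode_record`, `read_length_field`) are made up for illustration and do not come from any of the gems discussed here.

```ruby
# Hypothetical record layout, for illustration only:
#   C  = unsigned 8-bit version
#   S< = unsigned 16-bit flags, little-endian
#   L> = unsigned 32-bit length, big-endian
#   q< = signed 64-bit timestamp, little-endian
# Spaces inside a pack/unpack template are ignored.
def record_format
  "C S< L> q<"
end

# Serialize four integers into a 15-byte binary string.
def encode_record(version, flags, length, timestamp)
  [version, flags, length, timestamp].pack(record_format)
end

# Parse the same layout back into an array of integers.
def decode_record(bytes)
  bytes.unpack(record_format)
end

# Read a single field without decoding the whole record:
# the length field starts at byte offset 3 and is 4 bytes wide.
def read_length_field(bytes)
  bytes.byteslice(3, 4).unpack1("L>")
end

record = encode_record(1, 0x0102, 0xDEADBEEF, -42)
p record.bytesize          # 15
p decode_record(record)    # [1, 258, 3735928559, -42]
p read_length_field(record) # 3735928559
```

The template string is the "mini-language" mentioned above: each directive names a width, signedness and byte order, which is what gives an implementation room to compile a template into specialized code.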

Looking at https://github.com/oracle/truffleruby/tree/master/bench:

eregon commented 10 months ago

hexapdf might also do quite a bit of binary data handling. https://github.com/Shopify/yjit-bench/blob/main/benchmarks/hexapdf/benchmark.rb seems to write a PDF but not read one, it might be interesting to read and/or transform a PDF too for more binary data handling.

eregon commented 10 months ago

One more thought: I think it'd probably make sense to add most/all of the classic benchmarks at https://github.com/oracle/truffleruby/tree/master/bench/classic. Many of them are already in yjit-bench. They are not really representative of typical Ruby code, but I think they stress pretty fundamental things (e.g. polymorphic calls, recursion), so optimizing them is likely to affect real workloads too.

aobench is a small raytracer and it even renders it to a .ppm file, using sprintf("%c", byte) which is rather original but that's what it does (the original benchmark does printf and just outputs the image on stdout). The others are fairly well-known "classic" benchmarks like richards, deltablue and the shootout benchmarks.
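To make the binary-output part concrete, here is a rough sketch of building a binary PPM (P6) image in memory. It is not the aobench code; `ppm_bytes` is a hypothetical helper, and it uses Array#pack for the pixel data rather than sprintf("%c", byte). As far as I know, for values of 128 and above the bytes produced by sprintf("%c", ...) depend on the encoding of the format string, so pack("C*") is the less surprising choice for a sketch.

```ruby
# Build a binary PPM (P6) image: an ASCII header followed by raw RGB bytes.
# `pixels` is a flat array of integers in 0..255, three per pixel.
# Hypothetical helper for illustration; not taken from aobench.
def ppm_bytes(width, height, pixels)
  expected = width * height * 3
  unless pixels.size == expected
    raise ArgumentError, "expected #{expected} samples, got #{pixels.size}"
  end

  # String#b returns a copy tagged as binary (ASCII-8BIT),
  # so appending raw bytes does not trigger an encoding conflict.
  header = "P6\n#{width} #{height}\n255\n".b
  header << pixels.pack("C*")
end

# A 2x1 image: one red pixel, one blue pixel.
image = ppm_bytes(2, 1, [255, 0, 0, 0, 0, 255])
p image.bytesize # 17 (11 header bytes + 6 sample bytes)

# For ASCII-range values both approaches yield the same single byte.
p sprintf("%c", 65).bytes == [65].pack("C").bytes # true
```

Writing the result out would be a matter of `File.binwrite(path, image)` with a path of your choosing.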

There are also the AWFY benchmarks (https://github.com/smarr/are-we-fast-yet), and notably this branch, which uses Ruby Array & Hash instead of custom data structures (and so is closer to typical Ruby code). The paper has a pretty in-depth analysis of what each benchmark does (notably Figure 3).