Consider Pure Ruby zlib for benchmarks?

tenderlove commented 10 months ago

Here's a link to the project. I think this might make a good target for binary file manipulation benchmarks.

I'll work on writing a benchmark and send a PR.

maximecb commented 10 months ago

This does look promising. I'm curious to see how we perform and take a look at the stats!

eregon commented 10 months ago

IIRC we tried pr-zlib in TruffleRuby a long time ago, and it was a lot slower than the C extension.

https://github.com/djberg96/pr-zlib/blob/main/lib/pr/rbzlib.rb looks like a fairly direct translation from C to Ruby, so it may be obvious, but this is not typical Ruby code. It's probably not particularly optimized either. I don't think that code is representative of Ruby code in general, and optimizing it would probably have little effect on production/real-world Ruby code (because it's unlikely to be used instead of the C extension). It's just my opinion, please do as you wish.

It might be quite interesting to run it, compare and get some stats though.

maximecb commented 10 months ago

Fair enough. The fact that the gem has so few downloads also makes it hard to justify specifically optimizing this code.

@eregon if you can think of other Pure-Ruby gems that would make nice benchmarks, we're open to suggestions.

I've also been meaning to ask you: in terms of binary file I/O, reading/writing different integer types, what is your preferred method to do that in Ruby, and is this something you've optimized in TruffleRuby?

eregon commented 10 months ago

> I've also been meaning to ask you: in terms of binary file I/O, reading/writing different integer types, what is your preferred method to do that in Ruby, and is this something you've optimized in TruffleRuby?

I think Array#pack & String#unpack are the typical way to deal with binary data in Ruby. These are optimized as their own mini-language in TruffleRuby: specifically, we create small ASTs for them and partially evaluate them. Chris talked about this in this video. BTW TruffleRuby does the same for Kernel#sprintf, which is in that regard very similar to Array#pack (but produces a textual instead of a binary representation).
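For readers less familiar with the pack/unpack mini-language, here is a minimal sketch of reading and writing different integer types with it. The directive letters are standard Ruby; the field names and values are made up for illustration and do not come from any of the benchmarks discussed here.

```ruby
# Sketch only: a hypothetical 11-byte header, round-tripped through
# Array#pack and String#unpack.
#
#   N = 32-bit unsigned, big-endian     n = 16-bit unsigned, big-endian
#   V = 32-bit unsigned, little-endian  C = 8-bit unsigned
header = [0xCAFEBABE, 513, 7, 255].pack("NnVC")

# The same format string decodes the bytes back into integers.
magic, version, count, flag = header.unpack("NnVC")

# A count (or *) after a directive repeats it, and @ jumps to an absolute
# byte offset before decoding continues.
pixels = [1, 2, 3].pack("N*")
second_and_third = pixels.unpack("@4N2")
```

The format string is effectively a small program, which is why an implementation can treat it as its own language and specialize it per call site.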

Looking at https://github.com/oracle/truffleruby/tree/master/bench:

- chunky_png uses pack & unpack quite a bit. There is already https://github.com/Shopify/yjit-bench/blob/main/benchmarks/chunky-png/benchmark.rb but I'm not sure whether that uses pack much; it would be interesting to check. This example using chunky_png uses setbyte heavily, according to this profile. It'd be good to see if these setbyte calls are also part of the yjit-bench chunky_png benchmark.
- image-demo is a benchmark ported from a PyPy benchmark; it uses some .pack('C*'). It's quite interesting how much a JIT can optimize it, but it's probably not very representative of real Ruby code.
- optcarrot uses a few pack & unpack calls as well, but only when using actual audio or video. The SDL2 video driver uses FFI::Pointer#write_array_of_uint32, which also seems related to binary data (but that's probably implemented using a native extension for CRuby).
- psd.rb does some pack & unpack. I'm not sure how perf-sensitive it is, but I'd think psd.rb is very related to binary data handling and would be a good benchmark addition.
- rack seems to do some unpack("m*") and unpack("C*").
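To make the setbyte versus pack distinction above concrete, here is a small sketch of the two styles of filling a byte buffer. It is an illustration with made-up values, not code taken from chunky_png or image-demo.

```ruby
# Sketch only: two ways to turn an array of byte values into a binary string.
width  = 4
values = Array.new(width) { |i| i * 10 }   # [0, 10, 20, 30]

# (a) One Array#pack call: "C*" writes every element as an unsigned byte.
packed = values.pack("C*")

# (b) Preallocate a binary (ASCII-8BIT) string and write each byte in place
#     with String#setbyte, one method call per byte.
buffer = ("\0" * width).b
values.each_with_index { |value, index| buffer.setbyte(index, value) }

same_bytes = (packed == buffer)
```

Style (b) puts many small method calls in the hot loop, so a benchmark dominated by it stresses different parts of a JIT than one dominated by a few large pack calls.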

From https://github.com/Shopify/yjit-bench/pull/269#issuecomment-1898345106, there is a raytracer benchmark at https://github.com/edin/raytracer which uses String#[]= a lot.

$ git grep -P '\b(un)?pack\b' bench/
bench/chunky_png/chunky-decode-png-image-pass.rb:56:pixel = [PIXEL].pack("N")
bench/chunky_png/chunky-decode-png-image-pass.rb:57:scan_line = [ChunkyPNG::FILTER_NONE].pack("c") + (pixel * WIDTH)
bench/chunky_png/chunky_png/lib/chunky_png/canvas/data_url_exporting.rb:11:        ['data:image/png;base64,', to_blob].pack('A*m').gsub(/\n/, '')
bench/chunky_png/chunky_png/lib/chunky_png/canvas/data_url_importing.rb:14:          from_blob($1.unpack('m').first)
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_decoding.rb:255:        stream.unpack("@#{pos + 1}N#{width}")
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_decoding.rb:263:        stream.unpack("@#{pos + 1}n#{width * 4}").each_slice(4) do |r, g, b, a|
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_decoding.rb:274:        stream.unpack("@#{pos + 1}" << ('NX' * width)).map { |c| c | 0x000000ff }
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_decoding.rb:282:        stream.unpack("@#{pos + 1}n#{width * 3}").each_slice(3) do |r, g, b|
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_decoding.rb:300:        stream.unpack("@#{pos + 1}n#{width * 2}").each_slice(2) do |g, a|
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_decoding.rb:347:        values = stream.unpack("@#{pos + 1}n#{width}")
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_encoding.rb:237:        pixels.pack('x' + ('NX' * width))
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_encoding.rb:244:        pixels.pack("xN#{width}")
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_encoding.rb:262:        chars.pack('xC*')
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_encoding.rb:276:        chars.pack('xC*')
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_encoding.rb:287:        chars.pack('xC*')
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_encoding.rb:294:        pixels.map { |p| encoding_palette.index(p) }.pack("xC#{width}")
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_encoding.rb:312:        chars.pack('xC*')
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_encoding.rb:326:        chars.pack('xC*')
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_encoding.rb:337:        chars.pack('xC*')
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_encoding.rb:344:        pixels.map { |p| p >> 8 }.pack("xC#{width}")
bench/chunky_png/chunky_png/lib/chunky_png/canvas/png_encoding.rb:351:        pixels.pack("xn#{width}")
bench/chunky_png/chunky_png/lib/chunky_png/canvas/stream_exporting.rb:15:        pixels.pack('N*')
bench/chunky_png/chunky_png/lib/chunky_png/canvas/stream_exporting.rb:26:        pixels.pack('NX' * pixels.length)
bench/chunky_png/chunky_png/lib/chunky_png/canvas/stream_exporting.rb:33:        pixels.pack('C*')
bench/chunky_png/chunky_png/lib/chunky_png/canvas/stream_exporting.rb:43:        pixels.pack('nX' * pixels.length)
bench/chunky_png/chunky_png/lib/chunky_png/canvas/stream_exporting.rb:54:        pixels.pack('V*')
bench/chunky_png/chunky_png/lib/chunky_png/canvas/stream_importing.rb:22:        pixels = string.unpack(unpacker).map { |color| color | 0x000000ff }
bench/chunky_png/chunky_png/lib/chunky_png/canvas/stream_importing.rb:39:        self.new(width, height, string.unpack("N*"))
bench/chunky_png/chunky_png/lib/chunky_png/canvas/stream_importing.rb:56:        pixels = string.unpack("@1" << ('XV' * (width * height))).map { |color| color | 0x000000ff }
bench/chunky_png/chunky_png/lib/chunky_png/canvas/stream_importing.rb:73:        self.new(width, height, string.unpack("V*"))
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:22:      length, type = io.read(8).unpack('Na4')
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:24:      crc          = io.read(4).unpack('N').first
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:69:        io << [content.length].pack('N') << type << content
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:70:        io << [Zlib.crc32(content, Zlib.crc32(type))].pack('N')
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:134:        fields = content.unpack('NNC5')
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:143:        [width, height, depth, color, compression, filtering, interlace].pack('NNC5')
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:198:        content.unpack('C*')
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:207:        values = content.unpack('nnn').map { |c| ChunkyPNG::Canvas.send(:"decode_png_resample_#{bit_depth}bit_value", c) }
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:217:        value = ChunkyPNG::Canvas.send(:"decode_png_resample_#{bit_depth}bit_value", content.unpack('n')[0])
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:258:        keyword, value = content.unpack('Z*a*')
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:267:        [keyword, value].pack('Z*a*')
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:287:        keyword, compression, value = content.unpack('Z*Ca*')
bench/chunky_png/chunky_png/lib/chunky_png/chunk.rb:297:        [keyword, ChunkyPNG::COMPRESSION_DEFAULT, Zlib::Deflate.deflate(value)].pack('Z*Ca*')
bench/chunky_png/chunky_png/lib/chunky_png/color.rb:141:      rgb(*stream.unpack("@#{pos}C3"))
bench/chunky_png/chunky_png/lib/chunky_png/color.rb:151:      rgba(*stream.unpack("@#{pos}C4"))
bench/chunky_png/chunky_png/lib/chunky_png/datastream.rb:14:    SIGNATURE = ChunkyPNG.force_binary([137, 80, 78, 71, 13, 10, 26, 10].pack('C8'))
bench/chunky_png/chunky_png/lib/chunky_png/palette.rb:42:      palatte_bytes = palette_chunk.content.unpack('C*')
bench/chunky_png/chunky_png/lib/chunky_png/palette.rb:44:        alpha_channel = transparency_chunk.content.unpack('C*')
bench/chunky_png/chunky_png/lib/chunky_png/palette.rb:151:      ChunkyPNG::Chunk::Transparency.new('tRNS', map { |c| ChunkyPNG::Color.a(c) }.pack('C*'))
bench/chunky_png/chunky_png/lib/chunky_png/palette.rb:172:      ChunkyPNG::Chunk::Palette.new('PLTE', colors.pack('C*'))
bench/chunky_png/chunky_png/lib/chunky_png/rmagick.rb:39:      image.import_pixels(0,0, canvas.width, canvas.height, 'RGBA', canvas.pixels.pack('N*'))
bench/image-demo/lib/io.rb:26:      #@color_data = Array.new(img.width * img.height / 2, 127).pack("C*")
bench/image-demo/lib/noborder.rb:66:    f.write @data.pack('C*')
bench/image-demo/lib/noborder.rb:97:    f.write @data[(@width+1)...(-@width-1)].pack('C*')
bench/micro/array/pack.rb:11:benchmark 'core-array-pack-little-C' do
bench/micro/array/pack.rb:12:  little_array_of_bytes.pack('C*')
bench/micro/array/pack.rb:17:benchmark 'core-array-pack-big-C' do
bench/micro/array/pack.rb:18:  big_array_of_bytes.pack('C*')
bench/micro/string/equal.rb:13:  Array.new(n, 'a'.ord).pack('C*')
bench/micro/string/index.rb:13:  Array.new(n, 'a'.ord).pack('C*')
bench/optcarrot/lib/optcarrot/driver/ao_audio.rb:58:      buff = output.pack(@pack_format)
bench/optcarrot/lib/optcarrot/driver/gif_video.rb:15:      @f << header.pack("A*vvC*")
bench/optcarrot/lib/optcarrot/driver/gif_video.rb:18:      @f << [0x21, 0xff, 0x0b, "NETSCAPE", "2.0", 0x03, 0x01, 0x00, 0x00].pack("C3A8A3CCvC")
bench/optcarrot/lib/optcarrot/driver/gif_video.rb:21:      @header = [0x21, 0xf9, 0x04, 0x00, 1, 255, 0x00].pack("C4vCC")
bench/optcarrot/lib/optcarrot/driver/gif_video.rb:22:      @header << [0x2c, 0, 0, WIDTH, HEIGHT, 0, 8].pack("Cv4C*")
bench/optcarrot/lib/optcarrot/driver/gif_video.rb:27:      @f << [0x3b].pack("C")
bench/optcarrot/lib/optcarrot/driver/gif_video.rb:64:      buff = [buff].pack("b*")
bench/optcarrot/lib/optcarrot/driver/gif_video.rb:66:      buff = buff.gsub(/.{1,255}/m) { [$&.size].pack("C") + $& } + [0].pack("C")
bench/optcarrot/lib/optcarrot/driver/misc.rb:108:      return width, height, pixels.write_bytes(dat.pack("V*"))
bench/optcarrot/lib/optcarrot/driver/png_video.rb:39:          chunk("IHDR", [@width, @height, 8, 2, 0, 0, 0].pack("NNCCCCC")),
bench/optcarrot/lib/optcarrot/driver/png_video.rb:46:        [data.bytesize, type, data, crc32(type + data)].pack("NA4A*N")
bench/optcarrot/lib/optcarrot/driver/png_video.rb:54:        code = [0x78, 0x9c].pack("C2") # Zlib header (RFC 1950)
bench/optcarrot/lib/optcarrot/driver/png_video.rb:58:          code << [data.empty? ? 1 : 0, s.size, ~s.size, *s].pack("CvvC*")
bench/optcarrot/lib/optcarrot/driver/png_video.rb:60:        code << [b % ADLER_MOD, a % ADLER_MOD].pack("nn") # Adler-32 (RFC 1950)
bench/optcarrot/lib/optcarrot/driver/sdl2.rb:42:      case pixels.read_bytes(4).unpack("C*")
bench/optcarrot/lib/optcarrot/driver/sdl2.rb:122:      case pixels.read_bytes(2).unpack("C*")
bench/optcarrot/lib/optcarrot/driver/sdl2_audio.rb:38:      buff = output.pack(@pack_format)
bench/optcarrot/lib/optcarrot/driver/sfml_audio.rb:24:      @buff << output.pack("v*".freeze)
bench/optcarrot/lib/optcarrot/driver/wav_audio.rb:9:      buff = @buff.pack(@pack_format)
bench/optcarrot/lib/optcarrot/driver/wav_audio.rb:13:      ].pack("A4VA4A4VvvVVvvA4VA*")
bench/optcarrot/lib/optcarrot/nes.rb:61:      puts "checksum: #{ @ppu.output_pixels.pack("C*").sum }" if @conf.print_video_checksum && @video.class == Video
bench/optcarrot/lib/optcarrot/rom.rb:16:        sig, _, flags, comp, _, _, _, data_len, _, fn_len, ext_len = bin.slice!(0, 30).unpack("a4v5V3v2")
bench/optcarrot/lib/optcarrot/rom.rb:140:      File.binwrite(sav, @wrk.pack("C*"))
bench/optcarrot/tools/shim.rb:77:unless [].respond_to?(:pack) && [33, 33].pack("C*") == "!!"
bench/optcarrot/tools/shim.rb:78:  $stderr.puts "[shim] Array#pack"
bench/optcarrot/tools/shim.rb:80:    alias pack_orig pack if [].respond_to?(:pack)
bench/optcarrot/tools/shim.rb:81:    def pack(fmt)
bench/optcarrot/tools/shim.rb:146:  if "".respond_to?(:unpack)
bench/optcarrot/tools/shim.rb:147:    $stderr.puts "[shim] String#bytes (by using unpack)"
bench/optcarrot/tools/shim.rb:151:        unpack("C*")
bench/optcarrot/tools/shim.rb:190:    def unpack(fmt)
bench/psd.rb/psd.rb/lib/psd/file.rb:5:    # All of the formats and their pack codes that we want to be able to convert into
bench/psd.rb/psd.rb/lib/psd/file.rb:44:        read(info[:length]).unpack(info[:code])[0]
bench/psd.rb/psd.rb/lib/psd/file.rb:48:        write [val].pack(info[:code])
bench/psd.rb/psd.rb/lib/psd/file.rb:55:      read(1).unpack('c*')[0].to_f +
bench/psd.rb/psd.rb/lib/psd/file.rb:56:        (read(3).unpack('B*')[0].to_i(2).to_f / (2 ** 24)).to_f # pre-decimal point
bench/psd.rb/psd.rb/lib/psd/file.rb:60:      write [num.to_i].pack('C')
bench/psd.rb/psd.rb/lib/psd/file.rb:67:      write [binary_numerator >> 16].pack('C')
bench/psd.rb/psd.rb/lib/psd/file.rb:68:      write [binary_numerator >> 8].pack('C')
bench/psd.rb/psd.rb/lib/psd/file.rb:69:      write [binary_numerator >> 0].pack('C')
bench/psd.rb/psd.rb/lib/psd/layer/blending_ranges.rb:46:        outfile.write @blending_ranges[:grey][:source][:black].pack('CC')
bench/psd.rb/psd.rb/lib/psd/layer/blending_ranges.rb:47:        outfile.write @blending_ranges[:grey][:source][:white].pack('CC')
bench/psd.rb/psd.rb/lib/psd/layer/blending_ranges.rb:48:        outfile.write @blending_ranges[:grey][:dest][:black].pack('CC')
bench/psd.rb/psd.rb/lib/psd/layer/blending_ranges.rb:49:        outfile.write @blending_ranges[:grey][:dest][:white].pack('CC')
bench/psd.rb/psd.rb/lib/psd/layer/blending_ranges.rb:52:          outfile.write @blending_ranges[:channels][i][:source][:black].pack('CC')
bench/psd.rb/psd.rb/lib/psd/layer/blending_ranges.rb:53:          outfile.write @blending_ranges[:channels][i][:source][:white].pack('CC')
bench/psd.rb/psd.rb/lib/psd/layer/blending_ranges.rb:54:          outfile.write @blending_ranges[:channels][i][:dest][:black].pack('CC')
bench/psd.rb/psd.rb/lib/psd/layer/blending_ranges.rb:55:          outfile.write @blending_ranges[:channels][i][:dest][:white].pack('CC')
bench/sinatra/bundle/gems/rack-2.2.3/lib/rack/lobster.rb:11:    I8jyiTlhTcYXkekJAzTyYN6E08A+dk8voBkAVTJQ==".delete("\n ").unpack("m*")[0])
bench/sinatra/bundle/gems/rack-2.2.3/lib/rack/utils.rb:380:      l = a.unpack("C*")

eregon commented 10 months ago

hexapdf might also do quite a bit of binary data handling. https://github.com/Shopify/yjit-bench/blob/main/benchmarks/hexapdf/benchmark.rb seems to write a PDF but not read one, it might be interesting to read and/or transform a PDF too for more binary data handling.

eregon commented 10 months ago

One more thought: it'd probably make sense to add most/all of the classic benchmarks at https://github.com/oracle/truffleruby/tree/master/bench/classic. Many of them are already in yjit-bench. They are not really representative of typical Ruby code, but I think they stress pretty fundamental things (e.g. polymorphic calls, recursion), so optimizing them is likely to affect real workloads too.

aobench is a small raytracer and it even renders it to a .ppm file, using sprintf("%c", byte) which is rather original but that's what it does (the original benchmark does printf and just outputs the image on stdout). The others are fairly well-known "classic" benchmarks like richards, deltablue and the shootout benchmarks.
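As a rough illustration of the sprintf("%c", byte) approach, here is a sketch that builds a tiny binary PPM (P6) image both ways. The pixel data and variable names are invented for the example, and this is not the aobench source. The format string is made binary here as an assumption on my part, so that byte values above 127 stay one byte each; with a UTF-8 format string, %c treats the integer as a code point and may emit more than one byte.

```ruby
# Sketch only: a P6 PPM is a short text header followed by raw RGB bytes.
width  = 2
height = 1
rgb    = [255, 0, 0, 0, 0, 255]   # one red pixel, one blue pixel

header = "P6\n#{width} #{height}\n255\n".b

# (a) Per-byte formatting with a binary "%c" format string.
via_sprintf = header.dup
rgb.each { |byte| via_sprintf << sprintf("%c".b, byte) }

# (b) The same payload produced by a single Array#pack call.
via_pack = header + rgb.pack("C*")

identical = (via_sprintf == via_pack)
```

Both variants produce the same bytes; the per-byte version simply calls sprintf once per output byte.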

There are also the AWFY benchmarks at https://github.com/smarr/are-we-fast-yet, and notably this branch, which uses Ruby Array & Hash instead of custom data structures (and so is closer to typical Ruby code). The paper has a pretty in-depth analysis of what each benchmark does (notably Figure 3).

Shopify / yjit-bench

Consider Pure Ruby zlib for benchmarks? #273