SpringMT / zstd-ruby

Ruby binding for zstd(Zstandard - Fast real-time compression algorithm)
https://github.com/facebook/zstd
BSD 3-Clause "New" or "Revised" License

Feature/unlock gvl for streaming compression/decompression #79

Closed SpringMT closed 7 months ago

SpringMT commented 7 months ago

Zstd streaming compression/decompression is CPU-bound work that does not involve the Ruby interpreter. The GVL can therefore be released while it runs, which lets multiple threads run in parallel and make full use of the available CPU cores.

This re-introduces https://github.com/SpringMT/zstd-ruby/pull/53

The following programs (in benchmarks) demonstrate the difference:

Streaming Compression

$LOAD_PATH.unshift '../lib'
require 'zstd-ruby'
require 'thread'

GUESSES = (ENV['GUESSES'] || 1000).to_i
THREADS = (ENV['THREADS'] || 1).to_i

p GUESSES: GUESSES, THREADS: THREADS

sample_file_name = ARGV[0]
json_string = File.read("./samples/#{sample_file_name}")

queue = Queue.new
GUESSES.times { queue << json_string }
THREADS.times { queue << nil }
THREADS.times.map {
  Thread.new {
    while str = queue.pop
      stream = Zstd::StreamingCompress.new
      stream << str
      res = stream.flush
      stream << str
      res << stream.finish
    end
  }
}.each(&:join)

Without this patch:

[springmt@MacBook-Pro] (main)✗ % time THREADS=4 bundle exec ruby multi_thread_streaming_comporess.rb city.json
{:GUESSES=>1000, :THREADS=>4}
THREADS=4 bundle exec ruby multi_thread_streaming_comporess.rb city.json  2.83s user 0.29s system 94% cpu 3.299 total

With the patch:

[springmt@MacBook-Pro] (feature/unlock-gvl)✗ % time THREADS=4 bundle exec ruby multi_thread_streaming_comporess.rb city.json
{:GUESSES=>1000, :THREADS=>4}
THREADS=4 bundle exec ruby multi_thread_streaming_comporess.rb city.json  3.33s user 0.36s system 266% cpu 1.385 total
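
The pattern used by the benchmark above (a shared queue drained by N threads, with one nil stop marker per thread) can be wrapped in a small helper that measures wall-clock time inside Ruby instead of relying on the shell's time. This is only a sketch: timed_workers is a hypothetical name that is not part of zstd-ruby, and the example block uses a trivial stand-in for the compression work so that it runs without the gem installed.

```ruby
# Hypothetical helper (not part of zstd-ruby): runs every job through the given
# block on `thread_count` threads and returns [results, elapsed_seconds].
# Jobs must be truthy, because nil is used as the stop marker.
def timed_workers(thread_count, jobs)
  queue = Queue.new
  jobs.each { |job| queue << job }
  thread_count.times { queue << nil } # one stop marker per worker thread

  results = Queue.new # Queue is thread-safe, so workers can push concurrently
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  thread_count.times.map {
    Thread.new {
      while (job = queue.pop)
        results << yield(job)
      end
    }
  }.each(&:join)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  [Array.new(results.size) { results.pop }, elapsed]
end

# Stand-in workload: the real benchmark would compress each string instead.
payloads = Array.new(8) { |i| "payload #{i}" }
results, elapsed = timed_workers(4, payloads) { |str| str.bytesize }
puts "processed #{results.size} jobs in #{elapsed.round(4)}s"
```

To reproduce the measurement with this helper, the block body would be replaced by the Zstd::StreamingCompress calls from the script above. Threads only run the block in parallel while the extension has released the GVL, so the elapsed time should shrink with more threads only on the patched branch.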

Streaming Decompression

$LOAD_PATH.unshift '../lib'
require 'zstd-ruby'
require 'thread'

GUESSES = (ENV['GUESSES'] || 1000).to_i
THREADS = (ENV['THREADS'] || 1).to_i

p GUESSES: GUESSES, THREADS: THREADS

sample_file_name = ARGV[0]
json_string = File.read("./samples/#{sample_file_name}")
target = Zstd.compress(json_string)

queue = Queue.new
GUESSES.times { queue << target }
THREADS.times { queue << nil }
THREADS.times.map {
  Thread.new {
    while str = queue.pop
      stream = Zstd::StreamingDecompress.new
      stream.decompress(str)
      stream.decompress(str)
    end
  }
}.each(&:join)

Without this patch:

[springmt@MacBook-Pro] (main)✗ % time THREADS=4 bundle exec ruby multi_thread_streaming_decomporess.rb city.json
{:GUESSES=>1000, :THREADS=>4}
THREADS=4 bundle exec ruby multi_thread_streaming_decomporess.rb city.json  2.03s user 0.28s system 93% cpu 2.486 total

With the patch:

[springmt@MacBook-Pro] (feature/unlock-gvl)✗ % time THREADS=4 bundle exec ruby multi_thread_streaming_decomporess.rb city.json
{:GUESSES=>1000, :THREADS=>4}
THREADS=4 bundle exec ruby multi_thread_streaming_decomporess.rb city.json  2.49s user 0.49s system 227% cpu 1.310 total
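
For a quick comparison, the wall-clock speedup can be computed from the "total" values in the time output above. The snippet below is just arithmetic on those four numbers; the hash layout and labels are chosen here for illustration. Note that user CPU time rose slightly with the patch in both runs (2.83s to 3.33s and 2.03s to 2.49s), so the gain comes from running threads in parallel, not from doing less work.

```ruby
# Wall-clock totals in seconds, copied from the `time` output above
# (THREADS=4, GUESSES=1000, city.json).
runs = {
  "streaming compression"   => { before: 3.299, after: 1.385 },
  "streaming decompression" => { before: 2.486, after: 1.310 },
}

# Speedup = time without the patch / time with the patch.
speedups = runs.transform_values { |t| (t[:before] / t[:after]).round(2) }

speedups.each do |name, ratio|
  puts "#{name}: #{ratio}x faster wall-clock with the patch"
end
```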