digital-fabric / polyphony

Fine-grained concurrency for Ruby
https://www.rubydoc.info/gems/polyphony
MIT License
659 stars 17 forks source link

Mutex issue with asynchronous interrupts #32

Closed wjordan closed 4 years ago

wjordan commented 4 years ago

There's a common issue with Ruby's ensure blocks where resources can leak (fail to clean up as expected) where an 'asynchronous interrupt' may cause a block to exit before executing critical cleanup code.

See Charles Nutter's (2008) post for a description of the issue as it relates to the use of Thread#raise / Thread#kill. This has even caused subtle bugs in Ruby's own Mutex / ConditionVariable implementations (see e.g. 14999.

It seems that Fiber switchpoints can trigger the same kinds of issues. Here is a small script that breaks Polyphony::Mutex along these lines (based on xx-using-a-mutex.rb):

require 'bundler/setup'
require 'polyphony'
require 'polyphony/core/sync'

def loop_it(number, lock)
  loop do
    sleep(rand * 0.2)
    lock.synchronize do
      raise "[#{number}] still locked by #{$locked}" if $locked
      puts "child #{number} has the lock"
      $locked = number
      sleep(rand * 0.05)
      $locked = nil
    end
  end
end

lock = Polyphony::Mutex.new
100.times do |i|
  Array.new(3) do |n|
    spin { loop_it(i*3 + n, lock) }
  end.tap{sleep 0.1}.each_with_index do |f, n|
    puts "Terminating #{i*3+n}"
    f.terminate
  end
end

Here's a sample run:

$ bundle exec ruby examples/core/xx-using-a-mutex.rb 
child 2 has the lock
child 0 has the lock
Terminating 0
Terminating 1
Terminating 2
child 3 has the lock
child 5 has the lock
Terminating 3
Terminating 4
Terminating 5
Traceback (most recent call last):
        10: from examples/core/xx-using-a-mutex.rb:21:in `<main>'
         9: from examples/core/xx-using-a-mutex.rb:21:in `times'
         8: from examples/core/xx-using-a-mutex.rb:22:in `block in <main>'
         7: from examples/core/xx-using-a-mutex.rb:22:in `new'
         6: from examples/core/xx-using-a-mutex.rb:22:in `initialize'
         5: from examples/core/xx-using-a-mutex.rb:23:in `block (2 levels) in <main>'
         4: from examples/core/xx-using-a-mutex.rb:23:in `block (3 levels) in <main>'
         3: from examples/core/xx-using-a-mutex.rb:8:in `loop_it'
         2: from examples/core/xx-using-a-mutex.rb:8:in `loop'
         1: from examples/core/xx-using-a-mutex.rb:10:in `block in loop_it'
examples/core/xx-using-a-mutex.rb:11:in `block (2 levels) in loop_it': [6] still locked by 5 (RuntimeError)

I wanted to raise this issue early since I think a good general solution to this problem in all places where ensure blocks are being used for resource cleanup will be crucial to the overall stability of this framework.

noteflakes commented 4 years ago

Thanks so much @wjordan for reporting this, it was a one-line fix!

Please note that your example also has a bug. You need to reset $lock inside of an ensure block, because sleep might be interrupted with an exception:

def loop_it(number, lock)
  puts "child #{number} starting..."
  loop do
    sleep(rand * 0.2)
    lock.synchronize do
      raise "[#{number}] still locked by #{$locked}" if $locked
      puts "#{number} has the lock"
      begin
        $locked = number
        sleep(rand * 0.05)
      ensure
        $locked = nil
      end
    end
  end
end

lock = Polyphony::Mutex.new
100.times do |i|
  Array.new(3) do |n|
    spin { loop_it(i*3 + n, lock) }
  end.tap{sleep 0.1}.each_with_index do |f, n|
    puts "Terminating #{i*3+n}"
    f.terminate
  end
end

In general, the problem with the stock Ruby Thread#raise , Timeout::timeout etc APIs is that MRI implements them IIRC by using setjmp/longjmp, which can pull the rug from under your feet at any moment, even inside of a C function. Note, however, that your example does not involve multiple threads, and does not use those APIs. At any case Polyphony reimplements Thread#raise, and Timeout::timeout to use the scheduler to raise exceptions on arbitrary fibers. Those patches provided by Polyphony remove the problem completely, assuming there are no bugs 😄 . Good catch!