JoshCheek / seeing_is_believing

Displays the results of every line of code in your file
1.31k stars 54 forks source link

Timeout does not work in some cases (hangs indefinitely) #53

Closed JoshCheek closed 8 years ago

JoshCheek commented 9 years ago

In TextMate, when I run this code, it hangs indefinitely, even though there is a timeout set. Probably something in the parent process waits (probably either the consumer or EvaluateByMovingFiles). Or maybe the fork prevents the timeout from working correctly. Regardless, users shouldn't have to go grepping and killing processes on their own, SiB should respect the timeout.

read, write = IO.pipe

stream = IO.open write.to_i

if fork
  p $$
  stream.puts "parent"
  # stream.close unless stream.closed?  # => nil
  # write.close  unless write.closed?   # ~> Errno::EBADF: Bad file descriptor

else
  p $$
  stream.puts "child"
  # stream.close unless stream.closed?
  # write.close  unless write.closed?
end

stream
write
read.read

There is a surprisingly large number of processes spawned to accomplish SiB on my machine. I've analyzed them and know where they all come from, except for why there is three of the bottommost invocation:

# The original invocation from TextMate
# https://github.com/JoshCheek/text_mate_1-seeing-is_believing/blob/db1c2fd3e0de50370e3c0e2d2d834c4ed4b16edf/Commands/annotate%20all%20lines.tmCommand#L11-22
/bin/sh
  /Users/josh/code/dotfiles/bin/sib_ruby
    -S seeing_is_believing
      -Ku
      --result-length 400
      --number-of-captures 200 --alignment-strategy chunk --timeout 5 --shebang /Users/josh/code/dotfiles/bin/sib_rub

# looks like this is chruby-exec
# https://github.com/postmodern/chruby/blob/e7187af3347ff7da3be363f40f479fe7332b6942/bin/chruby-exec#L36-40
# also, apparently my chruby is outdated, b/c that code has been replaced on master
/bin/bash -l -c
  chruby 2.1.2 &&
    ruby -S
      seeing_is_believing
        -Ku
        --result-length 400
        --number-of-captures 200
        --alignment-strategy chunk
        --timeout 5
        --shebang /Users/josh/code/dotfiles/bin/sib_ruby

# This would be what we are telling chruby to exec, so RHS of the call above, after it's fixed the environment
# In newer versions, it calls exec, so wouldn't have two processes like this
ruby -S
  seeing_is_believing
    -Ku
    --result-length 400
    --number-of-captures 200
    --alignment-strategy chunk
    --timeout 5
    --shebang /Users/josh/code/dotfiles/bin/sib_ruby

# SiB executing the rewritten code. We see this one twice because it forked.
/Users/josh/.rubies/ruby-2.1.2/bin/ruby
  -W0
  -Ku
  -I /Users/josh/.gem/ruby/2.1.2/gems/seeing_is_believing-3.0.0.beta.5/lib
  -I /Users/josh/.gem/ruby/2.1.2/gems/seeing_is_believing-3.0.0.beta.5/lib
  -r seeing_is_believing/the_matrix
  /var/folders/7g/mbft22555w3_2nqs_h1kbglw0000gn/T/seeing_is_believing_temp_dir20150131-66512-yo5vuh/program.rb

/Users/josh/.rubies/ruby-2.1.2/bin/ruby
  -W0
  -Ku
  -I /Users/josh/.gem/ruby/2.1.2/gems/seeing_is_believing-3.0.0.beta.5/lib
  -I /Users/josh/.gem/ruby/2.1.2/gems/seeing_is_believing-3.0.0.beta.5/lib
  -r seeing_is_believing/the_matrix
  /var/folders/7g/mbft22555w3_2nqs_h1kbglw0000gn/T/seeing_is_believing_temp_dir20150131-66512-yo5vuh/program.rb
JoshCheek commented 8 years ago

Update: This program, when run will hang indefinitely. When interrupted from the shell (both fish and bash do this), it kills the process, its child, and its grandchild. When interrupted with SIGINT, from another process, it kills the parent and the child, but not the grandchild. The documentation assumes a deeper knowledge of Unix than I have. Trying to read explanations of processes / process groups / sessions / etc has not improved clarity. It may be possible to prevent it from hanging, though, which would be something, at least.

require 'timeout'
require 'rbconfig'

# helper methods
def annotate(description, *allowed_exceptions, &block)
  success = false
  puts "\e[33mPRE #{description}\e[39m"
  block.call
  success = true
rescue *allowed_exceptions
  success = true
ensure
  puts "\e[3#{success ? 2 : 1}mPOST #{description}\e[39m"
end

def close_all(*streams)
  streams.each { |io| io && !io.closed? && io.close }
end

# setup streams
parentout, childout = IO.pipe
parenterr, childerr = IO.pipe
childin,   parentin = IO.pipe

# clean up when exiting the program
at_exit do
  annotate("ensure streams closed") { close_all parentout, childout, parenterr, childerr, childin, parentin }
end

# the child program has its own child, both of which will hang waiting on the pipe to be closed
parentin.write <<-RUBY
  puts "hello, world"
  read, write = IO.pipe
  if fork
    write.puts "parent"
  else
    write.puts "child"
  end
  read.read
RUBY

# evaluate the code in a child process
child_pid = Kernel.spawn(RbConfig.ruby, '-', in: childin, out: childout, err: childerr)

# close streams talking to the child so that it could exit
[parentin, childout, childerr, childin].each(&:close)

# print anything we see the child print
consumer_thread = Thread.new do
  parentout.each_line { |line| p stdout: line }
  parenterr.each_line { |line| p stderr: line }
end

# wait for child to complete, timing out if it does not
begin
  Timeout.timeout 2 do
    annotate("wait for the child we started")        { Process.wait child_pid }
    annotate("wait for all children", Errno::ECHILD) { Process.wait }
    annotate("joining consumer thread")              { consumer_thread.join }
  end
rescue Timeout::Error
  annotate("killing child")           { Process.kill "KILL", child_pid }
  annotate("joining consumer thread") { consumer_thread.join }
end
JoshCheek commented 8 years ago

While debugging, I changed the way the child is invoked. IDK if this is relevant, but it took a bit to figure out, so here's the diff (print statements were because I was trying to understand why it didn't resolve this issue)

diff --git a/lib/seeing_is_believing/evaluate_by_moving_files.rb b/lib/seeing_is_believing/evaluate_by_moving_files.rb
index a48e69e..fa07d29 100644
--- a/lib/seeing_is_believing/evaluate_by_moving_files.rb
+++ b/lib/seeing_is_believing/evaluate_by_moving_files.rb
@@ -108,7 +108,7 @@ class SeeingIsBelieving
                out:                 child_stdout,
                err:                 child_stderr,
                child_eventstream => child_eventstream }
-      child = Process.detach Kernel.spawn(env, *popen_args, opts)
+      child = Kernel.spawn(env, *popen_args, opts)

       # close child streams b/c they won't emit EOF
       # until both child and parent references are closed
@@ -130,15 +130,26 @@ class SeeingIsBelieving

       # wait for completion
       Timeout.timeout timeout_seconds do
-        exitstatus = child.value.exitstatus
-        consumer.process_exitstatus exitstatus
+        puts "pre wait"
+        Process.wait child # for the child we started
+        status = $?.exitstatus
+        puts "status: #{status.inspect}"
+        begin  Process.wait       # for any children it may have started
+        rescue Errno::ECHILD
+        end
+        puts "post wait"
+        consumer.process_exitstatus status
         consumer_thread.join
       end
     rescue Timeout::Error
-      Process.kill "KILL", child.pid
+      puts "rescue timeout"
+      Process.kill "SIGKILL", child
       consumer.process_timeout timeout_seconds
+      puts "pre join"
       consumer_thread.join # finish consuming events
+      puts "post join"
     ensure
+      puts "ensure streams closed"
       [stdin, stdout, stderr, eventstream].each { |io| io.close unless io.closed? }
     end
JoshCheek commented 8 years ago

Ugh, I think the solution I have kills parents, too -.- Here's a diff of where I'm at:

diff --git a/lib/seeing_is_believing/evaluate_by_moving_files.rb b/lib/seeing_is_believing/evaluate_by_moving_files.rb
index 9f8a5eb..6d12abe 100644
--- a/lib/seeing_is_believing/evaluate_by_moving_files.rb
+++ b/lib/seeing_is_believing/evaluate_by_moving_files.rb
@@ -104,11 +104,14 @@ class SeeingIsBelieving
                                 )].pack('m0')

       # evaluate the code in a child process
-      opts = { in:                  child_stdin,
+      opts = {
+        child_eventstream => child_eventstream,
+        in:                  child_stdin,
         out:                 child_stdout,
         err:                 child_stderr,
-               child_eventstream => child_eventstream }
-      child = Process.detach Kernel.spawn(env, *popen_args, opts)
+        pgroup:              true, # run in its own process group so we can kill any children it spawned
+      }
+      child_pid  = Kernel.spawn(env, *popen_args, opts)

       # close child streams b/c they won't emit EOF
       # until both child and parent references are closed
@@ -130,17 +133,22 @@ class SeeingIsBelieving

       # wait for completion
       Timeout.timeout timeout_seconds do
-        exitstatus = child.value.exitstatus
-        consumer.process_exitstatus exitstatus
+        Process.wait child_pid
+        consumer.process_exitstatus $?.exitstatus
+        # begin  Process.wait # for any grandchildren
+        # rescue Errno::ECHILD
+        # end
         consumer_thread.join
       end
     rescue Timeout::Error
-      pgid = Process.getpgid(child.pid)
-      handler = trap("INT") { trap("INT", handler) } # noop
-      Process.kill "-INT", pgid
       consumer.process_timeout timeout_seconds
-      consumer_thread.join # finish consuming events
     ensure
+      begin
+        child_pgid = Process.getpgid(child_pid)
+        Process.kill "-INT", child_pgid
+      rescue Errno::ESRCH
+      end
+      consumer_thread.join
       [stdin, stdout, stderr, eventstream].each { |io| io.close unless io.closed? }
     end