dugsq / xenograte-xct

Xenograte Community Toolkit (XCT)

starting a xenode will not really start under certain conditions although it shows like it did #12

Open ghost opened 11 years ago

ghost commented 11 years ago

While a xenode is busy in process() (for example), stopping it and starting it again appears to work, but the xenode doesn't actually start (even though the messages say it does).

  1. Add a delay of 7 seconds (or more, if you want to give yourself time to run the commands in steps 4 and 5 below), i.e. sleep(7), in process() of a xenode (the hello world one), and add 2 log messages before and after it, like so:
  # and the calling cycle is based on 'loop_delay' in configuration
  def process
    @log.debug("sleep started")
    sleep(7)
    @log.debug("sleep ended")
  end
  2. Leave the loop-delay at 0.5 in the run folder.
  3. Start the xenode.
  4. When you see "sleep started" in the log, stop the xenode and start it again.
  5. Notice that starting it says it started with some new pid (which also overwrites the .pid file with the new value, i.e. cat run/pids/xenode_1_pid), then notice that the log eventually shows "sleep ended" and then "shutdown!".
  6. Now notice that the pid file is gone and the xenode is not running, so the restart in step 4 was inconsistently telling us it had started, so to speak.

So the xenode restarted in step 4 is not really started as reported, because the already running xenode (which is still stuck in process()) has not shut down yet, and the start command doesn't wait for it to shut down...

dougsq commented 11 years ago

AtKaaZ,

You are correct. I am reiterating below in some detail for those that might be new to eventmachine.

The sleep(7) will block the eventmachine loop, so no other processing in the xenode or the eventmachine loop will happen until the sleep() has returned. If you want to do a non-blocking sleep, you can use EventMachine::Synchrony.sleep(seconds) to get an eventmachine timer based sleep function.

The xeno-cli shutdown command uses a TERM signal and sets a shutdown flag. A timer in the event loop looks for the shutdown flag, calls the xenode's shutdown() method, and then calls EM.stop to end the event loop; the xenode's program then completes and at_exit() is finally called. But none of that will happen until the sleep(7) returns.
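To make the ordering concrete, here is a minimal plain-Ruby sketch of the flag-polling pattern described above. It is not the xenode code: the loop stands in for the event loop, sleep(0.2) stands in for the blocking sleep(7), and the process signals itself to simulate the stop command arriving mid-process(). All names are made up for illustration.

```ruby
# Hypothetical stand-in for the event loop: a flag set by a TERM handler
# is only acted on between iterations, never in the middle of blocking work.
shutdown_requested = false
Signal.trap("TERM") { shutdown_requested = true }

events = []
3.times do |tick|
  events << "process start #{tick}"
  # Simulate the stop command arriving while process() is busy.
  Process.kill("TERM", Process.pid) if tick == 0
  sleep(0.2) # blocking work; stands in for sleep(7)
  events << "process end #{tick}"

  # The flag is only checked once the blocking call has returned.
  if shutdown_requested
    events << "shutdown"
    break
  end
end

puts events
```

The "shutdown" entry can only appear after "process end 0", which mirrors the "sleep ended" then "shutdown!" order seen in the log.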

The xeno CLI is a separate process and the start xenode command looks for the PID files to tell if a xenode is still running. The CLI should have raised an error saying the xenode was already running if the pid file was still there. It should only start another process if the original one had ended.
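For readers wondering what a pid-file check can look like, here is a rough sketch. It is an assumption about the general technique, not the actual Xeno::xenode_running? implementation; the helper name and file names are hypothetical.

```ruby
require "tmpdir"

# Hypothetical helper, not the actual Xeno::xenode_running? implementation.
def pid_file_process_running?(pid_file)
  return false unless File.exist?(pid_file)

  pid = File.read(pid_file).to_i
  return false unless pid.positive?

  Process.kill(0, pid) # signal 0 only tests for existence, nothing is sent
  true
rescue Errno::ESRCH
  false # stale pid file: no such process
rescue Errno::EPERM
  true  # the process exists but belongs to another user
end

Dir.mktmpdir do |dir|
  pid_file = File.join(dir, "xenode_1_pid")
  File.write(pid_file, Process.pid.to_s)
  puts pid_file_process_running?(pid_file)                        # current process, so true
  puts pid_file_process_running?(File.join(dir, "missing_pid"))   # no file, so false
end
```

A check like this reports "running" for as long as the old process is still alive, which is the situation in this issue while the old xenode is stuck in process().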

Looking at the xeno_cli.rb file, the puts message for starting the xenode is before the test to see if it is already running:

puts "Starting Xenode: #{xenode_id}"

# don't start it if it is already running
unless Xeno::xenode_running?(xenode_id)
  # run the xenode
  exec_cmd = "ruby -I #{lib_dir} -- #{lib_dir}/instance_xenode.rb "
  exec_cmd << "-f #{xenode_file} -k #{klass} "
  exec_cmd << "-i #{xenode_id.to_s} "
  exec_cmd << "-d " if @debug

  pid = fork do
    exec(exec_cmd)
  end

  Process.detach(pid)

end

This is a bug, because the puts statement should be inside the unless Xeno::xenode_running?(xenode_id) block and only print the "starting" message if the xenode was not already running.
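A sketch of the intended ordering, with the message printed only after the running check. This is not a patch to xeno_cli.rb: `running` and `launcher` are hypothetical stand-ins for Xeno::xenode_running? and the fork/exec block, and the "already running" wording is made up.

```ruby
require "stringio"

# Sketch only: `running` and `launcher` are hypothetical stand-ins for
# Xeno::xenode_running? and the fork/exec block in xeno_cli.rb.
def start_xenode(xenode_id, running:, launcher:, out: $stdout)
  # don't start it (or claim to) if it is already running
  if running.call(xenode_id)
    out.puts "Xenode #{xenode_id} is already running"
    return false
  end

  out.puts "Starting Xenode: #{xenode_id}"
  launcher.call(xenode_id)
  true
end

out = StringIO.new
launched = []
launcher = ->(id) { launched << id }

start_xenode("xenode_1", running: ->(_id) { true },  launcher: launcher, out: out)
start_xenode("xenode_2", running: ->(_id) { false }, launcher: launcher, out: out)
puts out.string
```

Only xenode_2 gets the "Starting" message and a launch; xenode_1 is reported as already running.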

ghost commented 11 years ago

also the pid file is changed with the new pid when the xenode is started, not sure when that happens with regard to that puts

Thanks for all the info. I forgot to mention I'm using the develop branch

ghost commented 11 years ago

oh wait, actually I'm wrong about the pid changing, the pid remains from the old xenode running... got it now; so it shows the pid as if it detected that it was already running

ghost commented 11 years ago

I tried your version of sleep and I get this on the log:

E, [2013-07-15T09:53:10.376131 #23492] ERROR -- : #<FiberError: can't yield from root fiber> ["/home/atkaaz/.rvm/gems/ruby-2.0.0-p247/gems/em-synchrony-1.0.3/lib/em-synchrony.rb:88:in `yield'", "/home/atkaaz/.rvm/gems/ruby-2.0.0-p247/gems/em-synchrony-1.0.3/lib/em-synchrony.rb:88:in `sleep'", "/home/atkaaz/xenograte-xct/xenode_lib/hello_world_xenode/lib/hello_world_xenode.rb:45:in `process'", "/home/atkaaz/xenograte-xct/lib/instance_xenode.rb:171:in `block (2 levels) in spawn_xenode'", "/home/atkaaz/.rvm/gems/ruby-2.0.0-p247/gems/eventmachine-1.0.3/lib/em/timers.rb:56:in `call'", "/home/atkaaz/.rvm/gems/ruby-2.0.0-p247/gems/eventmachine-1.0.3/lib/em/timers.rb:56:in `fire'", "/home/atkaaz/.rvm/gems/ruby-2.0.0-p247/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `call'", "/home/atkaaz/.rvm/gems/ruby-2.0.0-p247/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'", "/home/atkaaz/.rvm/gems/ruby-2.0.0-p247/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'", "/home/atkaaz/.rvm/gems/ruby-2.0.0-p247/gems/em-synchrony-1.0.3/lib/em-synchrony.rb:38:in `synchrony'", "/home/atkaaz/xenograte-xct/lib/instance_xenode.rb:114:in `spawn_xenode'", "/home/atkaaz/xenograte-xct/lib/instance_xenode.rb:63:in `initialize'", "/home/atkaaz/xenograte-xct/lib/instance_xenode.rb:418:in `new'", "/home/atkaaz/xenograte-xct/lib/instance_xenode.rb:418:in `<main>'"]

What I was trying to emulate with the sleep was some long operation that might be happening inside process(), or even a normal operation, as long as the stop and start of the xenode happen while this operation is in progress (so while inside process())

dougsq commented 11 years ago

Ah yes.. If you wrap the call in a Fiber in the xenode, that will go away. It happens because there is only the root fiber from EM-Synchrony.

Try:

Fiber.new do
  EM.sleep(7)
end.resume

I think that's the right syntax... you might need the full EM::Synchrony.sleep()

It's 2:00am here so my brain is half dead ;)

dougsq commented 11 years ago

Here is a good read on fibers: http://www.igvita.com/2010/03/22/untangling-evented-code-with-ruby-fibers/
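A small plain-Ruby sketch of the fiber behaviour discussed above, without EventMachine. Fiber.yield here is only a stand-in for what a timer-based sleep does internally (an assumption for illustration), and the second resume stands in for the timer firing.

```ruby
# Fiber.yield has nowhere to return to when called from the root fiber.
begin
  Fiber.yield
rescue FiberError => e
  puts "root fiber: #{e.class}"
end

steps = []
fiber = Fiber.new do
  steps << "work started"
  Fiber.yield          # stand-in for the timer-based sleep
  steps << "work resumed"
end

fiber.resume           # runs until Fiber.yield, then control returns here
steps << "caller continues"
fiber.resume           # stand-in for the timer firing later

puts steps
```

The caller carries on as soon as the fiber yields, which is why the wrapped sleep no longer blocks process().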

ghost commented 11 years ago

that works, thanks

Fiber.new do
    EventMachine::Synchrony.sleep(7)
end.resume

but it's useless to me since it's non-blocking xD

EDIT: I'll read that link

dougsq commented 11 years ago

Didn't mean to close this issue.

dougsq commented 11 years ago

Here is a technique to have a xenode fire off work at certain intervals. It also has the side effect that you can force it to do the work anytime by sending it a message. Set the @loop_delay to 0.5 and the :fire_seconds to 7.

def startup(opts = {})
  @fire_time = opts[:fire_seconds].to_i
  @last_fired = Time.now.to_i
end

def process()
  # fire once at least @fire_time seconds have passed since the last run
  if Time.now.to_i - @last_fired >= @fire_time
    @last_fired = Time.now.to_i
    msg = XenoCore::Message.new
    process_message(msg)
  end
end

def process_message(msg)
  # do what you want here
end

And of course you can use an EM periodic timer in the startup() instead:

def startup(opts = {})
  EM::add_periodic_timer(7.0) do
    # do what you want here
  end
end

I like the process_message() version better because it lets you force running the work, or you can put it down-stream (make it a child of another xenode) and have the data-flow trigger the work and/or run on schedule.
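The interval check can be tried outside a xenode with a sketch like this. IntervalXenode is a hypothetical stand-in class, the current time is passed in so nothing has to wait, and process_message just records the firing time instead of taking a XenoCore::Message. Elapsed time is computed as the current time minus the last firing time.

```ruby
# Hypothetical stand-in for a xenode; the clock value is passed in so the
# timing logic can be exercised without waiting.
class IntervalXenode
  attr_reader :fired_at

  def startup(opts = {}, now = Time.now.to_i)
    @fire_time = opts[:fire_seconds].to_i
    @last_fired = now
    @fired_at = []
  end

  # Called on every loop_delay tick; fires only when the interval has elapsed.
  def process(now = Time.now.to_i)
    return unless now - @last_fired >= @fire_time

    @last_fired = now
    process_message(now)
  end

  def process_message(now)
    @fired_at << now # do what you want here
  end
end

node = IntervalXenode.new
node.startup({ fire_seconds: 7 }, 100)
[103, 107, 110, 114].each { |t| node.process(t) }
p node.fired_at # → [107, 114]
```

Calling process_message directly still forces the work at any time, independent of the schedule.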