brianlmoon / GearmanManager

PHP daemon for managing gearman workers
http://brian.moonspot.net/GearmanManager
BSD 2-Clause "Simplified" License
682 stars 190 forks source link

Odd solution to workers not wanting to stop with SIGTERM #150

Open ghost opened 6 years ago

ghost commented 6 years ago

I've been a longtime user of gearman-manager, thank you for your work on this!

Background: Just updated a worker server from centos 6.5 to 7.5. I noticed ^C (killing the manager while running in the foreground, and even background use) would cause the manager to hang until 60 seconds, when it will just forcefully kill the workers. Normally the workers will shutdown gracefully in a few seconds.

I was having a difficult time finding a solution, however, finally I just randomly added a line of code in the start_lib_worker method in GearmanPeclManager.php, and now GM will shutdown nicely.

        while(!$this->stop_work){

            if(@$thisWorker->work() ||
               $thisWorker->returnCode() == GEARMAN_IO_WAIT ||
               $thisWorker->returnCode() == GEARMAN_NO_JOBS) {

                if ($thisWorker->returnCode() == GEARMAN_SUCCESS) continue;

                if (!@$thisWorker->wait()){
                    if ($thisWorker->returnCode() == GEARMAN_NO_ACTIVE_FDS){
                        sleep(5);
                    }
                }

            }

            /**
             * Check the running time of the current child. If it has
             * been too long, stop working.
             */
            if($this->max_run_time > 0 && time() - $start > $this->max_run_time) {
                $this->log("Been running too long, exiting", GearmanManager::LOG_LEVEL_WORKER_INFO);
                $this->stop_work = true;
            }

            if(!empty($this->config["max_runs_per_worker"]) && $this->job_execution_count >= $this->config["max_runs_per_worker"]) {
                $this->log("Ran $this->job_execution_count jobs which is over the maximum({$this->config['max_runs_per_worker']}), exiting", GearmanManager::LOG_LEVEL_WORKER_INFO);
                $this->stop_work = true;
            }

        }

Adding any sort of line to evaluate does the trick:

        while(!$this->stop_work){

            if(@$thisWorker->work() ||
               $thisWorker->returnCode() == GEARMAN_IO_WAIT ||
               $thisWorker->returnCode() == GEARMAN_NO_JOBS) {

                if ($thisWorker->returnCode() == GEARMAN_SUCCESS) continue;

                if (!@$thisWorker->wait()){
                    if ($thisWorker->returnCode() == GEARMAN_NO_ACTIVE_FDS){
                        sleep(5);
                    }
                }

            }

            /**
             * Check the running time of the current child. If it has
             * been too long, stop working.
             */
            if($this->max_run_time > 0 && time() - $start > $this->max_run_time) {
                $this->log("Been running too long, exiting", GearmanManager::LOG_LEVEL_WORKER_INFO);
                $this->stop_work = true;
            }

            if(!empty($this->config["max_runs_per_worker"]) && $this->job_execution_count >= $this->config["max_runs_per_worker"]) {
                $this->log("Ran $this->job_execution_count jobs which is over the maximum({$this->config['max_runs_per_worker']}), exiting", GearmanManager::LOG_LEVEL_WORKER_INFO);
                $this->stop_work = true;
            }

        $true = false; # some extra code
        }

This is the most bizarre behavior I've witnessed, particularly since it doesn't output/echo any new characters or do anything at the IO level.

Do you have any possible explanation as to why doing this allows gearman manager to exit gracefully rather than having to SIGKILL the entire process? I tried running this on the v2 tag and v1 tag and the behavior is the same. I'm also running PHP 7.

Thanks!

brianlmoon commented 6 years ago

That is odd. I am running PHP 7 on Ununtu and have seen this issue. What settings do you have in your ini file?

brianlmoon commented 6 years ago

Also, what specific version of PHP 7?

ghost commented 6 years ago

That's comforting to know you've seen it before. Did you know that adding a line in the worker script "fixed" it? I'd give anything to know what was going on there behind the curtains.

Config:

[GearmanManager]
worker_dir=/var/www/vhosts/app.itemlogic.com/app/gearman
auto_update=1
include=*
count=2
max_worker_lifetime=3600
host=app-jobmaster
-vvvv -c /etc/gearman-manager/config.ini -l /var/log/gearman-manager.log

PHP:

[centos@stg-app-jobslave ~]$ php -v
PHP 7.0.30 (cli) (built: Apr 28 2018 08:14:08) ( NTS )
Copyright (c) 1997-2017 The PHP Group
Zend Engine v3.0.0, Copyright (c) 1998-2017 Zend Technologies
brianlmoon commented 6 years ago

Sorry, I meant I have not seen it. Sorry for the typo.

kevcam4891 commented 6 years ago

@brianlmoon Do you use virtualbox? I can create an image and let you have it in case you want to have the specific environment.

brianlmoon commented 6 years ago

No. But I do use Docker.

frasche commented 6 years ago

I noticed the same behavior in my VM Ubuntu 18.04 and PHP 7.2.7, pecl version only: It calls stop_children() with SIGTERM, but this does not kill the process. Finally after 60 secs it calls stop_children with SIGKILL again and the process get killed.

After some research I replaced the

declare(ticks = 1);

in the manager script and in GearmanManager.php with

pcntl_async_signals(true);

http://php.net/manual/en/function.pcntl-async-signals.php

This of course only works with >= PHP 7.1 but it did the trick for me. All workers are shutting down now nicely within a few seconds.

Edit Thinking a bit of this, I got to the following conclusion Based on this bug report: https://bugs.php.net/bug.php?id=71448 and the comment of Nikita Popov:

Due to an implementation bug, the declare(ticks=1) directive leaked into different compilation units prior to PHP 7.0. This is not how declare() directives, which are per-file or per-scope, are supposed to work.

Using a single declare(ticks=1) in some class inside a multi-class autoloaded project doesn't make a lot of sense. Even on PHP 5.x declare(ticks=1) would only affect code that is compiled starting from that declaration, which means that depending on the loading order of classes your whole codebase might end up using ticks or maybe only a few classes that were loaded late. (Not to mention that using ticks will make your code many times slower, but that's a different issue.)

For PHP 7.1 we may make automatic pcntl signal handling work without ticks, as we now have a new internal mechanism for handling ordinary timeouts which we can reuse.

Due to this statement it seems to me, declare(ticks = 1) should not be used anymore for this. At least in a case where you use autoloading your classes in a larger project, it is almost unpredictible.

There are two possibilities: For PHP >= 7.1: use of pcntl_async_signals(true); is the way to go replacement for declare(ticks) in posix/pcntl functions. Just replace the declare statements with pcntl_async_signals(true);

For PHP > 5.3 < 7.1: we may use pcntl_signal_dispatch(), but this needs to be called after every posix_kill() and after some other pcntl functions. After playing around a while with this, I had a smooth running version without ticks and pcntl_async_signals(true) and use only pcntl_signal_dispatch().

brianlmoon commented 5 years ago

Anyone looked into this further? I have declare ticks in my bin script here and apparently always have. I always understood it that you had to declare that in the initial script, at the very top. But if there is a better way going forward, I am all for it.

ReduktorSpalin commented 5 years ago

Unfortunately declaring ticks in class-scope might have unexpected outcome. I didn't see it coming as well, so please check two interesting cases. Those just blown my mind: https://github.com/ReduktorSpalin/php_ticks_demo .

So it figures placing declare in different parts might have a different outcome. I hope it will be helpful finding a solution to this problem. I am not sure what is right in context of this project and Gearman's case. Maybe current scenario is correct in general, but some part should be ported to other class.

I inherited an another class, but feel free to increase the complexity of references to other classes in order to get even better & bigger image of solutions that might take advantage of this change (e.g. until now I found it difficult to autoload different class into worker).

Tested on: