10up / WP-Minions

ARCHIVED: Job Queue for WordPress
GNU General Public License v2.0
230 stars 40 forks source link

race condition when running multiple processes (?) #5

Closed soderlind closed 8 years ago

soderlind commented 8 years ago

I written a plugin to test WP Gears with the heartbeat API. The idea is to use heartbeat to update status when a gearman job is done. The plugin is at https://gist.github.com/soderlind/a27527c3ed6545f594d7 EDIT, this is the correct url: https://gist.github.com/soderlind/94629e4ae933407d823c

If I run the default 5 processes using supervisor the result is intermittent (i.e. the callback in my plugin i run now and then). I have the following in my /etc/supervisor/supervisord.conf:

[program:wp_gears_worker]
command=/usr/bin/php /srv/www/wp-gears-test/htdocs/wp-gears-runner.php
process_name=%(program_name)s-%(process_num)02d
numprocs=5
directory=/tmp
autostart=true
autorestart=true
killasgroup=true
user=www-data

if I change to numprocs=1 my plugin works as expected

soderlind commented 8 years ago

The problem must be in wordpress, wp gears or my plugin. Logging output from /srv/www/wp-gears-test/htdocs/wp-gears-runner.php show that it's triggered when it should, also with numprocs=5

cmmarslender commented 8 years ago

I'd agree that is sounds like a race condition. It looks like you're using a single option no matter what job is running wp-gears-heartbeat-time

Using a unique identifier for each job is likely to be more reliable for tracking the status for any single job, so that you don't have multiple workers potentially trying to update the same option at the same time.

soderlind commented 8 years ago

Thank you for pointing me in the right direction, I think I've solved it

The worker callback:

static function wp_gears_heartbeat_worker_callback( $args ) {
    //if ( 'gettime' == $args['action']) { // doesn't work, $args is a string with value = 'gettime'
        sleep(1); // without, you'll  get (in supervisor): "INFO gave up: nn_worker-00 entered FATAL state, too many start retries too quickly"
        update_option( 'wp-gears-heartbeat-time_' . $_GET['doing_wp_cron'],   time() ); // $_GET['doing_wp_cron'] is unique
    //}
}

When receiving the heartbeat

// get all options since last heartbeat
$all_options = wp_load_alloptions(); // does this cause a heavy load?
$my_options = array();
foreach( $all_options as $name => $value ) {
    if(stristr($name, 'wp-gears-heartbeat-time_')) {
        $my_options[] = $value;
        delete_option( $name );
    }
}
if ( count($my_options) ) {
    $response['wp-gears-heartbeat-time'] = json_encode($my_options);
}
soderlind commented 8 years ago

Please reopen the ticket, I still have a race condition and it happens between the runner and the worker callback. Will try to debug tomorrow, have to get some sleep now

dsawardekar commented 8 years ago

@soderlind You need to use startsecs=0 with your supervisord configuration. Supervisor has a feature that assumes a process that exited within 1 sec (default for startsecs) has failed and should be retried. Since you aren't doing any heavy processing your job is being executed by the worker inside 1 second.

I would also recommend giving a psuedo id to the option. That way when you are pulling it's status you can just check against that id, instead of loading all options.

$id = uniqid();
wp_async_task_add( 'wp_gears_heartbeat_worker', array( 'action' => 'gettime', 'id' => $id ), 'high' );

// in the callback use this id as part of the option name
// and return the above $id variable to the frontend via wp_localize_script

Additionally the code that triggers the async task to execute is hooked to the plugins_loaded action. This is an async infinite loop. The plugins_loaded event will always fire even when running inside Gearman. You need to queue the async task via some user specific action like Button click. Else move it to inside an if-else block on the constant DOING_ASYNC. This constant will be true only when execution is inside a Gearman Worker.

soderlind commented 8 years ago

I agree that I should use a pseudo id and not rely on an id that I don't control, but somehow passing an array to the worker callback doesn't work, $args in the callback becomes a string with the content of the first value i.el "gettime": https://gist.github.com/soderlind/94629e4ae933407d823c#file-wp-gears-heartbeat-php-L27

Ref using plugins_loaded, this is just a test to see if I can use Hearbeat together with WP Gears. I just used plugins_loaded to create a simple plugin without an UI. Where I plan to use it it will be an async task that will run more than 1 sec (most likely several minutes).

Having slept on it, I see now that the plugin is working as expected, I was to tired to grok the output of my own testes.

Anyway, than you for your feedback. I'll close this issue and create a new one on the worker callback $args,