10up / WP-Minions

ARCHIVED: Job Queue for WordPress
GNU General Public License v2.0
231 stars 40 forks source link

Possible issue with caching $blog_id on `\WpGears\Gearman\Client` #18

Closed nerrad closed 8 years ago

nerrad commented 8 years ago

This issue is still an assumption, it hasn't been proven yet. I'll be adding to it as I investigate but wanted to get it written up in case there's others that can confirm the behaviour we experienced.

I've implemented WPGears on a WP multisite install to handle the sending of notifications for users of our platform. This past week we tried upping the number of blogs processed per worker and started seeing some cross pollution of blog_id where notifications were processed in the wrong blog context (this doesn't happen when jobs per worker remains at 1)

One thing that is unique about our setup that likely contributed to this issue being exposed is if while a job is being executed, we experience an error that prevents the notification being processed/sent at that moment in time then we add the job back to the queue before exiting. I think this is what triggers the cross blog pollution because when a job is executing via WPGears\Gearman\Worker the blog has been switched to the blog_id for the job, but then if while IN that switch, \WpGears\Gearman\Client::add is called, that will end up using the blog_id cached in the worker from when the client was first instantiated.

So one way of possibly handling this is to not cache the blog_id and always return whatever the value of get_current_blog_id() is. Another possibility would be to only cache the blog_id when the JOBS_PER_WORKER config is at 1 (which kind of defeats the purpose of having it cached anyways).

Another potential danger of caching the blog_id that may surface in another context is if \WPGears\Gearman\Client is instantiated at the beginning of a request, and then add_job is done in the context of a switched blog by other code, the job will end up getting added for the wrong blog_id (the blog from when it was instantiated). So it seems that the best option here would be to not cache.

nerrad commented 8 years ago

Just going to summarize the ways this could be resolved in its own comment (that can be edited as necessary):

  1. Don't cache blog_id and always just use get_current_blog_id()
  2. Only cache blog_id when JOBS_PER_WORKER == 1. (However, this will still be flawed when $client->get_blog_id() is called in a single normal (non-worker) request on a switch_blog context where its expected the blog_id for the current context is used when adding a job).
  3. Allow client code calling \WpGears\Gearman\Client::add to override the blog_id in the arguments.
dsawardekar commented 8 years ago

@nerrad Can you clarify what you mean by "cache blog id"?

The blog ID is effectively part of the Gearman job payload. Every time a Gearman job is run, we look at the blog id in the payload. Then switch to that site, run the action, and switch back.

GearmanWorker:117

This ensures that the action is executed in the context of the blog that issued the wp_async_task_add.

If the blog that you want to execute the action on is different from the one you are performing the wp_async_task_add from, then you need to pass that blog_id as an argument. And switch to blog within your async callback.

wp_async_task_add( 'foo_action', array( 'blog_id' => <required_blog_id> ) );

// corresponding action
function foo_action( $args ) {
    $blog_id = $args['blog_id'];
    switch_to_blog( $blog_id );

    // main action code here

   // restore here
}
nerrad commented 8 years ago

blog_id is already automatically added to the job arguments when creating a job in \WpGears\Gearman\Client (see https://github.com/10up/WP-Gears/blob/master/includes/WpGears/Gearman/Client.php#L74). When I talk about blog_id being cached I mean that it is stored as a property of WpGears\Gearman\Client for the duration of a request (see https://github.com/10up/WP-Gears/blob/master/includes/WpGears/Gearman/Client.php#L182). This is a huge gotcha when more than one job is added in a request across multiple blog switches OR when more than one job is running per worker (and a new job is added for a different site because only the original blog set the first time the client is instantiated is used).

I understand that your suggestion to just send along blog_id in the arguments for when I add a job. That just means I can't rely on the automatic blog_id added to the job data ever and basically there will always be two switch_to_blog and restore_current_blog calls for every job execution. While its a solution, I think it'd be better to fix the assumption that adding a job would never be called in a different site context from the first time WpGears\Gearman\Client was instantiated.

dsawardekar commented 8 years ago

@nerrad Thanks, yeah I see what you mean now. The fix also makes sense. We can just delete the helper get_blog_id and use WP's get_current_blog_id directly.

Do you want to make a PR? You may also need to update some tests.

Edit: On second thought, We'll still need the function for multisite detection to return false. So just remove the caching in get_blog_id().

nerrad commented 8 years ago

yah I'll do a pr (and update any necessary tests).

And I got your "on second thought" bit as well :)

dsawardekar commented 8 years ago

@cmmarslender This issue is resolved via the #19. Can we cut a new release?

nerrad commented 8 years ago

before cutting a new release I'd like to see #17 added if possible. Assuming doing it is acceptable I could do a quick pull request for it.

dsawardekar commented 8 years ago

@nerrad We aren't using the extra parameter currently, but if you need it go for it. Just make sure to keep the tests suite passing. :)

dsawardekar commented 8 years ago

I've merged #20. @cmmarslender Can you take a look?