10up / ElasticPress

A fast and flexible search and query engine for WordPress.
https://elasticpress.io
GNU General Public License v2.0

ElasticPress + Jetpack Data Sync == kablooey #572

Closed: Ipstenu closed 7 years ago

Ipstenu commented 8 years ago

After the Jetpack 4.3 update, my site using ElasticPress crashed hard.

I worked with the Jetpack team to diagnose the situation. What we determined was ... well it's long but it appears ElasticPress has an infinite loop when there are autoembeds to update.

tl;dr summary:

If I disable ElasticPress (not just uncheck the box, hard disable) the issue goes away and the sync runs and everyone lives.

Really annoying, verbose, long details follow:

Jetpack runs a 'full' data sync to its servers when you connect/disconnect Jetpack or when you turn a module on or off. At 10:20am on Tuesday, Sept 6th, I enabled Sitemaps on a site with a LOT of Post Meta (gobs).

About an hour later, I started getting a ton of alerts from my server, telling me I had some wp-cron.php calls that looked like this:

Executable:

/usr/bin/php-cgi

Command Line (often faked in exploits):

/usr/bin/php-cgi /home/lezwatchtv/public_html/wp-cron.php

Network connections by the process (if any):

tcp: MYIP:39734 -> JETPACKIP:443

Sketchy, right? I rolled back all my 'recent' changes (WP back to 4.6, Jetpack back to 4.2.x) but it kept happening. I reached out to Jetpack and we determined my sync got 'stuck' on about 190 chunks out of 400 odd (Helen, that's the autocorrect I was telling you about). When it got stuck, cron wouldn't end and worse it would make more and more and more jobs until my server freaked out and pkilled all the PHP for that account. It took about 3 hours to get there.

Escalation began. Part of this was due to Jetpack not nicely ending cron things. That was fixed in 4.3.1, but we could still make it crash with the full sync on this site. Dan W. came up with the ElasticPress idea, since he saw that it too was triggering actions whenever post meta was updated. He knew Jetpack was updating _oembed_time_{long_base64_string}.

As he said:

This caused wp-cron to never end and sad pandas to happen.

Whew.

So right now, my 'fix' is disabling ElasticPress. Perhaps ironically I only have two posts that use embeds on that site, which explains why we got to around 190 or so chunks before the sync locked up. If I had none, it wouldn't have happened at all.

This sync code was introduced in Jetpack 4.2, however until Tuesday, I'd not messed with any modules, so I'd never triggered a full sync before.

I'm happy to debug as much as needed, or provide you with weird ass data to replicate.

tlovett1 commented 8 years ago

Thanks for the report @Ipstenu!

@Ritesh-patel can you test on 2.1?

gravityrail commented 8 years ago

Thanks for the writeup @Ipstenu!

For the record, the way I was able to repro this bug consistently was by deleting the oembed timeout meta like this in wp shell:

> global $wpdb;
> $wpdb->query( "delete from $wpdb->postmeta where meta_key like '\_oembed%'" );

That way the next time we rendered the_content with filters, it had the loop. I'm not 100% certain it's an infinite loop, but it was sufficiently long that my PHP VM crashed saying the stack was too deep.
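To illustrate the kind of re-entrant loop being described here, the following is a toy model in Python. It is not ElasticPress, Jetpack, or WordPress code: the class `ToySite`, its methods, and the meta key `_oembed_time_example` are all made up for illustration. It also assumes that rendering content writes embed-cache meta on every pass, which this thread does not confirm. The sketch only shows how a meta-update hook that re-renders content can recurse until the stack runs out, and how a simple re-entrancy guard stops it.

```python
class ToySite:
    """Hypothetical stand-in for a site with a post-meta hook (not real WordPress code)."""

    def __init__(self, guard_reentry):
        self.guard_reentry = guard_reentry
        self.in_sync = False
        self.renders = 0

    def update_post_meta(self, key):
        # Every meta write notifies the search-sync listener.
        self.on_meta_updated(key)

    def on_meta_updated(self, key):
        if self.guard_reentry and self.in_sync:
            return  # already syncing: ignore meta writes caused by our own render
        self.in_sync = True
        try:
            self.render_content()
        finally:
            self.in_sync = False

    def render_content(self):
        self.renders += 1
        # Assumption: rendering refreshes the embed cache, which is itself a meta write.
        self.update_post_meta("_oembed_time_example")


def run(guard_reentry):
    site = ToySite(guard_reentry)
    try:
        site.update_post_meta("_oembed_time_example")
        return "finished after %d render(s)" % site.renders
    except RecursionError:
        return "stack too deep"


print(run(guard_reentry=False))
print(run(guard_reentry=True))
```

Without the guard, the model recurses until Python raises `RecursionError`, loosely mirroring the "stack was too deep" crash. With the guard, the nested meta write is ignored and the run finishes after a single render.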

Ipstenu commented 8 years ago

I don't believe in perpetual motion, so 'infinite' is unlikely; it's more 'way longer than should be acceptable on a VPS.'

Glad you can re-create it 😄

Ritesh-patel commented 8 years ago

Hi @Ipstenu @gravityrail

I have checked this with 2.1 but was not able to reproduce it. My setup was continuously failing to connect to Jetpack, so I tried what @gravityrail suggested. The only thing I noticed is that it does call WP_Embed->autoembed 2 times, which in turn updates meta 2 times, but after that it doesn't call action_queue_meta_sync in EP_Sync_Manager.

Can you test it with 2.1 (current develop branch) and let us know if you can reproduce it?

Regards, Ritesh

Ipstenu commented 8 years ago

A8C has run off to their Grand Meetup. I can see if I can reproduce it on my own (should be able to...).

Ipstenu commented 8 years ago

I've installed ElasticPress 2.1 but since I can't kill the process without a Jetpacker handy, I'm waiting for them to come home from Whistler :) Just in time for YOUR company meet up... Mine had ours last week (only a day).

tlovett1 commented 7 years ago

@Ipstenu any update?

Ipstenu commented 7 years ago

I've had it active for a week and no issues. Looks okay. I triggered a change in Jetpack that SHOULD have triggered a full sync and nothing crashed.