aws-samples / aws-refarch-wordpress

This reference architecture provides best practices and a set of YAML CloudFormation templates for deploying WordPress on AWS.
MIT No Attribution

Slow Plugin Install time (504 Gateway Timeout errors) #68

Open dave-gil opened 5 years ago

dave-gil commented 5 years ago

After a successful clean install of this WordPress reference architecture template, I experienced huge time lags in the admin interface.

Installing plugins regularly caused 504 timeout errors, and even bypassing CloudFront for the install didn't help. The only fix I have found so far that gives smooth installs is to edit the attributes of the load balancer and increase the idle timeout to 180 seconds or more. This lets most plugins install successfully, but they take forever!
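
For reference, the idle-timeout change can also be made from the AWS CLI instead of the console. This is a minimal sketch, assuming the AWS CLI is installed and configured; the function name `set_alb_idle_timeout` and the ARN you pass in are placeholders, not part of the template.

```shell
# Sketch: raise the idle timeout on an Application Load Balancer.
# $1 = load balancer ARN (look it up with `aws elbv2 describe-load-balancers`)
# $2 = timeout in seconds; defaults to 180, the value mentioned above
set_alb_idle_timeout() (
    alb_arn=$1
    seconds=${2:-180}
    aws elbv2 modify-load-balancer-attributes \
        --load-balancer-arn "$alb_arn" \
        --attributes "Key=idle_timeout.timeout_seconds,Value=$seconds"
)
```

Calling `set_alb_idle_timeout "$ALB_ARN" 180` should apply the same setting as editing the attribute in the console.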

Any similar experiences, thoughts?

michael-newman commented 5 years ago

In the past, I also ran into the 504 errors when installing or upgrading plugins; however, simply changing the Route53 domain A record from CloudFront to the ALB while installing/upgrading plugins resolved the issue and we have been doing this ever since. Installing/upgrading does take a little longer than a single instance setup; but, I would not characterize mine as taking forever. Note, I have my ALB Idle Timeout set to 120 seconds. Otherwise, any latency we experienced in the past was largely due to number of plugins and/or conflicts.
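
The record switch described above could be scripted roughly as follows. This is a sketch under assumptions: the record is an alias A record in Route 53, and the helper names `alias_change_batch` and `point_record_at` are made up for illustration. For an ALB, the alias hosted zone ID is the load balancer's canonical hosted zone ID (reported by `aws elbv2 describe-load-balancers`); for CloudFront it is the fixed value `Z2FDTNDATAQYW2`.

```shell
# Print a Route 53 change batch that points an alias A record at a target.
# $1 = record name, $2 = target DNS name, $3 = hosted zone ID of the target
alias_change_batch() {
    cat <<EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "$1",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "$3",
        "DNSName": "$2",
        "EvaluateTargetHealth": false
      }
    }
  }]
}
EOF
}

# Apply the change in the hosted zone that owns the record.
# $1 = hosted zone ID of your domain; remaining arguments as above
point_record_at() (
    zone_id=$1
    shift
    aws route53 change-resource-record-sets \
        --hosted-zone-id "$zone_id" \
        --change-batch "$(alias_change_batch "$@")"
)
```

Run it once with the ALB's DNS name before installing plugins, then again with the CloudFront domain name to switch back.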

Sidenote re plugins... If you run into frontend page latency issues, you might want to consider a plugin management plugin. I've tried a few but ultimately found Freesoul Deactivate Plugins (https://wordpress.org/plugins/freesoul-deactivate-plugins/) to be the best; super nice interface, intuitive, and it simply works. Additionally, the plugin author, Jose, is great to work with and he takes pride in what he does, and it shows.

dave-gil commented 5 years ago

Thanks for the response... Glad to hear I’m not alone in experiencing such problems!

Do you have thoughts on what might be causing such slow response times in the admin interface, particularly when installing plugins? I can’t think of any reason why the PHP install scripts should be that much slower when accessed from EFS than when run directly on an EC2 instance (especially given that they are cached as bytecode).

Waiting 120 seconds for a plugin to install feels like I’ve got something wrong... or the design is wrong? On fresh single-server installs, navigating around the admin interface is quick and slick, and the same plugins install in seconds.

Does EFS make this setup horribly slow and cumbersome? I appreciate that the WordPress admin interface slows down when bloated with numerous plugins, but this is a clean install with no plugins installed.

While I see the benefits of scaling and redundancy in this design, I fear I’ve either got something wrong or the design is flawed, because the admin experience seems so bad.

I expected this setup to be lightning quick with almost instantaneous page loads (even in the admin). I would be interested to learn about other people’s experiences and I am open to any suggestions to correct or tweak my setup.

michael-newman commented 5 years ago

For a point of clarity, I was just sharing my Idle Timeout setting as a point of comparison for you and did not mean to imply that installing plugins takes 120 seconds. Sometimes a large plugin or update might take a minute, but many updates take from a few seconds to 10 seconds or so, depending on size.

If you are running into admin latency issues when installing plugins on a fresh install, then it sounds like there is a configuration issue. You're highlighting EFS, and the only item I would suggest you double-check there is the Burst Credit Balance, and that you have enough dummy data saved in EFS to get enough throughput. I recall this was an issue for me when I first stood up this infrastructure (fresh install with only a couple of plugins). As for admin interface speed of this reference architecture vs. that of a single EC2 instance, my single EC2 instance is 2-3 seconds faster than the RefArch (same full-blown site with many plugins). I believe this is just a result of all the extra hops the RefArch has with all the subnets. Perhaps I've gotten used to it, but I don't find issue with it.
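
To make the EFS check concrete, here is a rough sketch. It relies on my understanding of the EFS documentation that, in bursting mode, baseline throughput is 50 MiB/s per TiB stored (that is, 50 KiB/s per GiB), which is why dummy data raises throughput. The function names are hypothetical, the file system ID is supplied by you, and the time window uses GNU `date`.

```shell
# Baseline throughput in bursting mode, in KiB/s.
# $1 = data stored in the file system, in GiB (50 KiB/s per GiB stored)
efs_baseline_kib_per_s() {
    echo $(( $1 * 50 ))
}

# Lowest BurstCreditBalance (bytes) seen over the last hour.
# $1 = EFS file system ID, for example fs-12345678
efs_min_burst_credits() (
    fs_id=$1
    aws cloudwatch get-metric-statistics \
        --namespace AWS/EFS \
        --metric-name BurstCreditBalance \
        --dimensions "Name=FileSystemId,Value=$fs_id" \
        --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
        --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        --period 300 \
        --statistics Minimum \
        --query 'min(Datapoints[].Minimum)' \
        --output text
)
```

By this arithmetic, `efs_baseline_kib_per_s 500` gives 25000 KiB/s (about 24 MiB/s), while a near-empty file system gets almost no baseline and depends entirely on its burst credits.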

On the front end, my page loads are pretty quick, noting I leverage Freesoul deactivate plugins, W3TC, memcached, and CloudFront w/a secondary S3 origin for static content. My GTmetrix opportunities are focused on deferring parsing of javascript and optimizing images, otherwise very good.

I would also suggest reaching out to the AWS tech support as they have always done good by me. Good luck and stick with it--hopefully others chime in to provide you additional perspectives.

dave-gil commented 5 years ago

Thanks Mike, I added 500GB of dummy data to EFS on install. For a quick comparison, if you delete and reinstall the W3 Total Cache plugin directly within WordPress (so from wordpress.org), what install times do you get? I get the following times:

AWS Refarch Wordpress clean install: 152 seconds (avg)
High Availability Bitnami Wordpress (based on AWS Refarch): 145 seconds (avg)
Bitnami Wordpress single EC2 behind load balancer: 10-12 seconds (avg)

Something must be wrong in the design, perhaps in how the upstream servers are accessed?

What’s the best way to reach out to the AWS tech team? (Do they monitor this GitHub?)

Thanks again for your kind input.

Anyone else with experiences?

michael-newman commented 5 years ago

Hey Dave, from your example install times, it sure does look like there is an issue here... I guess I've gotten used to the plugin load latency. Unfortunately, I can't use my environment for testing install times; we have it locked down.

As far as reaching out to AWS Tech, we utilize the AWS Business Support Plan... best $100/month we could ever spend--we have attained great support on a variety of topics. I believe you could subscribe and unsubscribe as needed. As far as AWS monitoring Github, I do not believe they are that active; but, then again, I'm only on from time to time.

Definitely a topic worth pursuing and believe we all would benefit from any changes...

Others, please chime in...

tdondich commented 5 years ago

For those with these problems, in order to help better troubleshoot, share the following:

Some things to note: EFS is not very performant with writes of small files, and plugin installs fall under this category. Unless a good amount of IOPS is allocated, your EFS performance is going to be sad in this category. EFS is built for redundancy, not performance. However, once the files are there, performance can be snappy due to PHP file caching, etc.

Web tier: the instance type matters here. If you are using T* instance types, you may be running into CPU credit exhaustion, which results in heavy throttling. This is something to be very aware of when installing plugins.
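
One way to check for CPU credit exhaustion is to read the `CPUCreditBalance` metric that CloudWatch publishes for burstable instances. A minimal sketch, assuming the AWS CLI and GNU `date`; the function name `ec2_min_cpu_credits` is hypothetical and the instance ID is whichever web-tier instance you want to inspect.

```shell
# Lowest CPUCreditBalance seen over the last hour for one burstable instance.
# A value near zero suggests the instance has been throttled to its baseline.
# $1 = EC2 instance ID
ec2_min_cpu_credits() (
    instance_id=$1
    aws cloudwatch get-metric-statistics \
        --namespace AWS/EC2 \
        --metric-name CPUCreditBalance \
        --dimensions "Name=InstanceId,Value=$instance_id" \
        --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
        --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        --period 300 \
        --statistics Minimum \
        --query 'min(Datapoints[].Minimum)' \
        --output text
)
```

Running it right after a slow plugin install should show whether the balance dropped toward zero during the install.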

Share your config info and let's see if we can find similarities.

dave-gil commented 5 years ago

@tdondich your comment makes sense to me.

I have tried EFS with and without dummy data (500GB) (for the file system tier).
I have tried only instance types t2.small and t2.medium (for the web tier).

tdondich commented 5 years ago

@fippy yep, I would move away from the t2* instance types. Certain plugins may constantly be touching the filesystem, which is not great. Anything that reads from media will also be incredibly slow, so I'm also looking into how to simply store the entire media library in S3 to avoid that read/write.

badcrocodile commented 4 years ago

@fippy did you have any luck getting this issue resolved? I've just run my first AWS WP setup following the RefArch almost exactly and have the same problems and it's just not sitting right with me. Core updates fail completely. The only modifications from RefArch being to run on t2 mediums. Have you found a better configuration that solves these plugin installs / WP core update issues?

michael-newman commented 4 years ago

@tdondich, re Media Library and S3... how are you setting this up different than what is contained in the AWS White-paper (https://d1.awsstatic.com/whitepapers/wordpress-best-practices-on-aws.pdf)? Would welcome hearing your insights...

tdondich commented 4 years ago

The extremely harsh reality of this deployment is that utilizing EFS is simply a no-go. You can't get enough performance out of it to make wp-admin reasonably usable when you are using a handful of plugins, especially any plugins that require doing file scans. EFS was built for durability and large scaling requirements, not necessarily for performance when dealing with small files.

So the best thing to do is to run a localized NFS cluster or use high-availability appliances such as SoftNAS.

I'm considering building a pull request that would offer the choice of EFS or deploying a localized NFS high availability cluster. Obviously EFS will most likely be more cost effective but will certainly hurt performance. NFS will cost more but you get dramatically faster performance when not caching on the front end or when using wp-admin.

dave-gil commented 4 years ago

Sorry for the slow response.

Unfortunately no, I have yet to find a real-world HA WordPress solution that runs anywhere near reasonable speeds, in terms of basic administration at least. Every solution on Amazon EFS seems to be a non-starter.

dave-gil commented 4 years ago

Certainly has to be worth a try. I’d be happy/interested to test out that approach if/when the CloudFormation template is modified.

saluminati commented 4 years ago

@fippy Are you a Messiah? In my case you are... I wasted so much time trying to fix this issue until I found your solution suggesting the idle timeout on the ALB. Thanks for the help, mate.

HarishKM7 commented 4 years ago

In my case, increasing the ALB idle timeout from 60 to 300 did not help, but changing the "Origin Response Timeout" & "Origin Keep-alive Timeout" to 60 in CloudFront fixed the issue.
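
The same CloudFront change can be scripted, since both timeouts live in each custom origin's `CustomOriginConfig` (`OriginReadTimeout` is the API name for the response timeout). A sketch under assumptions: `jq` and the AWS CLI are installed, and the helper names `set_origin_timeouts` and `update_distribution_timeouts` are invented for this example.

```shell
# Rewrite a DistributionConfig (JSON on stdin) so every custom origin uses
# the given timeouts. S3 origins have no CustomOriginConfig and are untouched.
# $1 = origin response (read) timeout in seconds, $2 = keep-alive timeout
set_origin_timeouts() {
    jq --argjson read "$1" --argjson keepalive "$2" '
        .Origins.Items |= map(
            if .CustomOriginConfig then
                .CustomOriginConfig.OriginReadTimeout = $read
                | .CustomOriginConfig.OriginKeepaliveTimeout = $keepalive
            else . end)'
}

# Fetch the current config, patch it, and push it back using the ETag
# that CloudFront requires for updates.
# $1 = distribution ID, $2/$3 = timeouts in seconds (default 60 each)
update_distribution_timeouts() (
    dist_id=$1
    tmp=$(mktemp) || exit 1
    current=$(aws cloudfront get-distribution-config --id "$dist_id") || exit 1
    etag=$(printf '%s' "$current" | jq -r '.ETag')
    printf '%s' "$current" | jq '.DistributionConfig' \
        | set_origin_timeouts "${2:-60}" "${3:-60}" > "$tmp"
    aws cloudfront update-distribution --id "$dist_id" \
        --if-match "$etag" --distribution-config "file://$tmp"
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

The default of 60 seconds for both values matches the settings reported in the comment above; the distribution takes a while to redeploy after the update.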

angelovescio commented 3 years ago

A relatively quick, but more costly, solution is to switch from "Bursting Throughput" to "Provisioned Throughput" in AWS for the EFS volume. For my install from this template, after running for about a year, the size of the volume was ~400MB, and 10 MB/s of throughput was an additional $60/month. You can switch to this temporarily if it affects production, until you are able to get the cache stuff running. Then buy some burst credits, just in case, and switch it back after you make the cache changes.
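
Switching modes can be done from the CLI as well as the console. A minimal sketch, assuming a configured AWS CLI; the function names are hypothetical. As far as I know, AWS limits how often you can lower provisioned throughput or change modes (roughly once a day), so check that before planning a quick switch back.

```shell
# Switch a file system to provisioned throughput.
# $1 = EFS file system ID, $2 = throughput in MiB/s (default 10, as above)
efs_set_provisioned() (
    aws efs update-file-system \
        --file-system-id "$1" \
        --throughput-mode provisioned \
        --provisioned-throughput-in-mibps "${2:-10}"
)

# Switch the same file system back to bursting mode.
efs_set_bursting() (
    aws efs update-file-system \
        --file-system-id "$1" \
        --throughput-mode bursting
)
```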

wesleywh commented 2 years ago

I know this is really old but is a top result for this particular search. Have not tried the following yet but will soon...

The solution I'm building makes use of unison, if anyone is interested. You can also just mount EFS async on an EC2 instance. If you're using something like Fargate, then you will have to use my unison method.

Found it via this blog: https://www.baeldung.com/linux/synchronize-linux-directories

Download unison: https://github.com/bcpierce00/unison

mkdir -p /unison
cd /unison
curl -LJO https://github.com/bcpierce00/unison/releases/download/v2.52.0/unison-v2.52.0+ocaml-4.01.0+x86_64.linux.tar.gz
tar -xvf unison-v2.52.0+ocaml-4.01.0+x86_64.linux.tar.gz

Then you can have it watch for directory changes and sync between two directories, like the following:

./bin/unison -batch=true -repeat watch /efs/ /usr/src/wordpress/

Then just make sure your EFS volume is mounted to /efs. This mimics what mounting it asynchronously would do. That way file reads and writes return immediately but are still synced to EFS, which in turn syncs them to your other instances/containers of WordPress.

Of course you need to wrap that unison repeat command in something like dumb-init to make sure the process is always running. Not sure of the best approach there yet, but I'm sure I'll find something. If I remember to post back here, I will let you know how it goes.

wesleywh commented 2 years ago

Okay, I tried it out and it works really, really well! I set this up with Docker and Fargate, but this whole process can easily be set up with EC2.

In my Dockerfile:

# UNISON_VERSION must be defined before it is used; v2.52.0 is assumed here, matching the release downloaded earlier in this thread
ARG UNISON_VERSION=v2.52.0
WORKDIR /unison
ADD https://github.com/bcpierce00/unison/releases/download/${UNISON_VERSION}/unison-${UNISON_VERSION}+ocaml-4.01.0+x86_64.linux.tar.gz /unison/
RUN tar -xvf unison-${UNISON_VERSION}+ocaml-4.01.0+x86_64.linux.tar.gz && \
    rm /unison/unison-${UNISON_VERSION}+ocaml-4.01.0+x86_64.linux.tar.gz && \
    mv /unison/bin/* /bin/

Then in my startup script I run:

( while true; do unison -batch=true -repeat=watch -silent -group -owner -auto -times "${SYNC_SOURCE_DIR}" "${SYNC_TARGET_DIR}" >/dev/null 2>&1; done ) &

This will make sure ${SYNC_SOURCE_DIR} and ${SYNC_TARGET_DIR} are always in sync. As a bonus, if unison dies for whatever reason it will restart. You could also put a sleep in the while loop in case of a crash loop, but this is good enough for now.
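
The sleep mentioned above could look something like this. It is only a sketch: `keep_running`, `RESTART_DELAY`, and `MAX_RESTARTS` are names invented here, and the restart limit exists mainly so the loop can be exercised in a test.

```shell
# Run a command repeatedly, restarting it whenever it exits, with a pause
# between restarts so a crash loop cannot spin the CPU.
# RESTART_DELAY = seconds to wait between restarts (default 5)
# MAX_RESTARTS  = stop after this many runs; 0 means keep going forever
keep_running() (
    delay=${RESTART_DELAY:-5}
    max=${MAX_RESTARTS:-0}
    count=0
    while :; do
        "$@" >/dev/null 2>&1
        count=$((count + 1))
        if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then
            exit 0
        fi
        sleep "$delay"
    done
)
```

In the startup script this would be invoked as `keep_running unison -batch=true -repeat=watch -silent -group -owner -auto -times "$SYNC_SOURCE_DIR" "$SYNC_TARGET_DIR" &`, keeping the same unison flags as before.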

Then mount your EFS volume at ${SYNC_SOURCE_DIR} and set ${SYNC_TARGET_DIR} to the root of your WordPress installation. That way it writes to the local filesystem and syncs to EFS. Also, if something gets written to EFS, it will sync to the WordPress directory.

With this setup I no longer get timeouts installing or removing plugins, because it installs directly to the local filesystem and THEN copies to EFS. The same goes for reads: they come directly from the local filesystem and NOT from EFS, so they are really fast.

Hopefully this helps someone! It definitely solved my problem. I'm hosting on a Fargate/load balancer/CloudFront combo.