bitnami / vms

Bitnami VMs
https://bitnami.com

[WordPress] Server becomes unresponsive and can't connect via SSH; only fix is a hard shutdown and then start up #1388

Closed varnals761 closed 5 months ago

varnals761 commented 6 months ago

Platform

AWS

bndiagnostic ID

ac895beb-23a2-c1fb-014d-4bba060f1f2f

bndiagnostic output

===== Begin of bndiagnostic tool output =====

✓ Resources: No issues found
✓ Connectivity: No issues found
? Mariadb: Found possible issues
✓ Processes: No issues found
? Wordpress: Found possible issues
? Apache: Found possible issues
✓ Php: No issues found

[Mariadb]

Found recent error messages in the MariaDB error log:

2024-01-24 18:16:54 1124 [Warning] Aborted connection 1124 to db:
'bitnami_wordpress' user: 'bn_wordpress' host: '**ip_address**' (Got an error
reading communication packets)

Please check the following guide to troubleshoot MariaDB issues:

https://docs.bitnami.com/aws/apps/wordpress/troubleshooting/debug-errors-mariadb/

[Wordpress]

Found recent WordPress plugin related error messages in the Apache error log.

[Wed Jan 24 18:07:07.330872 2024] [autoindex:error] [pid 2415:tid
139896572258048] [client **ip_address**:48730] AH01276: Cannot serve directory
/opt/bitnami/wordpress/wp-content/plugins/woocommerce-payments/: No matching
DirectoryIndex (index.html,index.html,index.htm,index.php) found, and
server-generated directory index forbidden by Options directive

Please check the following guide to deactivate plugins:

https://developer.wordpress.org/cli/commands/plugin/deactivate/

[Apache]

Found recent error or warning messages in the Apache error log.

[Wed Jan 24 18:16:55.544689 2024] [proxy_fcgi:error] [pid 2415:tid
139896731719424] [client **ip_address**:6868] AH01079: failed to make connection
to backend: httpd-UDS
 [Wed Jan 24 18:16:56.053325 2024] [proxy:error] [pid 3463:tid 139897629148928]
(2)No such file or directory: AH02454: FCGI: attempt to connect to Unix domain
socket /opt/bitnami/php/var/run/www.sock (www-fpm:8000) failed
 [Wed Jan 24 18:16:56.053364 2024] [proxy_fcgi:error] [pid 3463:tid
139897629148928] [client **ip_address**:6851] AH01079: failed to make connection
to backend: httpd-UDS

Please check the following guide to troubleshoot server issues:

https://docs.bitnami.com/general/apps/wordpress/troubleshooting/debug-errors-apache/

===== End of bndiagnostic tool output =====

bndiagnostic was not useful. Could you please tell us why?

Shows logged error, but doesn't explain what the issue is.

Describe your issue as much as you can

Hi Support,

I'm having an issue with my Lightsail WordPress e-commerce instance. It runs pretty smoothly throughout the day and doesn't need any ctlscript.sh restart, but later in the day the site becomes completely unresponsive. Pages in the browser load indefinitely and seem to hang. Even connecting to the server over SSH hangs and never responds.

The only way to fix this issue is to do a hard shutdown, wait 3-5 minutes for the instance to stop, then start the instance again. Using the reboot button in Lightsail doesn't work. Sometimes, immediately after the hard shutdown, it becomes unresponsive again and needs another hard shutdown.

I even tried to use the Bitnami diagnostic tool to troubleshoot during this time and it just hangs as well... It took 3-4 attempts before I was able to successfully submit this diagnostic tool output.

To troubleshoot, I ran the top command and saw that the mysql process is taking over 100% CPU, and in my Lightsail instance the CPU is in the burstable zone and constantly above 30% utilization. It seems like my CPU usage is way too high and is causing the issue, and I'm not sure why: Screenshot 2024-01-26-01
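
One way to capture which process is consuming CPU before the instance hangs is to log a ranked snapshot at regular intervals. The following is a minimal Python sketch and not part of the original report: the function names `top_cpu_processes` and `sample_ps` are hypothetical, and it assumes a Linux host where `ps -eo pid,pcpu,comm` is available. Note that `ps` reports CPU time divided by process lifetime, so its figures will not match the instantaneous values shown by `top`.

```python
import subprocess


def top_cpu_processes(ps_output: str, limit: int = 5):
    """Return (cpu_percent, pid, command) tuples, highest CPU first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # pid, %cpu, command
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        try:
            rows.append((float(cpu), int(pid), command))
        except ValueError:
            continue  # ignore rows that do not parse as numbers
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


def sample_ps() -> str:
    """Run ps and return its raw text output (assumes procps on Linux)."""
    return subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Calling `top_cpu_processes(sample_ps())` from a scheduled job and appending the result to a file would leave a record that survives a hard shutdown, which may help show whether the database process is already climbing before the site stops responding.
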

I'm currently using the $40 tier instance for my site, and it looks like I have free memory when the instance becomes unresponsive. The 8 GB server should have plenty of resources, so I'm not sure what the problem is. I'm having trouble working out what is going on, and this now occurs daily.

Any help would be appreciated as to what's going on.

DevMude commented 6 months ago

@varnals761

Try installing the Query Monitor plugin and navigating around different pages to see if there are any inefficient queries taking a ridiculously long time to execute. You might need to install a caching plugin or use a CDN like Cloudflare to prevent these queries from being executed too frequently and consuming all CPU resources. Or optimize the queries yourself if you have coded them.
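
A server-side complement to Query Monitor is the MariaDB slow query log. The sketch below is an illustration only and is not from the thread: it assumes the slow query log has been enabled and that each entry carries the usual `# Query_time: ... Rows_examined: ...` header line, and the function name `slowest_queries` is hypothetical.

```python
import re
from typing import List, Tuple

# Header line written before each logged statement (assumed format).
HEADER = re.compile(r"^# Query_time:\s*([\d.]+).*?Rows_examined:\s*(\d+)")


def slowest_queries(log_text: str, limit: int = 3) -> List[Tuple[float, int, str]]:
    """Return (query_time_seconds, rows_examined, sql) for the slowest entries."""
    entries = []
    current = None  # [query_time, rows_examined, sql_lines]
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            if current is not None:
                entries.append(current)
            current = [float(match.group(1)), int(match.group(2)), []]
        elif current is not None and line.strip() and not line.startswith("#"):
            # Skip the bookkeeping statements logged ahead of the query itself.
            if not line.startswith(("SET timestamp=", "use ")):
                current[2].append(line.strip())
    if current is not None:
        entries.append(current)
    ranked = sorted(entries, key=lambda entry: entry[0], reverse=True)
    return [(seconds, rows, " ".join(sql)) for seconds, rows, sql in ranked[:limit]]
```

A query that examines a very large number of rows while returning few is often a candidate for an index or for caching, which is the kind of query the suggestion above is trying to surface.
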

gongomgra commented 6 months ago

Hi @varnals761,

Thanks for using Bitnami. I'm afraid I don't know whether this issue is related to burstable instances or not. We also reduced the PHP memory settings recently in our images (see issues/927). We recommend giving the updated settings a try. In case it doesn't work, please open a new question in the official WordPress forums, where more experienced developers may give you other hints on what may be happening.

Hope it helps!

varnals761 commented 6 months ago

Hello,

Thank you for the replies. I already have the reduced PHP settings in place, but I'm currently using the small memory settings on my instance. If I have issues again, I'll go down to the micro memory settings. I will also apply the Apache config updates in the link, since I haven't done that in 12+ months.

I'll also install that plugin and monitor the queries to see what's the cause.

gongomgra commented 6 months ago

Hi @varnals761,

Thanks for letting us know. I hope your issue gets solved with the updated values.

varnals761 commented 6 months ago

I got 1 response and it was to increase the server to the next tier.

I tried the new PHP settings but that didn't work; the server still becomes unresponsive every couple of hours. This is happening on a production server as well.

For the Apache settings, are these optimal for the $40/month AWS server?


```
#
# Note: This will be modified on server size changes

<IfModule mpm_prefork_module>
  StartServers    5
  MinSpareServers 5
  MaxSpareServers 10
  MaxRequestWorkers       5
  MaxConnectionsPerChild  5000
  KeepAliveTimeout 1
</IfModule>

<IfModule mpm_event_module>
  ServerLimit               4
  StartServers              2
  MinSpareThreads         128
  MaxSpareThreads         192
  ThreadsPerChild          64
  MaxRequestWorkers       256
  MaxConnectionsPerChild 5000
  KeepAliveTimeout          2
</IfModule>

<IfModule mod_passenger.c>
  PassengerMinInstances       1
  # PassengerMaxInstancesPerApp 1
  PassengerMaxPoolSize        3
</IfModule>
```

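
For the `mpm_event_module` block, one thing that can be checked by arithmetic is internal consistency: in the threaded MPMs, `MaxRequestWorkers` cannot exceed `ServerLimit` multiplied by `ThreadsPerChild`. The Python sketch below works that out for the values quoted above. It is only a consistency check under the assumption that the event MPM is the one actually loaded, and the function name `event_mpm_capacity` is hypothetical. It says nothing about whether the values are optimal for this workload.

```python
def event_mpm_capacity(server_limit: int, threads_per_child: int,
                       max_request_workers: int) -> dict:
    """Check that MaxRequestWorkers fits under ServerLimit * ThreadsPerChild."""
    ceiling = server_limit * threads_per_child
    effective = min(max_request_workers, ceiling)
    return {
        "thread_ceiling": ceiling,
        "effective_workers": effective,
        # Ceiling division: child processes needed to host all worker threads.
        "processes_at_full_load": -(-effective // threads_per_child),
        "consistent": max_request_workers <= ceiling,
    }


# Values taken from the mpm_event_module block quoted above.
capacity = event_mpm_capacity(server_limit=4, threads_per_child=64,
                              max_request_workers=256)
print(capacity)
```

With these values the ceiling is 4 × 64 = 256, which equals the configured `MaxRequestWorkers`, so the event settings are at least self-consistent. Whether 256 concurrent workers suits a 2 vCPU instance depends on the PHP-FPM and MariaDB limits behind Apache, which this check does not cover.
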
gongomgra commented 5 months ago

Hi @varnals761,

Can you tell us which CPU and memory resources are available in your instance? Apart from that, we recommend taking a look at our performance troubleshooting guide. I hope you can find other misbehaving processes on your server, or confirm that the issues are caused by long-running or resource-hungry PHP processes triggered by plugins and/or custom themes.

https://docs.bitnami.com/aws/faq/troubleshooting/troubleshoot-server-performance/

varnals761 commented 5 months ago

Hi @gongomgra.

I have caching and I've tried everything in the link, but nothing is working.

I created a brand new "Test" Lightsail instance with 8 GB RAM, 2 vCPUs, and a 160 GB SSD - which is the same instance spec as my own - and manually went through several configurations to compare. In the /stack/php/etc folder, everything matched between my instance and the "Test" instance. I went through the Apache configs and those also matched between my WordPress instance and the "Test" instance, so no update was needed there.

Going through the MariaDB conf folder, there were discrepancies between my instance and the new "Test" instance. The my.cnf file had some additional lines, which I removed to match the new "Test" instance, and I changed "character_set_server=utf8" to "character_set_server=utf8mb4". Additionally, in the /bitnami/memory folder, all the memory settings were slightly different. My instance had query_cache_limit and query_cache_type in place but the "Test" instance did not have those lines, so I removed them. I also changed query_cache_size from 256M in my instance to 0 to match the "Test" instance. All the memory-[server-size] files were different, so I updated them to match the new "Test" instance.
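
A manual comparison like this can also be scripted. The following Python sketch is an illustration, not what was actually run in this thread: it parses two `my.cnf`-style texts and reports the options that differ. The function names `parse_cnf` and `diff_cnf` are hypothetical, and placing the sample options under a `[mysqld]` section is an assumption, since the thread does not say which section they were in.

```python
def parse_cnf(text: str) -> dict:
    """Parse my.cnf-style text into {(section, option): value}."""
    settings = {}
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "!")):
            continue  # blanks, comments, and !include directives
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, _, value = line.partition("=")
        # Dashes and underscores are interchangeable in option names.
        settings[(section, key.strip().replace("-", "_"))] = value.strip()
    return settings


def diff_cnf(mine: str, reference: str) -> dict:
    """Return {(section, option): (my_value, reference_value)} for differences."""
    a, b = parse_cnf(mine), parse_cnf(reference)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Sample values from the comment above; the [mysqld] section is assumed.
mine = "[mysqld]\ncharacter_set_server=utf8\nquery_cache_size=256M\n"
reference = "[mysqld]\ncharacter_set_server=utf8mb4\nquery_cache_size=0\n"
for (section, option), (old, new) in diff_cnf(mine, reference).items():
    print(f"[{section}] {option}: {old} -> {new}")
```

An option that exists on only one side shows up with `None` for the other value, which makes leftover lines such as `query_cache_type` easy to spot.
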

After applying those changes, it looked like the site performed better for the day, but is still crashing every 4-6 hours in production.

I'm not sure if there are any other configurations I need to change, but my instance's settings should now match the brand new "Test" instance for the 8 GB RAM, 2 vCPUs, and 160 GB SSD server. I am now able to connect via SSH all the time, but my instance still hangs and takes minutes to load the homepage. The best fix is still a hard shutdown, as a ctlscript.sh restart helps for only a few minutes before the site becomes unresponsive again.

I have a similar-size store that uses the 2 GB RAM, 2 vCPUs, 60 GB SSD server and has been up for weeks now. Both stores have the same plugins, use the Storefront theme by Automattic (from WooCommerce), and link to woocommerce.com.

WordPress support hasn't responded except to ask what the server specs are, and I'm at a loss for how to proceed. I need to find a way to get stability back in production, because we are losing customers every day and are seeing a sharp decline in traffic and sales.

gongomgra commented 5 months ago

Hi @varnals761,

Thanks for your message. If you are using the same settings for all services in both instances but still see bad performance on the old instance, I think it may be related to plugins and themes. Try disabling all plugins and themes and enabling them one at a time, so you can detect which one is affecting your server performance.
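
The one-at-a-time procedure can be written down as an ordered list of WP-CLI commands. The Python sketch below only builds that list and does not run anything; the function name `isolation_plan` is hypothetical, and the plugin names in the usage note are examples. It assumes WP-CLI is available as `wp` on the instance and that the list of active plugins was recorded beforehand, for instance with `wp plugin list --status=active --field=name`.

```python
from typing import Iterable, List


def isolation_plan(active_plugins: Iterable[str]) -> List[List[str]]:
    """Build WP-CLI commands: deactivate everything, then re-enable one by one."""
    plan = [["wp", "plugin", "deactivate", "--all"]]
    for name in active_plugins:
        # Observe server load for a while after each activation before continuing.
        plan.append(["wp", "plugin", "activate", name])
    return plan
```

For example, `isolation_plan(["woocommerce", "woocommerce-payments"])` yields one deactivate step followed by two activate steps. Running each step by hand and watching CPU usage in between keeps the test slow but makes the offending plugin easier to attribute.
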

Another option (notice this is a long shot, and probably not related to the real issue) is that the physical AWS machine on which your instance is running is performing poorly. If you are not having performance issues on the new instance, try migrating your website there.

github-actions[bot] commented 5 months ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 5 months ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.