NethServer / dev

NethServer issue tracker
https://github.com/NethServer/dev/issues

Quick httpd restart #5816

Closed gsanchietti closed 5 years ago

gsanchietti commented 5 years ago

Nowadays many web applications use WebSockets and long-lived connections. When a web application (e.g. Nextcloud, WebTop) is updated, the httpd daemon is usually restarted.

On a production server with many connected users, most web applications remain unresponsive for around 1.5 minutes. This is due to how httpd is restarted.

The systemd unit shipped with CentOS (/usr/lib/systemd/system/httpd.service) sends a WINCH signal to the httpd daemon on restart. As soon as Apache receives the signal, it stops accepting new connections and waits indefinitely until all remaining requests have been fully served (https://httpd.apache.org/docs/current/mod/mpm_common.html#gracefulshutdowntimeout).

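Finally, if httpd has not stopped after DefaultTimeoutStopSec seconds (90s by default, set in /etc/systemd/system.conf), systemd kills it with SIGKILL. The unit sets KillSignal=SIGCONT, so the SIGTERM that systemd would normally send right after the stop command is replaced by a harmless SIGCONT, and nothing interrupts the graceful stop before the timeout expires.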
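
To check which stop settings systemd has actually loaded for the unit on a given host, the properties can be queried with systemctl show. This is only a minimal sketch: the helper name show_stop_settings is hypothetical, and it assumes a systemd host where httpd.service is installed.

```shell
# Hypothetical helper: print the stop-related properties loaded for a unit.
# $1 = unit name (defaults to httpd.service)
show_stop_settings() {
    unit="${1:-httpd.service}"
    # TimeoutStopUSec: how long systemd waits for the unit to stop.
    # KillSignal:      the signal systemd sends in place of the default SIGTERM.
    # ExecStop:        the stop command (kill -WINCH in the CentOS unit).
    systemctl show "$unit" -p TimeoutStopUSec -p KillSignal -p ExecStop
}
```

Calling show_stop_settings with no argument inspects httpd.service; passing another unit name inspects that unit instead.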

Proposed solution

Change the default TimeoutStopSec for the httpd systemd unit to have a faster restart at the cost of losing some existing connections.

Most modern web applications can handle reconnection, and most older web applications can tolerate a slow server response of around 5 or 10 seconds.
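
One way to apply such an override by hand is a systemd drop-in file. The sketch below is an illustration under assumptions, not necessarily how the nethserver-httpd package installs it: the helper name write_quick_kill_dropin and its second argument are hypothetical, while the drop-in path and the 5-second value match the quick_kill.conf shown in the QA comment further down.

```shell
# Hypothetical helper: write a drop-in that overrides TimeoutStopSec for httpd.
# $1 = timeout in seconds (defaults to 5)
# $2 = systemd configuration root (defaults to /etc/systemd/system)
write_quick_kill_dropin() {
    timeout="${1:-5}"
    root="${2:-/etc/systemd/system}"
    dir="$root/httpd.service.d"
    mkdir -p "$dir" || return 1
    # A drop-in only needs the section header and the overridden setting.
    printf '[Service]\nTimeoutStopSec=%s\n' "$timeout" > "$dir/quick_kill.conf"
}

# Usage on a real host (as root), followed by a reload so systemd sees the file:
#   write_quick_kill_dropin 5 && systemctl daemon-reload
```

The second argument exists only so the helper can be tried against a scratch directory before touching /etc/systemd/system.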

nethbot commented 5 years ago

in 7.6.1810/testing:

gsanchietti commented 5 years ago

Test case

nethbot commented 5 years ago

in 7.6.1810/testing:

nethbot commented 5 years ago

in 7.6.1810/testing:

filippocarletti commented 5 years ago

Before update:

[root@ns76-ent html]# curl localhost & time systemctl restart httpd
[1] 8475

curl: (52) Empty reply from server
[1]+  Exit 52                 curl localhost

real    1m30.196s
user    0m0.020s
sys     0m0.008s

After update:

[root@ns76-ent html]# curl localhost & time systemctl restart httpd
[1] 10527
curl: (52) Empty reply from server
[1]+  Exit 52                 curl localhost

real    0m5.201s
user    0m0.012s
sys     0m0.003s

FTR:

[root@ns76-ent html]# cat /var/www/html/index.php 
<?php
for ($x = 0; $x <= 100; $x++) {
    echo "The number is: $x <br>";
    sleep (1);
} 
?>

filippocarletti commented 5 years ago

Tested in production, no visible problems. Note that the 5-second timeout was reached.

[root@nethservice ~]# time systemctl restart httpd

real    0m5.437s
user    0m0.002s
sys     0m0.008s

stephdl commented 5 years ago

QA

# rpm -qa nethserver-httpd
nethserver-httpd-3.4.0-1.4.g7622a30.ns7.noarch
[root@ns7loc15 ~]# systemctl cat httpd.service

......

# /etc/systemd/system/httpd.service.d/quick_kill.conf
[Service]
TimeoutStopSec=5

The patch shortens the time systemd waits before restarting the service. Previously a restart could take a long time when several applications such as WebTop and SOGo were installed on the same server; after the patch, httpd restarts really quickly.

before the testing rpm

[root@ns7loc15 ~]# time systemctl restart httpd

real    1m30.587s
user    0m0.046s
sys     0m0.040s

after the testing rpm

[root@ns7loc15 ~]# time systemctl restart httpd

real    0m2.465s
user    0m0.020s
sys     0m0.067s
[root@ns7loc15 ~]# time systemctl restart httpd

real    0m4.403s
user    0m0.047s
sys     0m0.060s

I could not reproduce an error during the httpd restart with WebTop 5, but I did get errors with SOGo. Obviously, while httpd needs 90 seconds to restart, the httpd service is unavailable, and so is SOGo.

After the upgrade, all users can log in to WebTop or SOGo.

proposed verified

nethbot commented 5 years ago

in 7.6.1810/testing:

nethbot commented 5 years ago

in 7.6.1810/updates: