froxlor / Froxlor

The server administration software for your needs - The official Froxlor development Git repository
http://www.froxlor.org
GNU General Public License v2.0

cron task REBUILD_VHOST loop #1249

Closed ZARk-be closed 1 month ago

ZARk-be commented 4 months ago

Describe the bug The rebuild vhost task would run every 5 minutes, again and again. When checking the panel_tasks table, I could see the task being deleted, but a new one was added immediately.

System information

To Reproduce Steps to reproduce the behavior:

Expected behavior An error message should be logged explaining why the task is looping. Something like: Froxlor host Let's Encrypt certificate needs to be rebuilt, forcing another web server config task

Logfiles

[file] => /var/www/froxlor/lib/Froxlor/Cron/Http/Apache.php
            [line] => 138
            [function] => inserttask
            [class] => Froxlor\System\Cronjob
            [type] => ::
            [args] => Array
                (
                    [0] => 1
                )

Additional context I know the error is due to a configuration error, but it was hard to find what was responsible for it. (I tracked down the method that inserted a task into the db and added a debug trace.)

d00p commented 4 months ago

You've enabled Let's Encrypt and "ssl redirect" for the froxlor vhost. This requires two cron runs. The first should output "Skipping Let's Encrypt generation for DOMAIN due to an enabled ssl_redirect"; it then inserts another task to generate the certificate itself and regenerate the vhost. Please run the cronjob manually with --debug to get more information.
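The two-run sequence described in this comment can be pictured with a toy model. This is only a sketch of the behaviour as stated above, not Froxlor's code: `CronState`, `cron_run`, the `skipped_once` flag and the domain `example.com` are all made-up names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CronState:
    """Hypothetical stand-in for the relevant panel state (not Froxlor's schema)."""
    ssl_redirect: bool = True
    certificate_issued: bool = False
    skipped_once: bool = False
    tasks: list = field(default_factory=lambda: ["REBUILD_VHOST"])


def cron_run(state: CronState, domain: str) -> list:
    """Simulate one cron run and return the messages it would log."""
    if not state.tasks:
        return ["nothing to do"]
    state.tasks.clear()  # the queued task is consumed by this run
    messages = []
    if state.certificate_issued:
        messages.append("vhost rebuilt")
    elif state.ssl_redirect and not state.skipped_once:
        # Run 1: postpone issuance and queue a follow-up task.
        state.skipped_once = True
        state.tasks.append("REBUILD_VHOST")
        messages.append(
            f"Skipping Let's Encrypt generation for {domain} "
            "due to an enabled ssl_redirect"
        )
    else:
        # Run 2: issue the certificate; the vhost is regenerated with it.
        state.certificate_issued = True
        messages.append(f"certificate issued for {domain}, vhost rebuilt")
    return messages


state = CronState()
for run in range(1, 4):
    print(run, cron_run(state, "example.com"))
```

In this model the queue is empty after the second run, so a third run has nothing to do. A task that keeps reappearing after two runs suggests the second step never completes.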

ZARk-be commented 4 months ago

First run:

Checking froxlor file permissions...OK
Running "tasks" job (debug)
[information] TasksCron: Searching for tasks to do
[information] Dkim-milter reloaded
[information] Task4 started - Rebuilding froxlor_bind.conf
[information] Cleaning dns zone files from /etc/bind/domains/
.... domains ....
[information] froxlor_bind.conf written
[information] Bind daemon reloaded
[information] Task4 finished
[information] Running Let's Encrypt cronjob prior to regenerating webserver config files
[information] Checking for LetsEncrypt client upgrades before renewing certificates:
[Thu 11 Apr 08:31:29 CEST 2024] Already uptodate!
[Thu 11 Apr 08:31:29 CEST 2024] Upgrade success!
[Thu 11 Apr 08:31:29 CEST 2024] Installing cron job
3 0 * * * "/root/.acme.sh"/acme.sh --cron --home "/root/.acme.sh" > /dev/null
[Thu 11 Apr 08:31:30 CEST 2024] Changed default CA to: https://acme-v02.api.letsencrypt.org/directory
[information] No new certificates or certificate updates found
[information] apache::createIpPort: creating ip/port settings for  <ip redacted>:80
[debug] <ip redacted>:80 :: inserted listen-statement
[debug] <ip redacted>:80 :: inserted vhostcontainer
[information] apache::createIpPort: creating ip/port settings for  <ip redacted>:443
[debug] <ip redacted>:443 :: inserted listen-statement
[debug] <ip redacted>:443 :: inserted vhostcontainer
... vhosts ...
[information] apache::writeConfigs: rebuilding /etc/apache2/vhosts.d/diroptions.conf
[information] apache::writeConfigs: rebuilding /etc/apache2/vhosts.d/htpasswd/
[information] apache::writeConfigs: rebuilding /etc/apache2/vhosts.d/vhosts.conf
[information] Froxlor\Cron\Http\ApacheFcgi::reload: running /etc/init.d/php-fpm-php5.6 reload
[information] Froxlor\Cron\Http\ApacheFcgi::reload: running /etc/init.d/php-fpm-php7.4 reload
[information] Froxlor\Cron\Http\ApacheFcgi::reload: running /etc/init.d/php-fpm-php8.1 reload
[information] Froxlor\Cron\Http\ApacheFcgi::reload: reloading Froxlor\Cron\Http\ApacheFcgi
[notice] Checking system's last guid
[notice] Checking cron-d
[notice] Cleaning old login links

Run 2 (the task created by run 1):

Checking froxlor file permissions...OK
Running "tasks" job (debug)
[information] TasksCron: Searching for tasks to do
[information] Running Let's Encrypt cronjob prior to regenerating webserver config files
[information] Checking for LetsEncrypt client upgrades before renewing certificates:
[Thu 11 Apr 08:31:38 CEST 2024] Already uptodate!
[Thu 11 Apr 08:31:38 CEST 2024] Upgrade success!
[Thu 11 Apr 08:31:38 CEST 2024] Installing cron job
3 0 * * * "/root/.acme.sh"/acme.sh --cron --home "/root/.acme.sh" > /dev/null
[Thu 11 Apr 08:31:38 CEST 2024] Changed default CA to: https://acme-v02.api.letsencrypt.org/directory
[information] No new certificates or certificate updates found
[information] apache::createIpPort: creating ip/port settings for  <ip redacted>:80
[debug] <ip redacted>:80 :: inserted listen-statement
[debug] <ip redacted>:80 :: inserted vhostcontainer
[information] apache::createIpPort: creating ip/port settings for  <ip redacted>:443
[debug] <ip redacted>:443 :: inserted listen-statement
[debug] <ip redacted>:443 :: inserted vhostcontainer
...vhosts...
[information] apache::writeConfigs: rebuilding /etc/apache2/vhosts.d/diroptions.conf
[information] apache::writeConfigs: rebuilding /etc/apache2/vhosts.d/htpasswd/
[information] apache::writeConfigs: rebuilding /etc/apache2/vhosts.d/vhosts.conf
[information] Froxlor\Cron\Http\ApacheFcgi::reload: running /etc/init.d/php-fpm-php5.6 reload
[information] Froxlor\Cron\Http\ApacheFcgi::reload: running /etc/init.d/php-fpm-php7.4 reload
[information] Froxlor\Cron\Http\ApacheFcgi::reload: running /etc/init.d/php-fpm-php8.1 reload
[information] Froxlor\Cron\Http\ApacheFcgi::reload: reloading Froxlor\Cron\Http\ApacheFcgi
[notice] Checking system's last guid
[notice] Checking cron-d
[notice] Cleaning old login links

d00p commented 4 months ago

Well, these logs do not show the original cause. You need to recreate the same scenario: disable Let's Encrypt and ssl-redirect, re-enable both, then run cron manually twice. It would be best to stop crond while testing and trigger the cron manually, so you get all the output and can identify potential issues.

ZARk-be commented 4 months ago

That's exactly what I did. I disabled them yesterday, enabled them this morning to get those logs, and disabled them again now. I just did it again to make sure, with the exact same result.

---- an hour passes of looking around ---

I looked deeper. I see that I never had a row with domainid=0 in the domain_ssl_settings table. I deleted the folder /root/acme.sh/ and ran cron with --force to force everything to run. It re-created the certificate and correctly inserted domainid=0, and now everything is set up correctly. The vhost file uses the LE cert, and vhost generation is back to normal.

It seems that it expected the table entry to exist if the files exist? Maybe consider the table as the source of truth?
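A check along these lines could make such a mismatch visible. The following is a minimal sketch, not part of Froxlor: it uses an in-memory SQLite table as a stand-in for the real database, keeps only the `domainid` column mentioned in this thread, and takes the certificate folder as a parameter. The function names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def has_vhost_row(conn: sqlite3.Connection) -> bool:
    """True if domain_ssl_settings holds a row with domainid = 0."""
    row = conn.execute(
        "SELECT 1 FROM domain_ssl_settings WHERE domainid = 0 LIMIT 1"
    ).fetchone()
    return row is not None


def describe_state(conn: sqlite3.Connection, cert_dir: Path) -> str:
    """Compare the table with the filesystem and name any mismatch."""
    in_table = has_vhost_row(conn)
    on_disk = cert_dir.is_dir()
    if on_disk and not in_table:
        return "inconsistent: certificate files exist but the table row is missing"
    if in_table and not on_disk:
        return "inconsistent: table row exists but certificate files are missing"
    return "consistent"


# Demo: the SQLite table and the temporary folder are placeholders only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain_ssl_settings (domainid INTEGER)")
with tempfile.TemporaryDirectory() as tmp:
    cert_dir = Path(tmp)  # stands in for the acme.sh certificate folder
    before = describe_state(conn, cert_dir)
    conn.execute("INSERT INTO domain_ssl_settings (domainid) VALUES (0)")
    after = describe_state(conn, cert_dir)
print(before)
print(after)
```

The first result corresponds to the state reported here (files on disk, no row); the second to the state after the certificate was re-created.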

d00p commented 4 months ago

An entry only exists if a Let's Encrypt certificate has already been issued (it is used for renewal).

ZARk-be commented 4 months ago

Looking at the code:

AcmeSh.php / issueFroxlorVhost()

If the row is not present and the folder exists and is recent (which was my case), then issueFroxlorVhost() returns false, so the certificate is not updated and therefore the row is never created.

Now the question is how I ended up with the certificate configured in acme without having the row in froxlor... but somehow it happened.
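To make the reasoning in this comment concrete, here is a toy model of the loop. It is a sketch under stated assumptions, not the logic of AcmeSh.php: the class and function names are invented, and it assumes that a rebuild task is queued again whenever the table row is still missing at the end of a run.

```python
from dataclasses import dataclass


@dataclass
class VhostCertState:
    """Hypothetical model of the two facts discussed above (names are made up)."""
    row_exists: bool      # is there a domain_ssl_settings row for the vhost?
    folder_recent: bool   # does a recent acme.sh folder exist on disk?


def issue_model(state: VhostCertState) -> bool:
    """Mimic the described check: do not issue if recent files already exist."""
    if state.folder_recent:
        return False
    # A successful issuance would create the files and the table row.
    state.folder_recent = True
    state.row_exists = True
    return True


def cron_run_requeues(state: VhostCertState) -> bool:
    """Return True if this run ends with another rebuild task queued (assumed)."""
    if not state.row_exists:
        issue_model(state)
    return not state.row_exists


def count_requeues(state: VhostCertState, runs: int) -> int:
    """Count how many of the simulated runs queue a follow-up task."""
    return sum(cron_run_requeues(state) for _ in range(runs))


stuck = count_requeues(VhostCertState(row_exists=False, folder_recent=True), 5)
recovered = count_requeues(VhostCertState(row_exists=False, folder_recent=False), 5)
print(stuck, recovered)
```

With a recent folder and no row, every one of the five simulated runs queues another task. Once the folder is gone, the first run issues the certificate, creates the row, and no further task is queued.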

d00p commented 4 months ago

"If the row is not present and the folder exists and is recent (which was my case), then issueFroxlorVhost() returns false, so the certificate is not updated and therefore the row is never created."

issueFroxlorVhost() != renewFroxlorVhost()

issue checks whether something already exists and, if so, won't issue a new certificate. Everything else regarding renewal is handled by acme.sh, and froxlor will only read in renewed certificate files if they are newer than what we know.

Again: issueFroxlorVhost() does not do the renewing/updating. If anything exists, there is no need to issue, as it has already been called (and a corresponding entry in domain_ssl_settings should exist).
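The "only read in files that are newer than what we know" rule can be illustrated with a modification-time comparison. This is a sketch of the general idea, not Froxlor's implementation: `should_read_in`, the file name and the timestamps are placeholders, and how Froxlor actually records what it "knows" is not shown in this thread.

```python
import os
import tempfile
from pathlib import Path


def should_read_in(cert_file: Path, known_mtime: float) -> bool:
    """True if the file on disk is newer than the copy already recorded."""
    if not cert_file.is_file():
        return False
    return cert_file.stat().st_mtime > known_mtime


with tempfile.TemporaryDirectory() as tmp:
    cert = Path(tmp) / "fullchain.cer"  # placeholder file name
    cert.write_text("placeholder certificate")
    os.utime(cert, (2000.0, 2000.0))    # pin the mtime for a repeatable demo
    unchanged = should_read_in(cert, known_mtime=2000.0)
    renewed = should_read_in(cert, known_mtime=1000.0)
print(unchanged, renewed)
```

A comparison like this needs a recorded value to compare against, which fits the point that renewal handling presupposes an existing entry.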

"Now the question is how I ended up with the certificate configured in acme without having the row in froxlor... but somehow it happened."

Yes, that's the issue: this state should not occur, and I can't tell you how it did. I've never had any issues with this.