Auth hook fails because of missing configuration file?

JaneJeon commented 2 years ago

Hi, I've been using dnsrobocert with no problem, but recently it has been failing to run the auth hook, not because of misconfiguration or incorrect DNS, but because of some... missing temp file??

Renewing an existing certificate for $site
Hook '--manual-auth-hook' for $site reported error code 1
Hook '--manual-auth-hook' for $site ran with error output:
 2022-03-26 02:31:37 fee66d534ec0 dnsrobocert.core.config[50] ERROR Configuration file /tmp/tmpa18ssoa9/dnsrobocert-runtime.yml does not exist.
 Error occured while loading the configuration file, aborting the `auth` hook.

And since the auth hook fails, the cert renewal fails... It's been working fine before this, any ideas?

Grokon commented 2 years ago

I have this error too. Steps to reproduce:

Start the Docker container and issue a certificate.
In the folder /etc/letsencrypt/renewal/, a renewal config file is created containing the parameter deploy -c "/tmp/tmp1w7449xr/dnsrobocert-runtime.yml"
Restart the Docker container. dnsrobocert will not renew the certificate because it is not yet due, but the temporary config directory is recreated under a different path.
Check the config file /etc/letsencrypt/renewal/domain.com.conf: the path to the temp file has not been updated.
When the certificate is due for renewal, the old file path is used and the hook fails with: ERROR Configuration file /tmp/tmpa18ssoa9/dnsrobocert-runtime.yml does not exist.
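
To check whether a given setup is in the state described by these steps, something like the following could be run inside the container. It is only a sketch: the function name `find_stale_runtime_paths` is made up for illustration, and it assumes the renewal files contain the runtime config path in the same `-c "<path>/dnsrobocert-runtime.yml"` form that appears in the hook commands quoted in this thread. It does not rely on any particular key names in the renewal file.

```python
import re
from pathlib import Path

# Assumption: hook commands reference the runtime config as
# -c "<path>/dnsrobocert-runtime.yml", as in the logs quoted in this issue.
RUNTIME_CONFIG_RE = re.compile(r'-c\s+"?([^"\s]+dnsrobocert-runtime\.yml)"?')


def find_stale_runtime_paths(renewal_dir):
    """Return {renewal file name: [referenced runtime config paths that do not exist]}."""
    stale = {}
    for conf in sorted(Path(renewal_dir).glob("*.conf")):
        referenced = RUNTIME_CONFIG_RE.findall(conf.read_text())
        # De-duplicate, since several hooks usually point at the same file.
        missing = sorted({p for p in referenced if not Path(p).exists()})
        if missing:
            stale[conf.name] = missing
    return stale
```

Calling `find_stale_runtime_paths("/etc/letsencrypt/renewal")` after a container restart should list every renewal file that still points at a temporary directory from a previous run; an empty result means no stale paths were found.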

After this, the certificate is not renewed and we get this error:


Hook '--manual-cleanup-hook' for domain.com ran with error output:
2022-07-12 17:41:25 server-host dnsrobocert.core.config[84] ERROR Configuration file /tmp/tmpakp5917q/dnsrobocert-runtime.yml does not exist.
Error occured while loading the configuration file, aborting the `cleanup` hook.
Failed to renew certificate domain.com with error: Some challenges have failed.

All renewals failed. The following certificates could not be renewed: /etc/letsencrypt/live/domain.com/fullchain.pem (failure)
1 renew failure(s), 0 parse failure(s) Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /etc/letsencrypt/logs/letsencrypt.log or re-run Certbot with -v for more details.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/site-packages/dnsrobocert/core/background.py", line 48, in run
    schedule.run_pending()
  File "/usr/local/lib/python3.9/site-packages/schedule/__init__.py", line 780, in run_pending
    default_scheduler.run_pending()
  File "/usr/local/lib/python3.9/site-packages/schedule/__init__.py", line 100, in run_pending
    self._run_job(job)
  File "/usr/local/lib/python3.9/site-packages/schedule/__init__.py", line 172, in _run_job
    ret = job.run()
  File "/usr/local/lib/python3.9/site-packages/schedule/__init__.py", line 661, in run
    ret = self.job_func()
  File "/usr/local/lib/python3.9/site-packages/dnsrobocert/core/background.py", line 70, in _renew_job
    certbot.renew(config_path, directory_path, lock)
  File "/usr/local/lib/python3.9/site-packages/dnsrobocert/core/certbot.py", line 127, in renew
    utils.execute(
  File "/usr/local/lib/python3.9/site-packages/dnsrobocert/core/utils.py", line 60, in execute
    raise error
  File "/usr/local/lib/python3.9/site-packages/dnsrobocert/core/utils.py", line 50, in execute
    call(command, shell=shell, env=env)
  File "/usr/local/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/local/bin/python3', '-m', 'dnsrobocert.core.certbot', 'renew', '-n', '--user-agent-comment', 'DNSroboCert/3.20.1', '--preferred-chain', 'ISRG Root X1', '--config-dir', '/etc/letsencrypt', '--deploy-hook', '/usr/local/bin/python3 -m dnsrobocert.core.hooks -t deploy -c "/tmp/tmpfjpcs60w/dnsrobocert-runtime.yml"', '--work-dir', '/etc/letsencrypt/workdir', '--logs-dir', '/etc/letsencrypt/logs']' returned non-zero exit status 1.
After this, the dnsrobocert process hangs (it does not exit with an error), so Docker does not restart the container.

So we always have to restart the Docker container manually.
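
A likely reason the process stays up, judging only from the "Exception in thread Thread-1" line in the traceback: in Python, an uncaught exception in a background thread ends that thread but not the process. The snippet below is a generic illustration of that behaviour, not dnsrobocert's code; `renew_job` and `record_thread_failure` are stand-in names.

```python
import threading

failures = []


def record_thread_failure(args):
    # threading.excepthook (Python 3.8+) is called for uncaught exceptions in threads.
    failures.append(args.exc_type.__name__)


threading.excepthook = record_thread_failure


def renew_job():
    # Stand-in for a scheduled job whose subprocess exits non-zero.
    raise RuntimeError("renew command returned non-zero exit status 1")


worker = threading.Thread(target=renew_job, daemon=True)
worker.start()
worker.join()

# The worker thread is gone, but the main thread (and so the process) is still alive.
main_still_running = threading.main_thread().is_alive()
```

If this is indeed what happens, one option would be for such a hook to log the failure and terminate the whole process (for example with `os._exit(1)`), so that Docker's restart policy can take over. That is a design suggestion, not something the project is known to do.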

P.S. Could you add a Docker health check for errors like this?

JaneJeon commented 2 years ago

Running into the EXACT same problem again...

Vertganti commented 2 years ago

As stated here, we have been experiencing this since version 3.14.0, which fixed a different renewal issue.

As far as I can tell, the problem is that the initial certonly call specifies the auth, cleanup and deploy hooks with a config_path parameter that points into the newly created temporary directory. The first renewal attempt after a restart therefore always succeeds. All follow-up renew calls only specify the deploy hook using the config_path parameter. The certbot renew command does not support manual execution, so the manual cleanup and auth hooks cannot be specified using parameters and will always be taken from the renewal configuration when using that command. Since the renewal file is located in the LetsEncrypt directory, which is mounted from outside the container, it persists between container restarts. As @Grokon mentioned, this causes subsequent renewals to use the temporary directory path created by the very first certificate request, which no longer exists once the container has been restarted.

I see three possible solutions for the issue (note that I have not tested any of these):

Always delete the renewal configuration when the containers are stopped (could be done manually as a workaround too)
Always update the renewal configuration when a new temporary configuration directory is created (= on container start). I would actually have expected the certonly call after startup to do this, but it seems it does not?
Use the same certonly command for all renewal attempts as is used for the initial requests/renewal attempt after restart. This is the official way to renew when using the manual plugin.
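
As a rough illustration of the second option, the stale path in a renewal file could be rewritten to the current runtime config path at container start. This is an untested sketch, not how dnsrobocert works: `repoint_renewal_config` is a hypothetical helper, and it assumes the path appears in the renewal file as plain text ending in `dnsrobocert-runtime.yml`, as in the logs above. Whether Certbot tolerates renewal files edited this way has not been verified here, so back up the file first.

```python
import re
from pathlib import Path

# Assumption: any whitespace- and quote-free token ending in dnsrobocert-runtime.yml
# is a runtime config path written by an earlier container run.
RUNTIME_CONFIG_RE = re.compile(r'[^"\s]+dnsrobocert-runtime\.yml')


def repoint_renewal_config(conf_path, current_runtime_config):
    """Point every runtime config path in one renewal file at the current one.

    Returns True if the file was modified, False if it was already up to date.
    """
    conf = Path(conf_path)
    original = conf.read_text()
    # A function replacement keeps backslashes in the new path from being read as escapes.
    updated = RUNTIME_CONFIG_RE.sub(lambda _match: str(current_runtime_config), original)
    if updated != original:
        conf.write_text(updated)
        return True
    return False
```

For example, `repoint_renewal_config("/etc/letsencrypt/renewal/domain.com.conf", new_path)` would be called once per renewal file, with `new_path` being wherever the current run wrote its dnsrobocert-runtime.yml.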

centja1 commented 1 year ago

I made the change described by @Vertganti to make the same certonly call on renewal as it does when the docker container starts.

I had a few certs nearing renewal and have tested it successfully, but wouldn't mind a couple more confirmations prior to submitting a PR.

I pushed a docker image to justincentanni/dnsrobocert:certonly and have the code in https://github.com/centja1/dnsrobocert/tree/call-certonly-every-time

Vertganti commented 1 year ago

Thanks @centja1 for adapting the code and providing the image. I have set it up for testing with a certificate that will expire towards the end of this month and will report if it worked then.

Vertganti commented 1 year ago

The renewal worked! Since no one else is responding I guess you can submit the PR. Hopefully @adferrand will be back and able to merge/review it soon.

Codelica commented 1 year ago

This should really be considered. On stable/production systems the problem hits regularly without something like a cron job to restart the container every so often. I think a lot of homelab users just don't notice, as the container/machine is restarted more frequently.

adferrand / dnsrobocert

Auth hook fails because of missing configuration file? #730
