NginxProxyManager / nginx-proxy-manager

Docker container for managing Nginx proxy hosts with a simple, powerful interface
https://nginxproxymanager.com
MIT License

Temp workaround (that works for me!) for SSL certificate renewal bug #2881

Open EDIflyer opened 1 year ago

EDIflyer commented 1 year ago

I know there are already lots of issues on this topic - I've tried to link to most of them below. I've just had to renew 16 sites on one server (running the latest v.2.10.2) and thought I'd go through the process that seemed to work reliably for me in case it helps others (with thanks to posters in other issues where I've gleaned this info from!).

Given how it works I suspect the issue is that requests to the ACME endpoint are not allowed through when Force SSL is enabled (as mentioned in some bug reports), and I'm hopeful @jc21 can merge in #2038, which seems to be an option (but is unfortunately now based off an older base).

Symptom

SSL certificates do not automatically renew and you receive a warning email from Let's Encrypt about an upcoming expiring certificate (typically I seem to get them when <20 days are left). Attempts to renew manually just end up showing an 'Internal server error'.

Workaround

Part 1 - clear any .certbot.lock files

I've found there is sometimes an error caused by a duplicate instance of Certbot running. You can check whether there are .certbot.lock files on your system:

find / -type f -name ".certbot.lock"

If there are, you can remove them:

find / -type f -name ".certbot.lock" -exec rm {} \;

(from https://community.letsencrypt.org/t/solved-another-instance-of-certbot-is-already-running/44690/2)
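A narrower variant of the two commands above, as a rough sketch: rather than scanning from /, it only looks under directories you pass in. The function names are made up for illustration, and /etc/letsencrypt and /tmp are assumptions based on the paths that appear in the certbot commands logged later in this thread. Only remove lock files when you are sure no certbot process is genuinely still running.

```shell
# Sketch only: helper names and example paths are illustrative assumptions.

# Print every .certbot.lock file found under the given directories.
list_certbot_locks() {
    find "$@" -type f -name ".certbot.lock" 2>/dev/null
}

# Delete every .certbot.lock file found under the given directories.
clear_certbot_locks() {
    find "$@" -type f -name ".certbot.lock" -exec rm -f {} + 2>/dev/null
}

# Example, run from a shell inside the NPM container:
#   list_certbot_locks /etc/letsencrypt /tmp
#   clear_certbot_locks /etc/letsencrypt /tmp
```

If NPM runs in Docker, these need to be run inside the container (for example via docker exec with your own container name), since the lock files live in the container's filesystem rather than on the host.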

Part 2 - turn off Force SSL and then renew

After clearing any certbot locks, I then went through site by site and:

1) disabled Force SSL on the proxy host page,
2) requested certificate renewal on the SSL page, and
3) re-enabled Force SSL and all sub-options back on the proxy host page.

As I say it takes a while and is frustrating but I found it worked reliably and they're all now renewed for the next 3 months. If you don't switch off Force SSL then you just end up with an internal error.
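To see whether a given host redirects plain-HTTP requests for the ACME challenge path, something like the following could be used. It is a minimal sketch: the function names and the "probe" token are invented for illustration, and it only reports whether a redirect happens. That the redirect is what breaks renewal is the suspicion raised in this thread, not something the script proves.

```shell
# Sketch only: function names and the "probe" token are illustrative assumptions.

# Succeed if the given HTTP status code is a redirect.
is_redirect() {
    case "$1" in
        301|302|303|307|308) return 0 ;;
        *) return 1 ;;
    esac
}

# Print only the status code of a plain-HTTP request to the challenge path.
# curl prints 000 when no HTTP response was received at all.
acme_probe_status() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code}' \
        "http://$1/.well-known/acme-challenge/probe"
}

# Report whether the challenge path on the given domain is redirected.
check_acme_redirect() {
    status=$(acme_probe_status "$1")
    if is_redirect "$status"; then
        echo "$1: HTTP $status, challenge path is being redirected"
    else
        echo "$1: HTTP $status, challenge path is not redirected"
    fi
}

# Example:
#   check_acme_redirect example.com
```

Running it once with Force SSL on and once with it off should show whether the setting changes how the challenge path is handled. A status of 000 means no HTTP response came back, which points to port 80 not being reachable.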

Related issues on this topic (in the hope that once this issue is resolved these can all be closed)

#1771 #1816 #1856 #2048 #2251 #2258 #2267 #210 #2418 #2499 #2593 #2642 #2713 #2860

ririko5834 commented 1 year ago

It really works, thanks. Certs used to renew normally; I found that it stopped working when I updated to the latest version. When will it be fixed?

EDIflyer commented 1 year ago

Agree, @ririko5834 - it used to work fine for me but seemed to stop a few versions ago. Hopefully the PR can be merged into the latest codebase...

pd5rm commented 1 year ago

Just did this, thanks for the workaround writeup.

jhalak1984 commented 1 year ago

Hmm, for me, no such files were found, hence, didn't work for me

EDIflyer commented 1 year ago

@jc21 any word on when a fix might be coming for this SSL cert renewal issue? That's me having to manually renew another 10 sites this evening 😔

JohnnyLAmpAz commented 1 year ago

Hmm, for me, no such files were found, hence, didn't work for me

I didn't find any lock files either, but the trick of disabling Force SSL is the important part! Try it anyway.

EDIflyer commented 1 year ago

Hmm, for me, no such files were found, hence, didn't work for me

@jhalak1984 did you try part 2? The main issue seems to be Force SSL not allowing an ACME exclusion; the first part is just to ensure there are no conflicting certbot instances running.

EDIflyer commented 1 year ago

Thanks to the work from @the1ts in #2038 and the comments from @Whoopsadaisy re regex on that PR https://github.com/NginxProxyManager/nginx-proxy-manager/pull/2038#issuecomment-1372833078 I've created a new PR #3121 that combines their comments to stop /.well-known/acme-challenge requests from being redirected to https.

The new PR has been built (you can access it in a docker compose file by commenting out your current image and using image: 'jc21/nginx-proxy-manager:github-pr-3121' instead). The only change I made was to the one force-ssl.conf file, but it is based off the current develop branch (2.10.4 as of today) so will include any other changes on there.

I've tried it on two servers that I run. On the first I was now able to renew OK just by clicking 'renew now' on the SSL page (something that previously errored out). On the other one I initially still got the internal error, but when I ran the first bit of the code in my OP above I found 3 certbot instances running, and once I cleared them it seemed to renew OK. Out of interest I've only renewed one certificate on that server to see if the rest renew OK automatically. In both cases everything still seems to redirect to https OK and the regex seems to check out OK (https://regex101.com/r/H58N25/1).

If you're happy to do so then please test it out - it is showing as OK to merge so if this merges hopefully it'll be accepted by @jc21 😃

PS - I checked back 10-15 min later and it seems that all the other certs have autorenewed too so that saved me quite a bit of work switching force SSL off/on for each one!

jhalak1984 commented 1 year ago

Awesome!! Works without a hitch now. Thank you!!!

Panoramiac commented 1 year ago

So what do I need to do to get this working with the Nginx Proxy Manager add-on running on Home Assistant? Yesterday I got the emails saying that my certs will expire soon. I do not know what went wrong, but I cannot renew them, and currently the certs are also no longer accepted by the Android app (I guess I messed something up by adding my subdomains to the DuckDNS add-on).

EDIflyer commented 1 year ago

@Panoramiac sorry I'm running it on VPS with Docker/Portainer so can easily specify a different image to use - I'm not 100% sure re Home Assistant how to do that (I do run HA but only on my NAS on my home network) - might be worth asking in the HA forums if someone knows how to specify a different image to use?

mtojay commented 1 year ago

Hmm, for me it does not work. I received a mail about expiring certs, went and tried to renew, and it didn't work. I looked it up on the internet and found this workaround, but it does not work for me. There are no .certbot.lock files in my docker container, and disabling Force SSL also does not do anything. I tried rebooting the container and the VPS, and all the steps here. It's still always "Internal Error". Not quite sure where to go from here.

EDIflyer commented 1 year ago

@mtojay did you try the different version in the PR I submitted? I've been running it since I created it and all auto-renewals going through fine for me.

mtojay commented 1 year ago

Thanks for your answer @EDIflyer, but yeah, I tried that. I pulled the docker image with your PR, but I still get "Internal Error". After recreating the container with the new image I tried looking for locked Certbots again, but I don't have any locked certbot instances. I don't know what I'm doing wrong, if anything, but I can't get new certificates no matter how often I try what has been suggested in this thread.

I probably have an unrelated issue. If my certs expire in the coming days I will try again.

EDIflyer commented 1 year ago

Ah OK, sorry to hear that @mtojay. At least you've ruled out locked Certbots. Is there anything more in the NPM logs that you can see when you try to renew and get the error? The issue at https://github.com/NginxProxyManager/nginx-proxy-manager/issues/1816 goes through some of what others found. Sorry I can't help more!

fhazal commented 1 year ago

This workaround is not working for me either. I changed the image to image: 'jc21/nginx-proxy-manager:github-pr-3121' and still can't get it to work. I followed the instructions, then deleted and reinstalled NPM, and still can't create or renew an SSL cert. Please help.

EDIflyer commented 1 year ago

Hmm weird, I'm still using that one and it works OK. Does it pull the image down OK?

fhazal commented 1 year ago

Hmm weird, I'm still using that one and it works OK. Does it pull the image down OK?

yup it did pull the image without any error.

EDIflyer commented 1 year ago

OK - I presume port 80 is open on your firewall to allow the certbot requests to get through to the acme endpoint? I'm afraid I don't have many other ideas!

broetchenrackete36 commented 1 year ago

PR #3121 worked great for me. Finally my certs are renewed automagically again, thx :)

peterge1998 commented 1 year ago

I guess I experience the same problem, the certs aren't renewing in my instance too. docker logs gives this:

[11/10/2023] [7:55:11 AM] [SSL      ] › ✖  error     Error: Command failed: certbot renew --non-interactive --quiet --config "/etc/letsencrypt.ini" --work-dir "/tmp/letsencrypt-lib" --logs-dir "/tmp/letsencrypt-log" --preferred-challenges "dns,http" --disable-hook-validation  
Another instance of Certbot is already running.

    at ChildProcess.exithandler (node:child_process:402:12)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)

I guess a fix will be released in near future?

EDIflyer commented 1 year ago

@peterge1998 it's been like this for months so sadly I'm not sure a fix is imminent. Did you try the code posted above re duplicate certbot instances or running the PR version I created? Worth a shot but no guarantees it'll help I'm afraid!

peterge1998 commented 1 year ago

@peterge1998 it's been like this for months so sadly I'm not sure a fix is imminent. Did you try the code posted above re duplicate certbot instances or running the PR version I created? Worth a shot but no guarantees it'll help I'm afraid!

How can I run the PR version with docker?

EDIflyer commented 1 year ago

How can I run the PR version with docker?

If you replace the normal image with image: 'jc21/nginx-proxy-manager:github-pr-3121' in your docker compose file (or tweak appropriately for a docker run command) then that should do the trick.
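For anyone scripting this, the image reference could be built with a small helper like the one below. It is only a sketch: pr_image is a made-up name, and only the tag for PR 3121 is confirmed in this thread. Using the same tag pattern for any other PR number is an assumption.

```shell
# Sketch only: pr_image is a hypothetical helper. The github-pr-<number> tag
# pattern is confirmed in this thread for PR 3121 only.
pr_image() {
    printf 'jc21/nginx-proxy-manager:github-pr-%s\n' "$1"
}

# Usage sketch (not executed here):
#   docker pull "$(pr_image 3121)"
# With docker compose, set the printed value as the service's image and
# recreate the container:
#   docker compose up -d
# With docker run, swap only the image name in your existing command and keep
# your own port and volume options unchanged.
```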

peterge1998 commented 1 year ago

How can I run the PR version with docker?

If you replace the normal image with image: 'jc21/nginx-proxy-manager:github-pr-3121' in your docker compose (or tweak appropriately for docker run) command then that should do the trick.

I get this error with the image of your pr now when renewing certs:

[11/10/2023] [7:21:42 PM] [SSL      ] › ℹ  info      Command: certbot renew --force-renewal --config "/etc/letsencrypt.ini" --work-dir "/tmp/letsencrypt-lib" --logs-dir "/tmp/letsencrypt-log" --cert-name "npm-14" --preferred-challenges "dns,http" --no-random-sleep-on-renew --disable-hook-validation 
[11/10/2023] [7:21:42 PM] [Express  ] › ⚠  warning   Command failed: certbot renew --force-renewal --config "/etc/letsencrypt.ini" --work-dir "/tmp/letsencrypt-lib" --logs-dir "/tmp/letsencrypt-log" --cert-name "npm-14" --preferred-challenges "dns,http" --no-random-sleep-on-renew --disable-hook-validation 
Another instance of Certbot is already running.
Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /tmp/certbot-log-9kceqaqh/log or re-run Certbot with -v for more details.

Okay, it's still the same:

[11/10/2023] [7:23:16 PM] [SSL      ] › ✖  error     Error: Command failed: certbot renew --non-interactive --quiet --config "/etc/letsencrypt.ini" --work-dir "/tmp/letsencrypt-lib" --logs-dir "/tmp/letsencrypt-log" --preferred-challenges "dns,http" --disable-hook-validation  
Failed to renew certificate npm-8 with error: Some challenges have failed.
The following renewals failed:
  /etc/letsencrypt/live/npm-8/fullchain.pem (failure)
1 renew failure(s), 0 parse failure(s)

    at ChildProcess.exithandler (node:child_process:402:12)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)

peterge1998 commented 1 year ago

Deleting the host and adding it again works...

EDIflyer commented 1 year ago

Damn, sorry to hear that. The weird thing is it's still working OK on mine with that PR version. I assume port 80 is open to allow challenge requests through? Did you run the command in the container to clear the other certbot instances too? I'm afraid I'm out of ideas after that!

EDIT: just seen your followup post, that sounds slightly more encouraging and at least confirms the challenge requests can get through OK!

peterge1998 commented 11 months ago

Deleting the host and adding it again works...

I am facing this problem again. This time I would like to add an SSL certificate using the DNS provider. I am not able to follow my own suggestion for fixing this because there is no host this time :0

Please fix this asap!

sunsreddit commented 10 months ago

+1'ing.

Made the manual change this PR makes in my own setup and can confirm it fixes my issue.

Thank you, @EDIflyer

EcksDy commented 5 months ago

For me it was a combination of the following that caused the error:

Once I've whitelisted my current IP, the renewal worked even with all of my hosts having "Force SSL" enabled.

picode7 commented 2 months ago

Turning off "Force SSL" didn't work at first.

I found out that I had reached the validation limit:

All issuance requests are subject to a Failed Validation limit of 5 failures per account, per hostname, per hour (using a sliding window).

After waiting for an hour, it worked.

Btw. I couldn't find/delete any ".certbot.lock" files in the container.
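To get a rough idea of how many failed attempts have piled up, the failure lines in a saved copy of the NPM log could be counted. This is a minimal sketch: count_failed_renewals is a made-up name, the matched text is taken from the log output posted earlier in this thread, and the count covers the whole file rather than just the last hour. It also counts per certificate name (such as npm-8), whereas the limit itself applies per hostname.

```shell
# Sketch only: count_failed_renewals is a hypothetical helper.
# $1 = path to a saved log file, $2 = certificate name such as npm-8
count_failed_renewals() {
    # -F matches the text literally; "|| true" keeps a count of 0 from
    # being treated as a command failure.
    grep -c -F -- "Failed to renew certificate $2 with error" "$1" || true
}

# Usage sketch (container name is a placeholder):
#   docker logs <container> > npm.log 2>&1
#   count_failed_renewals npm.log npm-8
```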

HAEdwin commented 3 weeks ago

What worked for me was to temporarily add a forward rule on the router to allow port 80 traffic through to the website, because apparently Let's Encrypt was having trouble reaching the website. After the renewal I turned off the port 80 forward rule again.