CausticLab / rgon-proxy

the base image of the rancher nginx-letsencrypt proxy

Spamming auth attempts gets you locked out of LetsEncrypt API #44

Open Ramblurr opened 7 years ago

Ramblurr commented 7 years ago

For some reason cert creation is failing, and the tool ends up in a loop where it spams authorization attempts and quickly gets locked out due to the rate limit:

"We recently (April 2017) introduced a Failed Validation limit of 5 failures per account, per hostname, per hour." (source: Let's Encrypt rate limits documentation)

I see this in the log file repeated hundreds of times:

INFO[0517] [acmetool want $(echo "sub1.mydomain.example" | tr , " ")]: "20170720143655 [CRITICAL] acmetool: fatal: reconcile: the following errors occurred:"
INFO[0517] [acmetool want $(echo "sub1.mydomain.example" | tr , " ")]: "error satisfying Target(sub2.mydomain.example;https://acme-v01.api.letsencrypt.org/directory;0): HTTP error: 429 Too Many Requests"
INFO[0517] [acmetool want $(echo "sub1.mydomain.example" | tr , " ")]: "map[Replay-Nonce:[UoktS6H4aoW-tHxtQIjLjfwC3rdTVUMW239LI6NASI8] Pragma:[no-cache] Date:[Thu, 20 Jul 2017 14:36:55 GMT] Content-Type:[application/problem+json] Content-Length:[144] Boulder-Requester:[18899316] Expires:[Thu, 20 Jul 2017 14:36:55 GMT] Cache-Control:[max-age=0, no-cache, no-store] Server:[nginx] Boulder-Request-Id:[rtPgqMbmtVa4FYHzmPywYtDjYl6K6vkg3tD2ruDTDIA]]"
INFO[0517] [acmetool want $(echo "sub1.mydomain.example" | tr , " ")]: "{"
INFO[0517] [acmetool want $(echo "sub1.mydomain.example" | tr , " ")]: "  \"type\": \"urn:acme:error:rateLimited\","
INFO[0517] [acmetool want $(echo "sub1.mydomain.example" | tr , " ")]: "  \"detail\": \"Error creating new authz :: Too many invalid authorizations recently.\","
INFO[0517] [acmetool want $(echo "sub1.mydomain.example" | tr , " ")]: "  \"status\": 429"
INFO[0517] [acmetool want $(echo "sub1.mydomain.example" | tr , " ")]: "}"

Why exactly it is failing, I'm not sure. What's interesting is that it seems to confuse sub1.mydomain.example and sub2.mydomain.example.
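The limit quoted above (5 failed validations per account, per hostname, per hour) suggests keeping a local count of failures and refusing to retry once it is reached. Below is a minimal sketch of such a guard, not something rgon-proxy ships. The function names, the state-file format, and the single-account simplification are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical guard: allow at most MAX_FAILURES failed validations per
# hostname within WINDOW seconds, mirroring the limit quoted above.
# Assumption: a single ACME account, so only the hostname is tracked.
MAX_FAILURES=5
WINDOW=3600

# record_failure STATE_FILE HOST [NOW]
# Append "<epoch-seconds> <host>" to the state file.
record_failure() {
    printf '%s %s\n' "${3:-$(date +%s)}" "$2" >> "$1"
}

# recent_failures STATE_FILE HOST [NOW]
# Print how many failures for HOST fall inside the window.
recent_failures() {
    [ -f "$1" ] || { echo 0; return 0; }
    awk -v host="$2" -v now="${3:-$(date +%s)}" -v window="$WINDOW" \
        '$2 == host && now - $1 < window { n++ } END { print n + 0 }' "$1"
}

# may_attempt STATE_FILE HOST [NOW]
# Succeed only while HOST is still under the failure limit.
may_attempt() {
    [ "$(recent_failures "$1" "$2" "$3")" -lt "$MAX_FAILURES" ]
}
```

A caller could run `may_attempt` before each `acmetool want` and `record_failure` whenever that command exits non-zero, so a broken configuration stops after five attempts instead of looping into the 429 response.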

Munsio commented 7 years ago

can you append the log entries before too?

emcniece commented 7 years ago

Two quick fixes: there is an option during stack creation for using the ACME staging server, and you can also change your email address (Gmail addresses support a + suffix, like myemail+whateveryouwant@gmail.com) to start generating certs again.

This doesn't solve any problems though.

emcniece commented 7 years ago

@Ramblurr seconding @Munsio's request - it would be useful to see the logs before the rate limits happen, when certificate requests are (likely) failing for other reasons.

Ramblurr commented 7 years ago

Unfortunately I don't have the logs directly preceding, as I removed the config dir and booted a fresh container, but here is an example from further up in the logs.

Notably, again, it is trying to authorize FOO but somehow finds BAR?

edit: is it possible to trigger a renew from the command line?

DEBU[0000] Parsing: FOO-FOO-1
DEBU[0000] Running check command '[ -d /etc/nginx/certs/$(echo "FOO.mydomain.example" | cut -d"," -f 1) ] && exit 1 || exit 0'
INFO[0000] Executing notify command 'acmetool want $(echo "FOO.mydomain.example" | tr , " ")'
INFO[0007] [acmetool want $(echo "FOO.mydomain.example" | tr , " ")]: "20170720142252 [ERROR] acme.storageops: could not obtain authorization for BAR.mydomain.example: failed all combinations"
INFO[0007] [acmetool want $(echo "FOO.mydomain.example" | tr , " ")]: "20170720142252 [ERROR] acme.storageops: Target(BAR.mydomain.example;https://acme-v01.api.letsencrypt.org/directory;0): failed to request certificate: failed all combinations"
INFO[0007] [acmetool want $(echo "FOO.mydomain.example" | tr , " ")]: "20170720142252 [ERROR] acme.storageops: error while processing targets: the following errors occurred:"
INFO[0007] [acmetool want $(echo "FOO.mydomain.example" | tr , " ")]: "error satisfying Target(BAR.mydomain.example;https://acme-v01.api.letsencrypt.org/directory;0): failed all combinations"
INFO[0007] [acmetool want $(echo "FOO.mydomain.example" | tr , " ")]: "20170720142252 [ERROR] acme.storageops: failed to reconcile: the following errors occurred:"
INFO[0007] [acmetool want $(echo "FOO.mydomain.example" | tr , " ")]: "error satisfying Target(BAR.mydomain.example;https://acme-v01.api.letsencrypt.org/directory;0): failed all combinations"
INFO[0007] [acmetool want $(echo "FOO.mydomain.example" | tr , " ")]: "20170720142252 [CRITICAL] acmetool: fatal: reconcile: the following errors occurred:"
INFO[0007] [acmetool want $(echo "FOO.mydomain.example" | tr , " ")]: "error satisfying Target(BAR.mydomain.example;https://acme-v01.api.letsencrypt.org/directory;0): failed all combinations"
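
For readers unfamiliar with the notify command in the log above: the `tr , " "` step turns a comma-separated hostname list into separate arguments for `acmetool want`. A small sketch of that expansion follows. The function names and the second hostname are made up for illustration, and the sketch does not by itself explain why BAR shows up in a run for FOO.

```shell
#!/bin/sh
# hosts_to_args LIST
# Turn a comma-separated hostname list into space-separated words,
# using the same tr call as the notify command in the log.
hosts_to_args() {
    echo "$1" | tr , " "
}

# want_command LIST
# Print (not run) the acmetool invocation the notify command expands to.
want_command() {
    echo "acmetool want $(hosts_to_args "$1")"
}

# Example (www.mydomain.example is a hypothetical second hostname):
#   want_command "FOO.mydomain.example,www.mydomain.example"
```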

Ramblurr commented 7 years ago

Here's some more logs. Right before this I restarted the rgon service:

INFO[1082] Exit requested by signal: terminated
/etc/nginx/certs/default/default.pass.key: No such file or directory
140621314571148:error:02001002:system library:fopen:No such file or directory:bss_file.c:402:fopen('/etc/nginx/certs/default/default.pass.key','w')
140621314571148:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:404:
Error opening Private Key /etc/nginx/certs/default/default.pass.key
140036572375948:error:02001002:system library:fopen:No such file or directory:bss_file.c:402:fopen('/etc/nginx/certs/default/default.pass.key','r')
140036572375948:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:404:
unable to load Private Key
rm: can't remove '/etc/nginx/certs/default/default.pass.key': No such file or directory
Error opening Private Key /etc/nginx/certs/default/default.key
140061420919692:error:02001002:system library:fopen:No such file or directory:bss_file.c:402:fopen('/etc/nginx/certs/default/default.key','r')
140061420919692:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:404:
unable to load Private Key
/etc/nginx/certs/default/default.csr: No such file or directory
 100.00% 0s  .00%
20170721085253 [WARN] acmetool: Don't know how to install a cron job on this system, please install the following job:
SHELL=/bin/sh
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
MAILTO=root
17 12 * * * root /usr/local/bin/acmetool --batch reconcile

------------------------- Quickstart Complete ----------------------
The quickstart process is complete.

Ensure your chosen challenge conveyance method is configured properly
before attempting to request certificates. You can find more
information about how to configure your system for each method in the
acmetool documentation:
https://github.com/hlandau/acme/blob/master/_doc/WSCONFIG.md

To request a certificate, run:

$ sudo acmetool want example.com www.example.com

If the certificate is successfully obtained, it will be placed in
/var/lib/acme/live/example.com/{cert,chain,fullchain,privkey}.

[ENTRYPOINT]: Running Rancher-Gen first-run
INFO[0000] Starting rancher-gen v0.6.0 (ee2ce5c)
INFO[0000] Initializing Rancher Metadata client (version 2015-12-19)
INFO[0000] Processing all templates once.
DEBU[0000] Checking for metadata change
DEBU[0000] Old version: init, New Version: "17778-4f3c5c96fb170da7fa8781d4ac55192c"
DEBU[0000] Fetching Metadata
DEBU[0000] Processing template /etc/rancher-gen/default/nginx.tmpl for destination /etc/nginx/conf.d/nginx.conf
DEBU[0000] Checking whether content has changed
DEBU[0000] Checksum content: 36420ec4669aacfd38b19cc1ef23e2c9, checksum file:
DEBU[0000] Creating staging file
DEBU[0000] Created staging file /etc/nginx/conf.d/.nginx.conf-221671822
DEBU[0000] Copying file permissions and owner from destination
DEBU[0000] Writing destination
INFO[0000] Destination file has been updated: /etc/nginx/conf.d/nginx.conf
DEBU[0000] Notifying label 'rgon-proxy' with value 'nginx'
DEBU[0000] Fetching Metadata
DEBU[0000] NOTIFY: rgon-proxy-nginx-1 :: [rgon-proxy:nginx]
DEBU[0000] Parsing: rgon-proxy-nginx-1
INFO[0000] Executing notify command 'rgon-exec -name=rgon-proxy-nginx-1 -cmd="service nginx reload"'
INFO[0000] [rgon-exec -name=rgon-proxy-nginx-1 -cmd="service nginx reload"]: "Executing [service nginx reload] on container [rgon-proxy-nginx-1]"
INFO[0000] [rgon-exec -name=rgon-proxy-nginx-1 -cmd="service nginx reload"]: "[....] Reloading nginx: nginx\x1b[?25l\x1b7\x1b[1G[\x1b[32m ok \x1b[39;49m\x1b8\x1b[?12l\x1b[?25h.\r"
INFO[0000] [rgon-exec -name=rgon-proxy-nginx-1 -cmd="service nginx reload"]: "websocket: close 1000 (normal)"
DEBU[0000] Notify cmd output: "Executing [service nginx reload] on container [rgon-proxy-nginx-1]\n[....] Reloading nginx: nginx\x1b[?25l\x1b7\x1b[1G[\x1b[32m ok \x1b[39;49m\x1b8\x1b[?12l\x1b[?25h.\r\nwebsocket: close 1000 (normal)\n"
INFO[0000] All templates processed. Exiting.
[ENTRYPOINT]: Rancher-Gen first-run complete
INFO[0000] Starting rancher-gen v0.6.0 (ee2ce5c)
INFO[0000] Initializing Rancher Metadata client (version 2015-12-19)
INFO[0000] Polling Metadata with %d second interval30
DEBU[0000] Checking for metadata change
DEBU[0000] Old version: init, New Version: "17778-4f3c5c96fb170da7fa8781d4ac55192c"
DEBU[0000] Fetching Metadata
DEBU[0000] No template - processing commands
DEBU[0000] Notifying label 'rgon.ssl'
DEBU[0000] Fetching Metadata
DEBU[0000] NOTIFY: KLAM-KLAM3-1 :: [rgon.ssl:true]
DEBU[0000] NOTIFY: FOO-test-FOO-frontend-1 :: [rgon.ssl:true]
DEBU[0000] NOTIFY: FOO-FOO-frontend-1 :: [rgon.ssl:true]
DEBU[0000] NOTIFY: BAR-mydomain.example-BAR-1 :: [rgon.ssl:true]
DEBU[0000] NOTIFY: DRY-DRY-1 :: [rgon.ssl:true]
DEBU[0000] Parsing: KLAM-KLAM3-1
DEBU[0000] Running check command '[ -d /etc/nginx/certs/$(echo "KLAM.mydomain.example" | cut -d"," -f 1) ] && exit 1 || exit 0'
INFO[0000] Check failed, skipping notify-cmd
DEBU[0000] Parsing: FOO-test-FOO-frontend-1
DEBU[0000] Running check command '[ -d /etc/nginx/certs/$(echo "FOO-test.mydomain.example" | cut -d"," -f 1) ] && exit 1 || exit 0'
INFO[0000] Check failed, skipping notify-cmd
DEBU[0000] Parsing: FOO-FOO-frontend-1
DEBU[0000] Running check command '[ -d /etc/nginx/certs/$(echo "FOO.mydomain.example" | cut -d"," -f 1) ] && exit 1 || exit 0'
INFO[0000] Check failed, skipping notify-cmd
DEBU[0000] Parsing: BAR-mydomain.example-BAR-1
DEBU[0000] Running check command '[ -d /etc/nginx/certs/$(echo "BAR.mydomain.example" | cut -d"," -f 1) ] && exit 1 || exit 0'
INFO[0000] Check failed, skipping notify-cmd
DEBU[0000] Parsing: DRY-DRY-1
DEBU[0000] Running check command '[ -d /etc/nginx/certs/$(echo "DRY.mydomain.example" | cut -d"," -f 1) ] && exit 1 || exit 0'
INFO[0000] Check failed, skipping notify-cmd
DEBU[0000] Processing template /etc/rancher-gen/default/nginx.tmpl for destination /etc/nginx/conf.d/nginx.conf
DEBU[0000] Checking whether content has changed
DEBU[0000] Checksum content: 36420ec4669aacfd38b19cc1ef23e2c9, checksum file: 36420ec4669aacfd38b19cc1ef23e2c9
DEBU[0000] Destination /etc/nginx/conf.d/nginx.conf is up to date
INFO[0000] All templates processed. Waiting for changes in Metadata...
DEBU[0030] Checking for metadata change
DEBU[0030] No changes in Metadata
DEBU[0060] Checking for metadata change
DEBU[0060] No changes in Metadata
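
As far as the log above shows, the check command exits 1 when a directory named after the first hostname already exists under /etc/nginx/certs, and rancher-gen then reports "Check failed, skipping notify-cmd". In other words, `acmetool want` only runs for hosts that have no certificate directory yet. The sketch below restates that logic as a function. The name `needs_cert` and the configurable certificate root are assumptions for illustration; the real check hard-codes the path.

```shell
#!/bin/sh
# needs_cert CERT_ROOT HOSTS
# Succeed when no directory exists yet for the first hostname in the
# comma-separated HOSTS list, i.e. when the notify command would run.
# This inverts the exit codes of the check command in the log, which
# exits 1 ("check failed") when the directory is present.
needs_cert() {
    [ ! -d "$1/$(echo "$2" | cut -d"," -f 1)" ]
}

# Example:
#   needs_cert /etc/nginx/certs "FOO.mydomain.example" && echo "would request"
```

If this reading is right, leftover certificate directories are enough to make every host skip certificate creation on startup, which matches the five "Check failed" lines in the log.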

Munsio commented 7 years ago

Sorry if this sounds silly, but did you obfuscate the logs by changing the real domains to mydomain.example?

Also, before you changed to the dev branch, did you remove the config folder, except for your customized one?

Next question: were there already functional Let's Encrypt certificates for the domains you tried to create certificates for after you switched to the dev branch?

It would also be helpful if you sent us the generated nginx.conf. We have a Discord server where you can send us logs/configs in private: https://discord.gg/EeBjSr5

Currently we also need to be able to expose port 402 for the acmetool webserver to verify the domains.

-- something you could try --

Turn off SSL generation on the containers by setting rgon.ssl to off, and restart the rgon service before trying the steps below.

Exec into the container and run "acmetool cull --simulate". If there is any output, post it here. If you are brave enough, you can also run it without --simulate to remove old/unused certificates.

Exec into the container and run "acmetool revoke cert-path". I haven't tried this myself, so I don't know what you need as the path, but this revokes the "old" valid certificate and may let you generate a new one.

Turn the SSL labels on again and check whether acmetool is able to generate or regenerate the certificates.
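
The acmetool part of the steps above can be sketched as a small dry-run script. This is only an illustration: the `run` wrapper, the `cleanup_steps` name, and the DRY_RUN variable are assumptions, and the acmetool commands are the ones quoted in this comment. The label change and the rgon restart happen in Rancher and are not scripted here.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

# run CMD [ARGS...]
# Echo the command, then execute it unless this is a dry run.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# cleanup_steps
# Meant to be run inside the container, after rgon.ssl has been set to
# off and the rgon service has been restarted.
cleanup_steps() {
    # Non-destructive: only reports what cull would remove.
    run acmetool cull --simulate
    # Destructive steps are left commented out on purpose. The revoke
    # argument is unknown here; the comment above only says "cert-path".
    # run acmetool cull
    # run acmetool revoke cert-path
}
```

Calling `cleanup_steps` with the default DRY_RUN=1 prints the cull command without touching any certificates, which makes it easy to review before running it for real.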

Ramblurr commented 7 years ago

Sorry if this sounds silly, but did you obfuscate the logs by changing the real domains to mydomain.example?

Yes I did :) on my actual system they are all actual, functioning domains.

Also, before you changed to the dev branch, did you remove the config folder, except for your customized one? Next question: were there already functional Let's Encrypt certificates for the domains you tried to create certificates for after you switched to the dev branch?

I deleted the configs, but left the certs.

Currently we also need to be able to expose port 402 for the acmetool webserver to verify the domains

Ah, this might be the problem. This port needs to be exposed to the public internet? My rancher server is behind a NAT, and only 80 and 443 are tunneled through.

Munsio commented 7 years ago

About the NAT: that shouldn't be a problem. By "exposed port 402" I only meant that the port must not conflict with other services inside the Rancher environment. The nginx config works as a proxy for the Let's Encrypt authorization, so 80 and 443 are fine.

Munsio commented 7 years ago

@Ramblurr - hey, any news on this topic?

Ramblurr commented 7 years ago

I did a completely clean reinstall, waited until the rate limit ban was over, and it seems to be working now. But it just fetched new keys.

It hasn't attempted to renew yet though, which was what the problem was originally. Is there a way to force a renew to test if it works?

Munsio commented 7 years ago

@Ramblurr please check whether your nginx.tmpl is the same as the one from the dev branch. We added an additional well-known directive under the ssl-server part.