caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0

Caddy should not repeatedly ask for a certificate when the challenge will fail #4682

Closed mildred closed 2 years ago

mildred commented 2 years ago

Before requesting a certificate for a domain, Caddy should check that the DNS name is configured and that the IP address configured in DNS points to Caddy itself. This should probably be toggleable in the configuration, in case the DNS settings or reverse proxies prevent this kind of check from succeeding.

This is needed to avoid requesting too many certificates and triggering rate limits.

Alternatively, Caddy should correctly handle failures to issue a certificate caused by domain name configuration issues, and should blacklist the domain for a given time to avoid triggering rate limits.

Context:

I'm getting "context canceled" errors while Caddy is solving challenges. This causes the challenge to be aborted, and later on the ACME server responds with "too many failed authorizations recently: see https://letsencrypt.org/docs/rate-limits/". After investigation, this is caused by Caddy trying to obtain a certificate for a domain that is not associated with Caddy.

Use case: the Caddy server is configured dynamically with many domains that are not controlled by the system, and Caddy should respond to them when a client connects using the given hostname.

Caddy version: v2.4.6 h1:HGkGICFGvyrodcqOOclHKfvJC0qTU7vny/7FhYp9hNw=

Relevant logs
Apr 06 08:52:57 os-gravelines-1.webmsg.me caddy[786462]: {"level":"info","ts":1649235177.843804,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"dav.test.webmsg.me","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}
Apr 06 08:53:35 os-gravelines-1.webmsg.me caddy[786462]: {"level":"warn","ts":1649235215.0471594,"logger":"tls.issuance.acme.acme_client","msg":"HTTP request failed; retrying","url":"https://acme.zerossl.com/v2/DV90/authz/rlDc2xEpCBwppETqYTif6Q","error":"performing request: Post \"https://acme.zerossl.com/v2/DV90/authz/rlDc2xEpCBwppETqYTif6Q\": context canceled"}
Apr 06 08:53:35 os-gravelines-1.webmsg.me caddy[786462]: {"level":"error","ts":1649235215.0474002,"logger":"tls.issuance.acme.acme_client","msg":"deactivating authorization","identifier":"dav.test.webmsg.me","authz":"https://acme.zerossl.com/v2/DV90/authz/rlDc2xEpCBwppETqYTif6Q","error":"attempt 1: https://acme.zerossl.com/v2/DV90/authz/rlDc2xEpCBwppETqYTif6Q: context canceled"}
Apr 06 08:53:35 os-gravelines-1.webmsg.me caddy[786462]: {"level":"error","ts":1649235215.0475502,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"dav.test.webmsg.me","issuer":"acme.zerossl.com-v2-DV90","error":"[dav.test.webmsg.me] solving challenges: [dav.test.webmsg.me] context canceled (order=https://acme.zerossl.com/v2/DV90/order/uPVy9eGBGg1qwQM69_MTNA) (ca=https://acme.zerossl.com/v2/DV90)"}
Apr 06 08:53:35 os-gravelines-1.webmsg.me caddy[786462]: {"level":"error","ts":1649235215.047961,"logger":"tls","msg":"job failed","error":"dav.test.webmsg.me: obtaining certificate: [dav.test.webmsg.me] Obtain: [dav.test.webmsg.me] solving challenges: [dav.test.webmsg.me] context canceled (order=https://acme.zerossl.com/v2/DV90/order/uPVy9eGBGg1qwQM69_MTNA) (ca=https://acme.zerossl.com/v2/DV90)"}
The domain dav.test.webmsg.me did not exist at the time of this trace.
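To see how often a given name is failing, log lines like the ones above can be tallied per identifier. The following is a minimal sketch that assumes the journal format shown here (a text prefix followed by one JSON object per line); `entry`, `parseLine`, and `countErrors` are hypothetical names, and only the `level`, `msg`, and `identifier` fields are read.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// entry holds the few fields of a JSON log line that are used here.
type entry struct {
	Level      string `json:"level"`
	Msg        string `json:"msg"`
	Identifier string `json:"identifier"`
}

// parseLine skips the journal prefix (everything before the first '{')
// and decodes the JSON payload that follows it.
func parseLine(line string) (entry, error) {
	var e entry
	i := strings.IndexByte(line, '{')
	if i < 0 {
		return e, fmt.Errorf("no JSON payload in line")
	}
	err := json.Unmarshal([]byte(line[i:]), &e)
	return e, err
}

// countErrors tallies error-level entries per identifier, ignoring
// lines that cannot be parsed or that carry no identifier.
func countErrors(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, l := range lines {
		e, err := parseLine(l)
		if err != nil || e.Level != "error" || e.Identifier == "" {
			continue
		}
		counts[e.Identifier]++
	}
	return counts
}

func main() {
	// Two of the log lines from the trace above, copied verbatim.
	lines := []string{
		`Apr 06 08:52:57 os-gravelines-1.webmsg.me caddy[786462]: {"level":"info","ts":1649235177.843804,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"dav.test.webmsg.me","challenge_type":"http-01","ca":"https://acme.zerossl.com/v2/DV90"}`,
		`Apr 06 08:53:35 os-gravelines-1.webmsg.me caddy[786462]: {"level":"error","ts":1649235215.0474002,"logger":"tls.issuance.acme.acme_client","msg":"deactivating authorization","identifier":"dav.test.webmsg.me","authz":"https://acme.zerossl.com/v2/DV90/authz/rlDc2xEpCBwppETqYTif6Q","error":"attempt 1: https://acme.zerossl.com/v2/DV90/authz/rlDc2xEpCBwppETqYTif6Q: context canceled"}`,
	}
	fmt.Println(countErrors(lines))
}
```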

Implementation idea:

Before requesting a certificate for a domain example.org, Caddy should:

jotoho commented 2 years ago

Just another Caddy user here, but if I remember correctly, it is possible to set more than one A/AAAA record per hostname in DNS, in which case the client decides which server to connect to.

I think that in a setup with multiple IPs serving a DNS hostname, your implementation idea might need adjusting, because simply connecting to http://example.org might (sometimes) mean connecting to another (Caddy) server, even if the requesting Caddy instance is also listed in the A/AAAA records.

francislavoie commented 2 years ago

That wouldn't work. Caddy isn't always able to connect to itself via public DNS. Many routers in front of Caddy (e.g. consumer routers) don't support hairpin-NAT (where the router detects that the packet destination is its own WAN IP), so packets would just get dropped.

I'm not sure I understand your use case here. How are you configuring Caddy, exactly? Caddy is already very careful about rate limit avoidance: https://caddyserver.com/docs/automatic-https#errors

mholt commented 2 years ago

"Caddy should check that the DNS name is configured and that the IP configured on the DNS points to Caddy itself."

Yeah, this is something that has been discussed and rehashed a lot before. The answer basically is no. Pre-validation checks are generally considered against best practices: https://github.com/https-dev/docs/blob/master/acme-ops.md#do-not-rely-on-pre-validation-checks

Do not rely on pre-validation checks.

Pre-validation checks are automated checks that attempt to determine if an ACME challenge will succeed or fail before proceeding with the actual challenge.

In practice, these do not work very well for public domains.

It can be difficult to know whether an ACME challenge will succeed. Successful validations require one or more external lookups/connections on infrastructure that depends on the machine's perspective. For example, DNS lookups from the ACME client often result in different records than what the ACME server will see. External connections are difficult or impossible to reliably test internally.

It is not advisable to use a public CA's staging endpoint as a "pre-check" for all certificate issuances, as this would add considerable load at a global scale. However, it is not a bad idea to stand up one's own external ACME server for pre-checks, with the understanding that its result may not match the real ACME server's validation.

So far, pre-validation checks are often inaccurate and seldom worth the effort; however, it is possible that better techniques may arise in the future.

In the meantime, the best way to know whether a challenge will succeed is simply to try it. Before doing so, where possible, a human administrator should ensure that DNS and firewalls are properly configured. (Production ACME endpoints are not to be used for debugging or troubleshooting.)

The whole point of the ACME protocol is to check whether everything checks out, and the CA's vantage point is all that matters. We know from experience that pre-validation checks are harmful/useless and only get in the way.

Ultimately, it's impossible to know whether the ACME challenge will succeed until we actually do the challenge.

As for your alternate suggestion; as Francis said, Caddy already has world-class error handling of ACME challenge failures.

Heraes-git commented 8 months ago

@mholt Correct me if I'm wrong :

  1. Caddy asserts that it can automatically deliver HTTPS certificates without us going through all the processes.
  2. Obtaining an HTTPS certificate consists of asking a "guy" (a certified public domain owned by someone/a company) if he agrees to vouch for us as "a good person" (a good... domain?).
  3. Not all certified domains of the public certified TLDs are usable with Caddy. Just some are associated with it. (If that weren't the case, we wouldn't get errors while trying to start Caddy with automatic HTTPS for our local domains!)
  4. We don't know which ones are.
  5. Caddy will fail if we don't choose a TLD associated with Caddy (i.e., one that accepts Caddy's certificate requests).
  6. But you refuse to implement a user-friendly algorithm to help our attempts succeed, by warning us when we choose a "wrong" TLD name.
  7. And you base that on "the whole point of the process is to check if you're right or wrong", and stir your spoon in your Star Wars mug while laughing.
  8. But the REAL whole point of the certification is to ask a "guy" who doesn't even know us to accept us (unless we take into account the 3rd point, implying that Caddy has been previously associated/verified upstream).
  9. Finally, we end up searching for hours and hours for the TLD names we can use without getting a fucking error message, and absolutely nothing is automatic.

To me, this is absurd logic. You try to defend a pseudo-certification while you're trying to bypass it. Just stick to the magick trick of asking a "valid" trusted intermediary to deliver the goddamn certificate, and help us figure out the ones that work, instead of bragging about "best practices". Here, the word "valid" means "the few guys that accept Caddy".

NB: by "magick trick", I mean that asking someone else to be our CA's vantage point, instead of being declared as it ourselves, is a kind of trick. The correct way for Caddy would be to ask the public key structure to be registered as an entity.

"We know from experience that pre-validation checks are harmful/useless and only get in the way."

No, we can absolutely say "go here to obtain a valid certificate", and that's no LESS secure than saying "go anywhere in the world and find a building by yourself where you can obtain your certificate". Moreover, we save time.

Thanks. -_-

Heraes-git commented 8 months ago

@francislavoie "Many routers in front of Caddy (e.g. consumer routers) don't support hairpin-NAT"

True. But nothing stops you from implementing a solution in Caddy itself to detect the aforementioned WAN IP, for instance by allowing us to declare it in a config file. If you want to, you can. That's what we do when we want to develop user-friendly algorithms so that users don't have to debug our software for us.

mholt commented 8 months ago

We used to do pre-validation checks back in the "lego" days, and it broke many deployments.

I'm not really sure what your actual underlying issue is, but if you want to be non-confrontational about it, you're welcome to open a topic on our forum and fill out the help template.