golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: x/crypto/acme/autocert: Manager should support DNS-01 verification #23198

Open winteraz opened 6 years ago

winteraz commented 6 years ago

What did you do?

I tried to set up autocert behind a firewall.

What did you expect to see?

HTTPS working flawlessly (using the Let's Encrypt infrastructure).

What did you see instead?

Verification failed due to the firewall.

I believe dns-01 should be built into Manager. It could have a function field (e.g. SetTXT) which, if set, is used by the Manager to set the TXT records required for the DNS verification.

bradfitz commented 6 years ago

Who terminates your TLS? What's your setup look like?

We're really trying to keep the autocert package simple and move any complexity into the acme package. Generally our answer for people with non-standard setups is to have them use the acme package directly, rather than try to make the autocert package be all things to all people.

If we did support dns-01, I'd rather it be automatic. What does your firewall allow in & out?

bradfitz commented 6 years ago

/cc @x1ddos

winteraz commented 6 years ago

A function in Manager (e.g. type DNSUpdater func(k, v) error, which updates DNS TXT records with key/name k and value v) should suffice and allow the setup to be fully automatic.

Depending on the DNS provider, the client/developer will provide the appropriate DNSUpdater function. The changes would be minimal and I believe all of them should be in the verify method. The dns-01 verification seems to be already developed in the acme package. https://github.com/golang/crypto/blob/master/acme/autocert/autocert.go#L491 I don't think it will make it significantly more complex than it already is.
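
To make the proposed shape concrete, here is a minimal sketch of such a callback. It assumes string arguments for k and v, which the proposal does not spell out; memoryZone is a hypothetical in-memory stand-in for a real DNS provider API, and none of these names exist in autocert.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// DNSUpdater follows the shape proposed above: it publishes a TXT record
// with name k and value v. It is an illustration, not an autocert type.
type DNSUpdater func(k, v string) error

// memoryZone is a hypothetical in-memory stand-in for a DNS provider API.
type memoryZone struct {
	mu  sync.Mutex
	txt map[string][]string
}

func newMemoryZone() *memoryZone {
	return &memoryZone{txt: make(map[string][]string)}
}

// SetTXT has the DNSUpdater signature, so the method value can be used as one.
func (z *memoryZone) SetTXT(k, v string) error {
	if k == "" || v == "" {
		return errors.New("empty TXT name or value")
	}
	z.mu.Lock()
	defer z.mu.Unlock()
	z.txt[k] = append(z.txt[k], v)
	return nil
}

// Lookup returns a copy of the TXT values stored under k.
func (z *memoryZone) Lookup(k string) []string {
	z.mu.Lock()
	defer z.mu.Unlock()
	return append([]string(nil), z.txt[k]...)
}

func main() {
	zone := newMemoryZone()
	var update DNSUpdater = zone.SetTXT
	// The record value here is a placeholder, not a real challenge digest.
	if err := update("_acme-challenge.example.org", "placeholder-digest"); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Println(zone.Lookup("_acme-challenge.example.org"))
}
```

A provider-specific implementation (Route 53, Cloudflare, and so on) would replace memoryZone while keeping the same function signature.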

bradfitz commented 6 years ago

@winteraz, I'm not asking about solutions. We try to describe & understand the problem before jumping to solutions. Could you answer my questions above?

winteraz commented 6 years ago

My server has all incoming traffic restricted to a limited set of IPs, so Let's Encrypt can't reach it for the tls-sni challenge verification. Outgoing traffic is unrestricted. I used to allow incoming traffic temporarily (for as long as it took to set up the SSL), but this requires manual intervention and defeats the purpose of ACME.

winteraz commented 6 years ago

This may not be the most popular setup, but firewalls are not that uncommon. The main issue is that Let's Encrypt doesn't advertise their IP addresses and they actually forbid whitelisting practices. A simple Google search for "Letsencrypt firewall" may show that the issue is quite common, and their response is to use dns-01.

winteraz commented 6 years ago

It's also worth noting that the upcoming wildcard certificates will be available only via dns-01:

We intend to support wildcard certificates in January 2018 as part of the ACMEv2 endpoint. Wildcard issuance will require base domain validation using DNS-01 challenges.

https://letsencrypt.org/docs/faq/
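
For context, a dns-01 challenge is answered by publishing a TXT record under `_acme-challenge.<domain>` whose value is the base64url-encoded SHA-256 digest of the key authorization (challenge token, a dot, and the account key thumbprint), as described in RFC 8555. A minimal sketch follows; the function names are made up for illustration, and the token and thumbprint in main are placeholders rather than real ACME values.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// challengeRecordName returns the TXT record name for a dns-01 challenge.
// A wildcard such as "*.example.org" is validated at the base domain.
func challengeRecordName(domain string) string {
	return "_acme-challenge." + strings.TrimPrefix(domain, "*.")
}

// challengeRecordValue returns the TXT value for a challenge token and an
// account key thumbprint: base64url(SHA-256(token + "." + thumbprint)),
// without padding.
func challengeRecordValue(token, thumbprint string) string {
	sum := sha256.Sum256([]byte(token + "." + thumbprint))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	// Both inputs are placeholders, not values issued by a CA.
	fmt.Println(challengeRecordName("*.example.org"))
	fmt.Println(challengeRecordValue("example-token", "example-thumbprint"))
}
```

In the acme package, Client.DNS01ChallengeRecord performs the value computation using the client's account key, so this sketch is only meant to show what ends up in DNS.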

x1ddos commented 6 years ago

The changes would be minimal and I believe all of them should be in the verify method. The dns-01 verification seems to be already developed in the acme package. https://github.com/golang/crypto/blob/master/acme/autocert/autocert.go#L491 I don't think it will make it significantly more complex than it already is.

I actually doubt the changes would be minimal. The tls-sni verification is almost instant, whereas anything DNS related may take hours. It's quite a different flow, i.e. out-of-band.

winteraz commented 6 years ago

@x1ddos below are the changes required. I've been using the fork for several days. I still stand by my statement that the changes are minimal. I'm using Route 53 and the DNS verification is almost instant (i.e. it takes a few seconds). https://github.com/golang/crypto/compare/master...winteraz:master

x1ddos commented 6 years ago

Given the "tls-sni-xx" is probably gone for good, "http-01" in #21890 will become the default. Maybe we should reconsider and also add "dns-01" in case something happens to "http-01" too.

rusenask commented 6 years ago

Hello, I have been looking into this as well. I agree with @x1ddos that the DNS challenge might be too slow for some (maybe most) of the cases, but the recent issue with tls-sni demonstrated that we need to have multiple options, as a lot of applications are now helpless.

mpx commented 6 years ago

I think it's a little early to say tls-sni is "probably gone for good", it's going to take a while before anyone knows how this is going to work out.

There are active discussions looking at how tls-sni might be fixed. Given the benefits of using SNI, I suspect it's more likely there will be a replacement -- how long it takes is another question.

x1ddos commented 6 years ago

I think it's a little early to say tls-sni is "probably gone for good"

From https://community.letsencrypt.org/t/2018-01-11-update-regarding-acme-tls-sni-and-shared-hosting-infrastructure/50188:

The ACME TLS-SNI-01 validation method will remain disabled permanently for new accounts by default. Since the same problems apply to TLS-SNI-02, TLS-SNI-02 will remain disabled in our upcoming ACMEv2 API endpoint.

Mitigations for Existing TLS-SNI Users

Our recommendation for users is to begin a migration to the HTTP-01 or DNS-01 validation methods. We are working to provide a reasonable amount of migration time for as many users as possible, while maintaining our commitment to security.

billinghamj commented 6 years ago

Compatibility will need to be maintained for some period of time, in order to allow for renewals etc., but yeah, clearly it is permanently deprecated from the perspective of new users.

mpx commented 6 years ago

The bottom of that announcement also says:

ACME Protocol Updates

We will engage with the IETF ACME working group to decide the future of TLS-SNI validation and remediations to the discovered problems.

tls-sni-01/02 won't be back. The announcement and the discussions I linked indicate that people haven't given up on using SNI yet. It's too early to say how it will turn out.

mdempsky commented 6 years ago

Who terminates your TLS? What's your setup look like?

I have a server on my home network for controlling my lights. Currently it's a simple web form that runs over HTTP with a bare IP address. I'm interested in changing it into a Progressive Web App, which requires serving over HTTPS, hence looking at acme/autocert to facilitate handling fetching/renewing TLS certificates.

The server doesn't have a public IP address, so it's not trivial to arrange for it to handle HTTP/HTTPS requests itself. However, it is relatively easy for me to arrange the server to have authorization to modify my Route 53 DNS records.

It seems like if I could provide my own challenge responder logic, that would be the easiest way to reuse the rest of autocert's logic. Open to alternative suggestions though.

keegancsmith commented 6 years ago

@mdempsky another way to solve your issue with less work on your end is to use Caddy as a TLS terminating proxy. It supports challenges via Route 53 DNS.

mdempsky commented 6 years ago

@keegancsmith Thanks for the tip about Caddy. That does seem like a better solution for my use case. I'll look into it.

immesys commented 6 years ago

The learning curve for using the acme package for DNS challenges is pretty high compared to autocert (e.g. there are no examples in the docs). It would be nice to either make it more obvious how to use the acme package for DNS challenges, or make autocert support DNS challenges. I am not really concerned about the time it takes.

FYI my use case is a service on kubernetes that provides a GRPC API, but not on port 443. I can't listen on port 80 or 443 as there are other services doing that.

bluecmd commented 6 years ago

+1. I'm writing a BMC firmware to run on low-powered ARM CPUs that will most likely not have publicly addressable IP addresses. Doing DNS-01 for these devices is what I'm going to implement with or without autocert. I'd be pleased if I don't have to implement the plumbing myself, and I'd rather use the nice Manager interface of autocert.

x1ddos commented 6 years ago

Autocert supports both http-01 and tls-alpn challenges. So, that's already more than 1.

It's unclear how to handle dns-01 at the moment. It is a very different flow. The way autocert works is it requests issuance of a new cert during the first inflight request. As you all know, DNS propagation may take hours for a CA server to see, unlike HTTP requests for http-01 and tls-alpn challenges where hostname resolution is expected to be within milliseconds.

We could of course do something like what's proposed in https://github.com/winteraz/crypto/commit/b97c10626b57d2017e72234d4be589e3f5f714a5, adding a cleanup function, but it needs implementations for various DNS servers/providers. Maybe hypothetical x/crypto/acme/autocert/dns/{gcp,aws,do,etc} packages could provide some initial implementations.

I'm afraid people will start enabling dns-01 and expecting it to work as fast as the other challenges, which it most likely won't. Maybe it works today specifically with Let's Encrypt but that's just their particular implementation.

For the time being, an alternative could be to run a separate process that renews the certs, say in a recurring cron job, and let devices use them. Here's an example for dns-01 with the lower-level acme.Client:

package main

import (
    "context"
    "crypto/ecdsa"
    "crypto/elliptic"
    "crypto/rand"
    "crypto/x509"
    "errors"
    "log"
    "os"
    "time"

    "golang.org/x/crypto/acme"
)

func main() {
    ctx := context.Background()
    client := newClient(ctx)

    // Authorize all domains provided in the cmd line args.
    for _, domain := range os.Args[1:] {
        authz, err := client.Authorize(ctx, domain)
        if err != nil {
            log.Fatal(err)
        }
        if authz.Status == acme.StatusValid {
            // Already authorized.
            continue
        }

        // Pick the DNS challenge, if any.
        var chal *acme.Challenge
        for _, c := range authz.Challenges {
            if c.Type == "dns-01" {
                chal = c
                break
            }
        }
        if chal == nil {
            log.Fatalf("no dns-01 challenge for %q", domain)
        }

        // Fulfill the challenge.
        val, err := client.DNS01ChallengeRecord(chal.Token)
        if err != nil {
            log.Fatalf("dns-01 token for %q: %v", domain, err)
        }
        // TODO: Implement. This depends on your DNS hosting.
        // The function must provision a TXT record containing
        // the val value under "_acme-challenge" name.
        if err := updateMyDNS(ctx, domain, val); err != nil {
            log.Fatalf("DNS update for %q: %v", domain, err)
        }
        // Let CA know we're ready. But are we? Is DNS propagated yet?
        if _, err := client.Accept(ctx, chal); err != nil {
            log.Fatalf("dns-01 accept for %q: %v", domain, err)
        }
        // Wait for the CA to validate.
        if _, err := client.WaitAuthorization(ctx, authz.URI); err != nil {
            log.Fatalf("authorization for %q failed: %v", domain, err)
        }
    }

    // All authorizations are granted. Request the certificate.
    key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
    if err != nil {
        log.Fatal(err)
    }
    req := &x509.CertificateRequest{
        DNSNames: os.Args[1:],
    }
    csr, err := x509.CreateCertificateRequest(rand.Reader, req, key)
    if err != nil {
        log.Fatal(err)
    }
    // CreateCert is the pre-RFC 8555 (ACME v1) call; ACME v2 CAs expect
    // the order-based flow instead.
    crt, _, err := client.CreateCert(ctx, csr, 90*24*time.Hour, true /* inc. chain */)
    if err != nil {
        log.Fatal(err)
    }

    // TODO: Store key and crt either as is, in DER format, or convert to PEM.
    _ = crt // keeps the example compiling until storage is implemented
}

// newClient creates an ACME client with a fresh account key and registers it.
func newClient(ctx context.Context) *acme.Client {
    akey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
    if err != nil {
        log.Fatal(err)
    }
    client := &acme.Client{Key: akey}
    if _, err := client.Register(ctx, &acme.Account{}, acme.AcceptTOS); err != nil {
        log.Fatal(err)
    }
    return client
}

// updateMyDNS is a placeholder for provider-specific code. It must provision
// a TXT record with the given value under "_acme-challenge.<domain>".
func updateMyDNS(ctx context.Context, domain, val string) error {
    return errors.New("updateMyDNS: not implemented")
}

bluecmd commented 6 years ago

Thanks @x1ddos for your input! I'm curious, what do you base the slowness of DNS on? It's certainly true that a lot of DNS providers are slow, but that's not inherent in the system.

In my scenario the domain will be owned by the server in question, and it will use a very low TTL time. Likewise the SOA of the domain in question will have a low negative cache TTL.

I agree that this will not be a very common setup, but I don't agree saying that DNS will always be slow.

Anecdotal evidence on another DNS-01 setup: I'm using cert-manager and DNS validation already for some other things, and it takes about 30 seconds to do the DNS validation and get the certificate. I could probably push that lower if I wanted to as well.

x1ddos commented 6 years ago

Ah well, your setup with very low TTL time is not that common indeed. :) The very first requests will be hanging there until the cert is issued. Say, you have 1 QPS. With cert issuance taking 30sec there will be 30 requests waiting for the TLS handshake by the time it's ready, unless they timeout earlier.

Another bit I'm thinking about is the challenge order. At the moment, autocert will try tls-alpn-01, then http-01 if it is enabled and tls-alpn-01 fails. Suppose we add dns-01. Should it be preferred over the others, or be exclusive? Or maybe we'll also need a way for autocert package users to indicate the order in which challenges are to be selected, if at all?

Just thinking out loud. Ideas are very welcome.
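
One way the "user-indicated order" idea could look is sketched below. This is only an illustration under assumptions: the challenge type is a trimmed-down stand-in, not acme.Challenge, and pickChallenge is a hypothetical helper that does not exist in autocert.

```go
package main

import "fmt"

// challenge is a hypothetical, trimmed-down stand-in for an ACME challenge.
type challenge struct {
	Type  string
	Token string
}

// pickChallenge returns the first offered challenge whose type appears in
// prefs. The order of prefs wins over the order in which the CA listed the
// challenges.
func pickChallenge(prefs []string, offered []challenge) (challenge, bool) {
	for _, want := range prefs {
		for _, c := range offered {
			if c.Type == want {
				return c, true
			}
		}
	}
	return challenge{}, false
}

func main() {
	offered := []challenge{
		{Type: "http-01", Token: "token-a"},
		{Type: "dns-01", Token: "token-b"},
	}
	// The caller prefers tls-alpn-01, then dns-01, then http-01.
	prefs := []string{"tls-alpn-01", "dns-01", "http-01"}
	if c, ok := pickChallenge(prefs, offered); ok {
		fmt.Println("selected:", c.Type)
	}
}
```

An "exclusive" mode falls out of the same helper by passing a single-element preference list.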

rusenask commented 6 years ago

I had a similar desire and got DNS with Cloudflare as a provider working with autocert's GetCertificate() function. For inspiration, you can have a look at this great example: https://github.com/rsc/letsencrypt/blob/master/lets.go. It's already using the correct library (github.com/xenolf/lego/acme), which supports tons of providers. However nicely it fits together, I don't believe it should be merged with x/crypto/acme/autocert :)

Regarding DNS performance: it seems very fast but subsequent requests for different subdomains will likely fail (at least they always fail for me due to a slower cleanup). It's great for wildcards though.

bradfitz commented 6 years ago

I think it's fine to keep autocert small and opinionated, focused on just TLS-ALPN. That it supports http-01 is really just a historical accident.

rgooch commented 4 years ago

I recently tried out autocert, but it turns out not to be viable for us as we run an internal service. If we had split-horizon DNS then that would be one path to viability, but migrating to that would be a long, risky process. It's not on the horizon [sic].

Using public IP addresses for our services with security groups to allow access only from company networks is also challenging, as it is difficult to identify all the public NAT addresses that are elastically assigned to our VPCs. I don't want to deploy a solution that is likely to generate an ongoing trickle of support tickets to open up access (leaving aside the limitations on the size of security groups).

It's been 1.5 years since the last comment. Has any thinking changed on the scope of autocert since then? If nothing is likely to change, then sadly I'll implement a certificate manager which is pluggable, giving users the choice of which authentication method they want to use and the ability to easily add their own. For me, this also ties into other limitations with autocert around safely performing the ACME transaction concurrently across different instances of a web service. See issue #36818 for more information.

rgooch commented 4 years ago

So, since this doesn't seem to be going forward, I've written a certificate manager. It supports the dns-01 and http-01 challenge types. I've written plugins for the http-01 challenge and the dns-01 challenge with AWS Route 53. It wouldn't be hard for someone to write a dns-01 challenge responder for another DNS service.

Not yet implemented are the plugins for distributing certs+keys and ACME transaction locking, but since the code adds a random jitter for ACME attempts, you can probably get away with running multiple instances with the code as-is. I've already deployed this since it's a huuuge improvement over what we had (no automation, <60 days to go before certificates start expiring). I plan on using AWS Secrets Manager for both distributing certs+keys and for transaction locking when I write a plugin. I may also implement a plugin using etcd for this. This would provide a vendor-neutral solution for those who are willing to set up etcd.
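
As a rough illustration of the jitter idea mentioned above (not taken from the certmanager code, whose details are not shown in this thread), each instance can add a random offset to its scheduled attempt so that several instances rarely start an ACME transaction at the same moment. The function names and the one-hour/ten-minute figures are assumptions for the demo.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitteredDelay returns base plus a random duration in [0, maxJitter).
func jitteredDelay(rng *rand.Rand, base, maxJitter time.Duration) time.Duration {
	if maxJitter <= 0 {
		return base
	}
	return base + time.Duration(rng.Int63n(int64(maxJitter)))
}

// delaySeconds is a convenience wrapper for the demo: it seeds its own
// generator and works in whole seconds.
func delaySeconds(seed int64, baseSec, maxJitterSec int) int {
	rng := rand.New(rand.NewSource(seed))
	d := jitteredDelay(rng, time.Duration(baseSec)*time.Second, time.Duration(maxJitterSec)*time.Second)
	return int(d / time.Second)
}

func main() {
	// Three instances with different seeds wait one hour plus up to ten minutes.
	for seed := int64(1); seed <= 3; seed++ {
		fmt.Println(delaySeconds(seed, 3600, 600))
	}
}
```

Jitter only lowers the chance of concurrent attempts; it does not replace the transaction locking described above.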

A preview of the code is available here: https://github.com/rgooch/golib/tree/certmon-preview/pkg/crypto/certmanager

@4n3w: I gather from your thumbs-up that you may be interested in this?

torrentkino commented 4 years ago

For the time being, an alternative could be to run a separate process that renews the certs, say in a recurring cron job, and let devices use them. Here's an example for dns-01 with the lower-level acme.Client:

Hello,

I am actually using this. Is there an ACMEv2 example for this specific case available?

Kind regards Aiko

rgooch commented 4 years ago

Regarding DNS propagation: while in theory it can take hours for records to propagate, it often takes less than a minute. For example, AWS Route 53 has a 1 minute SLA. If you create TXT records with a sub-minute TTL, it works pretty well. The approach seems to be: if it doesn't work for everyone, we won't give it to anyone. That's not how I approach things. This is one of the reasons I decided to write my own certificate manager.

Since autocert performs the ACME transaction at the start of the TLS connection, the overall experience is more vulnerable to delays in obtaining the certificate. The code I wrote starts the renewal process in a goroutine as soon as the programme starts, so it tends not to suffer from latencies. While I could have written a cron job to do this, it's simpler and more robust to build this into the code. For a cron job, one could use this: https://github.com/acmesh-official/acme.sh
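
The "renew in a goroutine from program start" approach can be sketched with the standard library alone. This is not the certmanager implementation: obtainFunc is a hypothetical hook that would run the ACME exchange, fakeObtain stands in for it, and the 12-hour interval is an arbitrary choice for the demo.

```go
package main

import (
	"context"
	"crypto/tls"
	"errors"
	"fmt"
	"log"
	"sync"
	"time"
)

// obtainFunc is a hypothetical hook that performs the ACME exchange
// (for example with dns-01) and returns the resulting certificate.
type obtainFunc func(ctx context.Context) (*tls.Certificate, error)

// certCache holds the most recently obtained certificate.
type certCache struct {
	mu    sync.RWMutex
	cert  *tls.Certificate
	ready chan struct{} // closed after the first successful obtain
	once  sync.Once
}

func newCertCache() *certCache {
	return &certCache{ready: make(chan struct{})}
}

// GetCertificate matches the tls.Config.GetCertificate hook. It never talks
// to the CA, so a handshake is not blocked on certificate issuance.
func (c *certCache) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if c.cert == nil {
		return nil, errors.New("certificate not ready yet")
	}
	return c.cert, nil
}

// run obtains a certificate right away and then once per interval until ctx
// is cancelled. On failure the previously stored certificate stays in use.
func (c *certCache) run(ctx context.Context, obtain obtainFunc, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		cert, err := obtain(ctx)
		if err != nil {
			log.Printf("obtain certificate: %v", err)
		} else {
			c.mu.Lock()
			c.cert = cert
			c.mu.Unlock()
			c.once.Do(func() { close(c.ready) })
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// fakeObtain stands in for a real ACME client; it returns an empty certificate.
func fakeObtain(ctx context.Context) (*tls.Certificate, error) {
	return &tls.Certificate{}, nil
}

// demo starts the background loop and waits for the first certificate.
func demo() error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	cache := newCertCache()
	go cache.run(ctx, fakeObtain, 12*time.Hour)
	select {
	case <-cache.ready:
	case <-time.After(5 * time.Second):
		return errors.New("timed out waiting for the first certificate")
	}
	_, err := cache.GetCertificate(nil)
	return err
}

func main() {
	if err := demo(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("certificate cached before any handshake")
}
```

A server would plug this in with `&tls.Config{GetCertificate: cache.GetCertificate}`. The trade-off compared with autocert is that the set of host names has to be known up front.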

The code I wrote (only) supports ACME v2: https://github.com/rgooch/golib/tree/certmon-preview/pkg/crypto/certmanager

torrentkino commented 4 years ago

Hey,

I extracted the DNS-01 part from here: https://github.com/golang/crypto/blob/master/acme/internal/acmeprobe/prober.go

And it looks like the POC worked with my very first try. Wow, because things became more complicated.

I wrote a broker that enables internal servers to interact with our PowerDNS servers. Each server gets a token that is associated with one fqhn. For security reasons, no server interacts with the PowerDNS API directly. I also make sure that all DNS slaves are in sync before starting the handshake with Let's Encrypt. It has worked smoothly for the last two years.
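
The token-per-host rule of such a broker might be checked along these lines. This is a sketch under assumptions, not the actual broker: the grants type, the authorize method, and the host names are all made up for illustration.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
	"strings"
)

// grants maps a per-server token to the single host name it may validate.
type grants map[string]string

// authorize checks that token is known and that recordName is exactly the
// dns-01 challenge name for the host bound to that token.
func (g grants) authorize(token, recordName string) error {
	for known, host := range g {
		// Constant-time comparison avoids leaking token contents via timing.
		if subtle.ConstantTimeCompare([]byte(known), []byte(token)) != 1 {
			continue
		}
		if !strings.EqualFold(recordName, "_acme-challenge."+host) {
			return fmt.Errorf("token is not allowed to update %q", recordName)
		}
		return nil
	}
	return errors.New("unknown token")
}

func main() {
	g := grants{"secret-token-1": "app1.internal.example.org"}
	fmt.Println(g.authorize("secret-token-1", "_acme-challenge.app1.internal.example.org"))
	fmt.Println(g.authorize("secret-token-1", "_acme-challenge.app2.internal.example.org"))
}
```

Only after this check passes would the broker forward the TXT update to the DNS API, so a compromised server can at most request certificates for its own name.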

Bye Aiko

rgooch commented 4 years ago

For whoever is interested, the certmanager package I wrote is checked in, including the more advanced features of ACME transaction locking and certificate+key distribution using AWS Secrets Manager. Other Locker and Storer backends are welcome. Both dns-01 and http-01 challenges are supported. We're running small clusters of servers in Production and are quite happy with it. Code: https://github.com/Cloud-Foundations/golib/tree/master/pkg/crypto/certmanager API GoDoc: https://godoc.org/github.com/Cloud-Foundations/golib/pkg/crypto/certmanager

gopherbot commented 2 years ago

Change https://golang.org/cl/381994 mentions this issue: acme/autocert: add support for dns-01 challenges

mcrute commented 2 years ago

I realize this issue is rather old but I could really use dns-01 support. The majority of services that I run aren't accessible on the public internet but do have the ability to fulfill DNS challenges. I started a CL (linked above) with my proposal for adding this support to autocert. I've tested it on some real services and it works pretty well.

The CL adds a DNSManager interface and an optional field on the Manager that, if present, will request DNS challenges from the CA and call the manager to do whatever is necessary to configure DNS for the response. Given the many different DNS services, I don't think there's a good one-size-fits-all approach for this, so anyone who wants to support DNS challenges is going to have to write a little bit of glue to make that work. Still, I think there should be plenty of context to do that with this interface.

I'd love to see this merged and would be happy to tweak the approach if that's needed.

Update: I see that DNS propagation delay is a major topic in the comments above. I have seen in practice over the past ~5 years (using a similar but not as well built client as autocert) that propagation times are generally in the 10-30 second range with occasional spikes up to 2 minutes. I have not seen anything that would exceed the 5 minute timeout in autocert, even accounting for other work that must be done. That said, the first request to a service using autocert will be rather slow under worst-case propagation delay and a cold cache; I mitigate this by making a pre-flight request to my app before putting it into service to prime the cert cache.

My setup today uses an internal service that authenticates callers, makes the DNS update, and then watches a few public resolvers until it sees the records propagate before returning to the client. The various clients of this service are themselves only reachable on the internal network.
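
Checking several resolvers can be done with the standard library by pointing a net.Resolver at a specific server. The sketch below is an assumption about how such a check might look, not the service described above; resolverFor, visibleEverywhere, and checkViews are hypothetical names, and the two server addresses are just well-known public resolvers used as examples.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// lookupTXT matches the signature of (*net.Resolver).LookupTXT.
type lookupTXT func(ctx context.Context, name string) ([]string, error)

// resolverFor returns a lookup function that sends queries to one specific
// DNS server (host:port) instead of the system resolver.
func resolverFor(server string) lookupTXT {
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
	return r.LookupTXT
}

func contains(records []string, want string) bool {
	for _, r := range records {
		if r == want {
			return true
		}
	}
	return false
}

// visibleEverywhere reports whether every lookup returns a TXT record equal
// to want. An empty list of lookups counts as "not visible".
func visibleEverywhere(ctx context.Context, lookups []lookupTXT, name, want string) bool {
	if len(lookups) == 0 {
		return false
	}
	for _, lookup := range lookups {
		records, err := lookup(ctx, name)
		if err != nil || !contains(records, want) {
			return false
		}
	}
	return true
}

// checkViews simulates resolvers with fixed answers so the logic can be
// exercised without network access.
func checkViews(want string, views ...[]string) bool {
	lookups := make([]lookupTXT, 0, len(views))
	for _, v := range views {
		v := v
		lookups = append(lookups, func(context.Context, string) ([]string, error) {
			return v, nil
		})
	}
	return visibleEverywhere(context.Background(), lookups, "_acme-challenge.example.org", want)
}

func main() {
	// Real use would pass these to visibleEverywhere; they are not queried here.
	_ = []lookupTXT{resolverFor("8.8.8.8:53"), resolverFor("1.1.1.1:53")}

	fmt.Println(checkViews("digest", []string{"digest"}, []string{"old", "digest"}))
	fmt.Println(checkViews("digest", []string{"digest"}, []string{"old"}))
}
```

Seeing the record on a few public resolvers is a heuristic: a CA that validates from multiple vantage points may still query an authoritative server that has not caught up.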

sr commented 2 years ago

FWIW, I am using @mcrute's fork successfully (thanks!) with DNSimple as my DNS provider. I changed the signature slightly:

type DNS01ChallengeSolver func(ctx context.Context, domain string, record string) (cleaner func() error, err error)

type Manager struct {
    // DNS01 is used to respond to dns-01 challenges returned from the CA.
    // If this field is nil then DNS challenges will not be requested from the
    // CA.
    DNS01 DNS01ChallengeSolver
}

Propagation hasn't been a problem for me either; my solver blocks until net.LookupTXT returns the expected record.
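
A solver that blocks until the record is visible could be structured roughly as follows. This is a sketch, not the code from the fork: waitForTXT, fakeLookup, and demo are hypothetical names, and the fake lookup imitates propagation delay so the example runs without network access. In real use, net.DefaultResolver.LookupTXT would be passed instead.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// lookupTXT matches the signature of (*net.Resolver).LookupTXT so that a
// fake can be substituted in tests.
type lookupTXT func(ctx context.Context, name string) ([]string, error)

// The real resolver method satisfies the same signature.
var _ lookupTXT = net.DefaultResolver.LookupTXT

// waitForTXT polls lookup until name carries a TXT record equal to want, or
// until ctx is done. Lookup errors (such as NXDOMAIN) count as "not yet".
func waitForTXT(ctx context.Context, lookup lookupTXT, name, want string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if records, err := lookup(ctx, name); err == nil {
			for _, r := range records {
				if r == want {
					return nil
				}
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("TXT record %q not visible: %w", name, ctx.Err())
		case <-ticker.C:
		}
	}
}

// fakeLookup reports the record only from the n-th call onward, imitating
// propagation delay.
func fakeLookup(n int, value string) lookupTXT {
	calls := 0
	return func(ctx context.Context, name string) ([]string, error) {
		calls++
		if calls < n {
			return nil, errors.New("no such host")
		}
		return []string{value}, nil
	}
}

// demo waits for a record that appears on the n-th lookup, giving up after
// timeoutMillis milliseconds.
func demo(n, timeoutMillis int) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()
	return waitForTXT(ctx, fakeLookup(n, "digest"), "_acme-challenge.example.org", "digest", 10*time.Millisecond)
}

func main() {
	if err := demo(3, 1000); err != nil {
		fmt.Println("gave up:", err)
		return
	}
	fmt.Println("record visible, safe to accept the challenge")
}
```

The context deadline bounds how long a TLS handshake can be held up, which matters because autocert obtains the certificate during the first request.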

mcrute commented 2 years ago

Thanks for the ping @sr; I've totally forgotten to update this issue. I've been running the exact patch as in the CL above in production for the past ~11 months for O(10s) of different services/sites and have found it to be both stable and reliable. So far I've found no use-case that requires revision.

@bradfitz are you the right person to comment/merge this or should someone else take a look? I would really like to officially land this if possible. Thanks!

Ping @rolandshoemaker and @FiloSottile since you're tagged on the CL as well. Do you see any revisions or additional consensus that needs to occur to merge this?

mcrute commented 1 year ago

Well it's been almost a year and no activity on getting this patch merged despite nudges to the maintainers. This patch works well, I've used it for about a year now, but I don't want to carry it in a fork indefinitely. I'm migrating away from autocert to certmagic instead. It does everything I want and extends nicely.

anacrolix commented 1 year ago

I've been using this patch with success too. @mcrute your certmagic link is broken, guessing you meant https://github.com/caddyserver/certmagic.

mcrute commented 1 year ago

Given that this is a working patch that I'm currently using in production systems, I'm still willing to re-open the patch and try to get it merged if there is any interest at all from the upstream maintainers. That being said, there does not seem to be any interest in merging this, so I'm going to go elsewhere and this patch will eventually bit-rot.

rolandshoemaker commented 1 year ago

This would introduce significant new functionality, and require an API change, and as such should go through the proposal process.

Like @bradfitz says, I think we'd like to keep autocert a very simple package, which is why it is generally designed around the TLS-ALPN challenge. Without a very strong use case, I'm not particularly convinced we should increase the complexity of the package to get this support. As @bradfitz said, and evidenced by some of the alternative solutions proposed in this thread, it is possible to do this with the acme package currently, albeit it is more complex to implement.

The DNS-01 challenge introduces a number of complex questions around validation, TTL being one of them. At my previous job (Let's Encrypt), we saw propagation issues as the number one issue when diagnosing DNS challenge failures, so I suspect the support burden of this feature for the general population would be higher than expected (while most bigger DNS providers are relatively okay at this, there is a long tail where that is not necessarily the case, especially since at least LE uses multi-viewpoint validation, which is likely to exacerbate propagation issues).

mcrute commented 1 year ago

@rolandshoemaker the TTL issue is in fact the biggest issue with DNS challenges, in my experience. We've patched over it by having an ACME sub-service in our DNS service that waits for propagation to a few major public servers before returning to the client, but I can see how this approach doesn't scale to providers.

I read your answer to this question as "no, we will not support this", which is fine, but in that case will you please close this issue to prevent people like myself from wasting energy in trying to support something that will not be supported?