caddy-dns / namecheap

ACME started failing #5

Open 0xjams opened 2 years ago

0xjams commented 2 years ago

OS: Ubuntu 20.04.3 LTS
Caddy Version: 2.4.6

Dockerfile:

FROM caddy:builder AS builder
WORKDIR .
RUN  xcaddy build --with github.com/caddy-dns/namecheap
FROM caddy:latest
COPY --from=builder /usr/bin/caddy /usr/bin/caddy

Caddyfile header

Both the staging and production URLs were tested, with the same results.

{
    email xxx@xxxx
    acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}

Caddyfile (important part):

ntopng.cdv.jmoran.me {
    tls {
        #issuer acme {
        #dns lego_deprecated namecheap
        #}
        dns namecheap {
            api_key {env.NAMECHEAP_API_KEY}
            user {env.NAMECHEAP_API_USER}
        }
    }
    header / {
        Strict-Transport-Security "max-age=31536000; includeSubdomains"
        X-XSS-Protection "1; mode=block"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "SAMEORIGIN"
        Referrer-Policy "no-referrer-when-downgrade"
        # Content-Security-Policy "default-src self http: https: data: blog: 'unsafe-inline'"
        -Server
    }
    reverse_proxy {
        to https://10.10.10.1:3000
        header_up Host {upstream_hostport}
        header_up X-Forwarded-Host {host}
        transport http {
            tls
            tls_insecure_skip_verify
        }
    }
}

Error that can be seen in docker logs:

caddy2              | {"level":"info","ts":1646273804.629268,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"ntopng.cdv.jmoran.me","challenge_type":"dns-01","ca":"https://acme.zerossl.com/v2/DV90"}
caddy2              | {"level":"error","ts":1646273805.732329,"logger":"tls.issuance.acme.acme_client","msg":"cleaning up solver","identifier":"ntopng.cdv.jmoran.me","challenge_type":"dns-01","error":"no memory of presenting a DNS record for ntopng.cdv.jmoran.me (probably OK if presenting failed)"}
caddy2              | {"level":"error","ts":1646273808.1837244,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"ntopng.cdv.jmoran.me","issuer":"acme.zerossl.com-v2-DV90","error":"[ntopng.cdv.jmoran.me] solving challenges: presenting for challenge: adding temporary record for zone jmoran.me.: expected element type <ApiResponse> but have <html> (order=https://acme.zerossl.com/v2/DV90/order/1XsBWDMZWGr8ULYJaUsQAw) (ca=https://acme.zerossl.com/v2/DV90)"}
caddy2              | {"level":"error","ts":1646273808.183788,"logger":"tls.obtain","msg":"will retry","error":"[ntopng.cdv.jmoran.me] Obtain: [ntopng.cdv.jmoran.me] solving challenges: presenting for challenge: adding temporary record for zone jmoran.me.: expected element type <ApiResponse> but have <html> (order=https://acme.zerossl.com/v2/DV90/order/1XsBWDMZWGr8ULYJaUsQAw) (ca=https://acme.zerossl.com/v2/DV90)","attempt":1,"retrying_in":60,"elapsed":23.939634318,"max_duration":2592000}

I verified that my Namecheap credentials are correct. In fact, two weeks ago this configuration was working perfectly with another subdomain; the issue started today.

Any ideas?

nrfox commented 2 years ago

I'm not sure why it would suddenly stop working, but it looks like the Namecheap API returned HTML instead of XML. Maybe something is up with the URL for the API endpoint. Can you try setting the endpoint explicitly to either https://api.namecheap.com/xml.response or, if you are testing in staging first, https://api.sandbox.namecheap.com/xml.response?

...
        dns namecheap {
          ...
          endpoint "https://api.sandbox.namecheap.com/xml.response"
        }

0xjams commented 2 years ago

Hi, thanks for your answer.

ntopng.casa.jmoran.me {
    tls {
        #issuer acme {
        #dns lego_deprecated namecheap
        #}
        dns namecheap {
            api_key {env.NAMECHEAP_API_KEY}
            user {env.NAMECHEAP_API_USER}
            endpoint "https://api.namecheap.com/xml.response"
        }
    }
    header / {
        Strict-Transport-Security "max-age=31536000; includeSubdomains"
        X-XSS-Protection "1; mode=block"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "SAMEORIGIN"
        Referrer-Policy "no-referrer-when-downgrade"
        # Content-Security-Policy "default-src self http: https: data: blog: 'unsafe-inline'"
        -Server
    }
    reverse_proxy {
        to https://10.0.100.1:3000
        header_up Host {upstream_hostport}
        header_up X-Forwarded-Host {host}
        transport http {
            tls
            tls_insecure_skip_verify
        }
    }
}

I tested with this configuration and got the same message:

caddy2 | {"level":"error","ts":1646409695.540807,"logger":"tls.obtain","msg":"will retry","error":"[ntopng.casa.jmoran.me] Obtain: [ntopng.casa.jmoran.me] solving challenges: presenting for challenge: adding temporary record for zone jmoran.me.: expected element type <ApiResponse> but have <html> (order=https://acme.zerossl.com/v2/DV90/order/SbaAmwan0mBqUZF2uAQBGg) (ca=https://acme.zerossl.com/v2/DV90)","attempt":1,"retrying_in":60,"elapsed":37.5723045,"max_duration":2592000

Is there a flag or anything I could add to the configuration so that I get the html response as part of the log?
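
Independently of any plugin option, one way to look at the raw response is to call the Namecheap API directly and print the body. The Go program below is only a sketch under assumptions: it issues a read-only namecheap.domains.dns.getHosts request (the failing call in the log was the one adding a record, so this may or may not reproduce the HTML page), it splits the zone jmoran.me into SLD "jmoran" and TLD "me", and NAMECHEAP_CLIENT_IP is a hypothetical environment variable holding the IP address whitelisted for API access. It is not part of Caddy or the namecheap module.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// buildGetHostsURL assembles a read-only namecheap.domains.dns.getHosts
// request URL. The query parameter names follow the Namecheap API.
func buildGetHostsURL(endpoint, apiUser, apiKey, clientIP, sld, tld string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", fmt.Errorf("parsing endpoint: %w", err)
	}
	q := url.Values{}
	q.Set("ApiUser", apiUser)
	q.Set("ApiKey", apiKey)
	q.Set("UserName", apiUser)
	q.Set("ClientIp", clientIP)
	q.Set("Command", "namecheap.domains.dns.getHosts")
	q.Set("SLD", sld)
	q.Set("TLD", tld)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// run performs the request and prints the status, content type and raw body.
func run() error {
	apiURL, err := buildGetHostsURL(
		"https://api.namecheap.com/xml.response",
		os.Getenv("NAMECHEAP_API_USER"),
		os.Getenv("NAMECHEAP_API_KEY"),
		os.Getenv("NAMECHEAP_CLIENT_IP"), // hypothetical variable, see above
		"jmoran", "me",
	)
	if err != nil {
		return err
	}
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get(apiURL)
	if err != nil {
		return fmt.Errorf("calling API: %w", err)
	}
	defer resp.Body.Close()

	// Read at most 1 MiB so an unexpected page cannot flood the terminal.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return fmt.Errorf("reading body: %w", err)
	}
	fmt.Println("status:", resp.Status)
	fmt.Println("content-type:", resp.Header.Get("Content-Type"))
	fmt.Println(string(body))
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The request URL contains the API key, so the sketch deliberately never prints it. If the body starts with an HTML page, its text usually says more than the XML decoding error does.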

pec0ra commented 9 months ago

I started getting the same error as you:

expected element type <ApiResponse> but have <html>

I noticed that I had a lot of _acme-challenge entries in namecheap. I deleted a bunch of them and after that the renewal succeeded.

Maybe namecheap has a limit on how many entries it can have for the same host.

mc962 commented 5 months ago

> I started getting the same error as you:
>
> expected element type <ApiResponse> but have <html>
>
> I noticed that I had a lot of _acme-challenge entries in namecheap. I deleted a bunch of them and after that the renewal succeeded.
>
> Maybe namecheap has a limit on how many entries it can have for the same host.

I also had a similar issue recently and came to this thread. I agree that it's probably due to having too many _acme-challenge entries in Namecheap. Deleting all of them (there were a lot) fixed things for me.

I was under the impression that Caddy or this module would delete these records after some time, but maybe I was mistaken, or that functionality has a bug.

By the way, according to the Namecheap docs the limit seems to be 150 records, which I believe, given how many I saw.

mholt commented 5 months ago

The respective libdns packages should clean up the TXT records when DeleteRecords is called (Caddy/CertMagic does call this reliably).

mc962 commented 5 months ago

Sometimes I do a lot of restarts of Caddy with systemd somewhat quickly while testing minor configuration changes.

Does the libdns package only clean up the record when it has time to during the same server run/process, and restarting the server "clears" out any previous record of there being something to clean up? Or should it pick up where it left off as part of cleaning itself up (meaning this shouldn't be the issue)?

mholt commented 5 months ago

> Does the libdns package only clean up the record when it has time to during the same server run/process, and restarting the server "clears" out any previous record of there being something to clean up? Or should it pick up where it left off as part of cleaning itself up (meaning this shouldn't be the issue)?

CertMagic calls DeleteRecords() when it is done validating the domain (or had an error doing so). This state isn't persisted to storage otherwise, so killing/restarting the process during validation will cause the records to linger.

Or maybe there's a bug in the DNS provider package's DeleteRecords method. :man_shrugging:

PS. Namecheap has new requirements for using their API (you need at least 20 domains in the account, or have some significant account balance or something like that). So if your accounts don't qualify now maybe they are returning an HTML error instead of an XML payload...

mc962 commented 5 months ago

I think it's more likely I've been restarting things too quickly. Maybe if it keeps filling up I'll write a small script to clean things up periodically or something. But it's not a huge concern to me now that I know what the issue is.
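
The filtering step of such a cleanup script could look roughly like the Go sketch below. It is an illustration only: HostRecord is a hypothetical type standing in for whatever the host list is decoded into, and fetching and writing back the hosts is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// HostRecord is a hypothetical, simplified view of one DNS host entry.
// Name is relative to the zone, e.g. "_acme-challenge.ntopng.cdv".
type HostRecord struct {
	Name    string
	Type    string
	Address string
}

// isACMEChallenge reports whether r looks like a leftover DNS-01 record:
// a TXT record whose first label is "_acme-challenge".
func isACMEChallenge(r HostRecord) bool {
	name := strings.ToLower(r.Name)
	return r.Type == "TXT" &&
		(name == "_acme-challenge" || strings.HasPrefix(name, "_acme-challenge."))
}

// splitChallengeRecords separates the records to keep from the challenge
// records that a cleanup run would drop.
func splitChallengeRecords(hosts []HostRecord) (keep, stale []HostRecord) {
	for _, h := range hosts {
		if isACMEChallenge(h) {
			stale = append(stale, h)
		} else {
			keep = append(keep, h)
		}
	}
	return keep, stale
}

func main() {
	hosts := []HostRecord{
		{Name: "www", Type: "A", Address: "192.0.2.10"},
		{Name: "_acme-challenge.ntopng.cdv", Type: "TXT", Address: "token-1"},
		{Name: "_acme-challenge.ntopng.casa", Type: "TXT", Address: "token-2"},
	}
	keep, stale := splitChallengeRecords(hosts)
	fmt.Printf("keeping %d records, dropping %d challenge records\n", len(keep), len(stale))
}
```

Running a cleanup like this while a certificate is being validated would remove the record the CA is about to look up, so it is safer to run it when Caddy is idle.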

I think I must have spent at least $50 in the last 2 years on domains. Or maybe they don't check regularly or haven't run their checks yet, and I'll get kicked off :grimacing: . I guess if that happened I'd just move it to Cloudflare or something and be done with them.

nrfox commented 5 months ago

> Or maybe there's a bug in the DNS provider package's DeleteRecords method.

It could be. IIRC the Namecheap API gives you two methods: getHosts and setHosts. The namecheap libdns package implements DeleteRecords by first fetching the list of hosts from the Namecheap API, then removing every host that matches the record ID of a record passed to DeleteRecords, and finally calling setHosts with the remaining hosts (current hosts minus removed hosts).

https://github.com/libdns/namecheap/blob/fc7440785c8e0675a163ff7c0e62c3301539a5a3/internal/namecheap/namecheap.go#L275-L298
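
As a rough illustration of that get, filter, set pattern (this is a simplified sketch, not the linked implementation), here is a version against an in-memory stand-in. The Host type, the hostsAPI interface and memoryAPI are all hypothetical names made up for the example.

```go
package main

import "fmt"

// Host is a hypothetical, simplified host record.
type Host struct {
	ID   string
	Name string
}

// hostsAPI mirrors the two operations described above: read the full
// host list, and overwrite the full host list.
type hostsAPI interface {
	GetHosts() ([]Host, error)
	SetHosts(hosts []Host) error
}

// memoryAPI is an in-memory stand-in for the remote API.
type memoryAPI struct {
	hosts []Host
}

func (m *memoryAPI) GetHosts() ([]Host, error) {
	return append([]Host(nil), m.hosts...), nil // return a copy
}

func (m *memoryAPI) SetHosts(hosts []Host) error {
	m.hosts = append([]Host(nil), hosts...)
	return nil
}

// deleteRecords removes the hosts with the given IDs by reading the whole
// list, filtering it, and writing the remainder back in one call.
func deleteRecords(api hostsAPI, ids ...string) error {
	current, err := api.GetHosts()
	if err != nil {
		return fmt.Errorf("getting hosts: %w", err)
	}
	remove := make(map[string]bool, len(ids))
	for _, id := range ids {
		remove[id] = true
	}
	remaining := make([]Host, 0, len(current))
	for _, h := range current {
		if !remove[h.ID] {
			remaining = append(remaining, h)
		}
	}
	if err := api.SetHosts(remaining); err != nil {
		return fmt.Errorf("setting hosts: %w", err)
	}
	return nil
}

func main() {
	api := &memoryAPI{hosts: []Host{
		{ID: "1", Name: "www"},
		{ID: "2", Name: "_acme-challenge"},
	}}
	if err := deleteRecords(api, "2"); err != nil {
		fmt.Println("delete failed:", err)
		return
	}
	fmt.Println(api.hosts)
}
```

Because setHosts overwrites the whole list, anything that changed between the read and the write is silently replaced by the caller's snapshot.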

So if you had two clients and two records A and B and simultaneously you called:

libdnsClientA.DeleteRecords([A])
libdnsClientB.DeleteRecords([B])

You could end up with either A or B being added back by the setHosts call that DeleteRecords makes. I don't see a way to prevent this given the API limitations. What the libdns namecheap package should do, and it doesn't look like it does today, is ensure that simultaneous calls to DeleteRecords on the same client result in both records being deleted:

libdnsClient.DeleteRecords([A])
libdnsClient.DeleteRecords([B])
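
One possible way to get that behavior for a single client is to hold a mutex around the whole read-modify-write cycle. The Go sketch below shows the idea against an in-memory stand-in; Host, fakeAPI, Client and deleteConcurrently are hypothetical names, and this is not how the libdns namecheap package is written today.

```go
package main

import (
	"fmt"
	"sync"
)

// Host is a hypothetical, simplified host record.
type Host struct {
	ID   string
	Name string
}

// fakeAPI is a thread-safe in-memory stand-in for getHosts/setHosts.
type fakeAPI struct {
	mu    sync.Mutex
	hosts []Host
}

func (a *fakeAPI) GetHosts() []Host {
	a.mu.Lock()
	defer a.mu.Unlock()
	return append([]Host(nil), a.hosts...)
}

func (a *fakeAPI) SetHosts(hosts []Host) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.hosts = append([]Host(nil), hosts...)
}

// Client serializes its read-modify-write cycles with mu, so two
// DeleteRecords calls on the same client cannot overwrite each other.
type Client struct {
	mu  sync.Mutex
	api *fakeAPI
}

func (c *Client) DeleteRecords(ids ...string) {
	c.mu.Lock()
	defer c.mu.Unlock()

	remove := make(map[string]bool, len(ids))
	for _, id := range ids {
		remove[id] = true
	}
	var remaining []Host
	for _, h := range c.api.GetHosts() {
		if !remove[h.ID] {
			remaining = append(remaining, h)
		}
	}
	c.api.SetHosts(remaining)
}

// deleteConcurrently issues one DeleteRecords call per ID, all at once,
// and waits for them to finish.
func deleteConcurrently(c *Client, ids ...string) {
	var wg sync.WaitGroup
	for _, id := range ids {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			c.DeleteRecords(id)
		}(id)
	}
	wg.Wait()
}

func main() {
	api := &fakeAPI{hosts: []Host{
		{ID: "A", Name: "_acme-challenge"},
		{ID: "B", Name: "_acme-challenge"},
		{ID: "C", Name: "www"},
	}}
	client := &Client{api: api}
	deleteConcurrently(client, "A", "B")
	fmt.Println(api.GetHosts())
}
```

The mutex only helps within one process. Two separate clients, or a restarted Caddy, can still race against each other through the API.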

I'm not sure this will fix the cleanup problem, but it wouldn't hurt, and it looks like it is a requirement for complying with the libdns spec anyway:

> Each exported method must be safe for concurrent use (i.e. thread-safe) in the sense that there must be no data races. Each provider API may have different atomicity guarantees. Two simultaneous method calls must result in either an error or the expected outcome of each call to be applied successfully.

It also looks like the Namecheap API returns an HTML response rather than an XML response if you have more than 150 records, so the libdns package should check for that and give users a better error message.
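
A check along those lines could inspect the root element before decoding the full payload. The Go sketch below uses only the standard library; checkAPIResponse, rootElement and ErrHTMLResponse are hypothetical names, and the wording of the hint is an assumption based on the causes suggested in this thread.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"errors"
	"fmt"
	"strings"
)

// ErrHTMLResponse marks a body that is an HTML page rather than API XML.
var ErrHTMLResponse = errors.New("API returned an HTML page instead of XML")

// rootElement returns the local name of the first start element in body,
// skipping any XML declaration, doctype, comments and whitespace.
func rootElement(body []byte) (string, error) {
	dec := xml.NewDecoder(bytes.NewReader(body))
	for {
		tok, err := dec.Token()
		if err != nil {
			return "", fmt.Errorf("no root element found: %w", err)
		}
		if start, ok := tok.(xml.StartElement); ok {
			return start.Name.Local, nil
		}
	}
}

// checkAPIResponse returns nil if body starts with <ApiResponse>, and a
// descriptive error otherwise.
func checkAPIResponse(body []byte) error {
	root, err := rootElement(body)
	if err != nil {
		return err
	}
	switch {
	case strings.EqualFold(root, "html"):
		return fmt.Errorf("%w (possible causes: too many host records, or the account does not qualify for API access)", ErrHTMLResponse)
	case root != "ApiResponse":
		return fmt.Errorf("unexpected root element <%s>", root)
	}
	return nil
}

func main() {
	page := []byte("<!DOCTYPE html><html><body>Request rejected</body></html>")
	if err := checkAPIResponse(page); err != nil {
		fmt.Println("error:", err)
	}
}
```

Callers can use errors.Is(err, ErrHTMLResponse) to tell this case apart from other decoding failures.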