dehydrated-io / dehydrated

letsencrypt/acme client implemented as a shell-script – just add water
https://dehydrated.io
MIT License

JWS has invalid anti-replay nonce #547

Closed legsak1mbo closed 3 years ago

legsak1mbo commented 6 years ago

I'm getting this error with increasing frequency. The response looks something like:

{"type":"urn:acme:error:badNonce","detail":"JWS has invalid anti-replay nonce **NONCE**","status":400}

The only "fix" appears to be re-running dehydrated, sometimes several times, until it succeeds.

In https://github.com/diafygi/gethttpsforfree/issues/150#issuecomment-380361381 they suggest that "nonce timeouts are becoming more common". I assume that's what I'm seeing here too?

lukas2511 commented 6 years ago

Mh that is weird, I never encountered that issue... A nonce timeout seems unlikely to me as dehydrated retrieves a nonce for every signed request and immediately uses it.

If you encounter this issue often you could help me identify the issue by adding an echo "Nonce: ${nonce}" >&2 after the whole code-block marked as # Retrieve nonce from acme-server. If it happens again compare the nonce to the previous working ones and see if it is somehow shorter or looks completely different, has a special character the other nonces don't have, whatever it could be.

CliffS commented 6 years ago

I am also seeing this. Example output:

Processing demidamson.org with alternative names: *.demidamson.org
 + Checking domain name(s) of existing cert... changed!
 + Domain name(s) are not matching!
 + Names in old certificate: cliff.demidamson.org demidamson.org mail.demidamson.org proof.demidamson.org www.demidamson.org
 + Configured names: *.demidamson.org demidamson.org
 + Forcing renew.
 + Checking expire date of existing cert...
 + Valid till Jun  5 18:02:05 2018 GMT (Longer than 30 days). Ignoring because renew was forced!
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting new certificate order from CA...
Nonce: kSCHApAbf9-DzROJ2tDz5PHWQ6415Wt_NjqPB3tw41E
 + Received 2 authorizations URLs from the CA
 + Handling authorization for demidamson.org
 + Handling authorization for demidamson.org
 + 2 pending challenge(s)
 + Deploying challenge tokens...
 + Responding to challenge for demidamson.org authorization...
Nonce: 24xRxskZFAvH6TLV8h9E0xgE82Ozx3GeTbIizm0Gt0A
 + Challenge is valid!
 + Responding to challenge for demidamson.org authorization...
Nonce: 274clr42IKJTC3MUNPPdaxVaBGJDK88AknJ_Yv8K7PM
 + Challenge is valid!
 + Cleaning challenge tokens...
 + Requesting certificate...
Nonce: LhkvE7j-vosfFHtkjQ5EVFIZhuH5DAaBtMKL0-LULwU
  + ERROR: An error occurred while sending post-request to https://acme-v02.api.letsencrypt.org/acme/finalize/9161467/3737048 (Status 400)

Details:
HTTP/1.1 100 Continue
Expires: Thu, 26 Apr 2018 17:10:25 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache

HTTP/1.1 400 Bad Request
Server: nginx
Content-Type: application/problem+json
Content-Length: 169
Boulder-Requester: 9161467
Replay-Nonce: 2Gb7gCvUgTvEc1kbUca5bZpiLj5iNdecPXPMoY9GDYQ
Expires: Thu, 26 Apr 2018 17:10:25 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Thu, 26 Apr 2018 17:10:25 GMT
Connection: close

{
  "type": "urn:ietf:params:acme:error:badNonce",
  "detail": "JWS has an invalid anti-replay nonce: \"LhkvE7j-vosfFHtkjQ5EVFIZhuH5DAaBtMKL0-LULwU\"",
  "status": 400
}

Request failure: 400 {
  "type": "urn:ietf:params:acme:error:badNonce",
  "detail": "JWS has an invalid anti-replay nonce: \"LhkvE7j-vosfFHtkjQ5EVFIZhuH5DAaBtMKL0-LULwU\"",
  "status": 400
}

CliffS commented 6 years ago

In case it helps, my config file contains:

CHALLENGETYPE="dns-01"
WELLKNOWN="${BASEDIR}/wellknown"
HOOK="${BASEDIR}/hook2.pl"
HOOK_CHAIN="yes"
CONTACT_EMAIL="cliff@might.be"

lukas2511 commented 6 years ago

This is a really hard issue to debug... I'm running dehydrated in a loop right now but I'm not able to get it to fail even once. I'll try to implement retries using the nonce sent back by the server, but it's really really hard for me to test as I just can't get it to fail.

legsak1mbo commented 6 years ago

OK, so it fails first time on every machine I'm trying it on.

Output (with echo) is below...

Nonce: UawgckMjzaYcfL8ic2bQRQLjBs0VKqAhzY7GOxSwvzc
Nonce: 3fYLEt90kzqRapD-wg3VTmnMufWboHwO4ux0Iid0Qgg
Nonce: Bv-Juf5kr281OAdl_bR9NcwRYKyPdxQxEy0tsNbeqg0
  + ERROR: An error occurred while sending post-request to https://acme-v02.api.letsencrypt.org/acme/finalize/5474087/3738618 (Status 400)

Details:
HTTP/1.1 100 Continue
Expires: Thu, 26 Apr 2018 17:41:29 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache

HTTP/1.1 400 Bad Request
Server: nginx
Content-Type: application/problem+json
Content-Length: 169
Boulder-Requester: 5474087
Replay-Nonce: 6bYt73P6LBAu0R2dNcCxJ7xBJCOK2B_sVIPdP4d0h2M
Expires: Thu, 26 Apr 2018 17:41:30 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Thu, 26 Apr 2018 17:41:30 GMT
Connection: close

{
  "type": "urn:ietf:params:acme:error:badNonce",
  "detail": "JWS has an invalid anti-replay nonce: \"Bv-Juf5kr281OAdl_bR9NcwRYKyPdxQxEy0tsNbeqg0\"",
  "status": 400
}

lukas2511 commented 6 years ago

@CliffS @legsak1mbo do you by any chance have multiple egress ip addresses or a dual stack (ipv4+ipv6) setup? can you verify if the issue goes away if you set IP_VERSION=6 in your config file?
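For reference, the setting is a plain shell assignment in dehydrated's config file. A minimal fragment, assuming the default config layout; 4 is the other accepted value:

```shell
# dehydrated config: force every curl call onto one address family.
# Use 4 or 6; leaving it unset lets curl pick either.
IP_VERSION=6
```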

legsak1mbo commented 6 years ago

Not using multiple egress addresses or IPv6 here.

lukas2511 commented 6 years ago

@legsak1mbo are you sure? no NAT or something that could result in the request coming from a different IP? that's basically the only way I'm able to reproduce this issue.

legsak1mbo commented 6 years ago

I don't believe so. Certainly nothing that would change between requests in the same run.

lukas2511 commented 6 years ago

@legsak1mbo can you do a few curl https://my-ipv4.kurz.pw requests and see if the result changes between runs? just to make sure.
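A small sketch of that check, for anyone who wants to repeat it more than a few times. The function names and the default of 10 samples are arbitrary choices for illustration; the URL is the one suggested above. If the count comes out higher than 1, the egress address is changing between requests.

```shell
#!/usr/bin/env bash
# Count how many distinct addresses appear on stdin (one per line).
# Blank lines are ignored.
count_distinct_ips() {
  sort -u | grep -c .
}

# Ask the IP echo service N times (default 10), one answer per line.
# Nothing runs until this is called by hand, e.g.:
#   sample_egress_ips 10 | count_distinct_ips
sample_egress_ips() {
  local i=0 n="${1:-10}"
  while [ "$i" -lt "$n" ]; do
    curl -4 -s https://my-ipv4.kurz.pw
    echo
    i=$((i + 1))
  done
}
```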

legsak1mbo commented 6 years ago

Well heck, it certainly does!

Time to get on the phone to my ISP...

lukas2511 commented 6 years ago

@legsak1mbo yea.. meh. in that case even retries wouldn't do you any good as it would be basically luck-based if the request goes through cleanly...

CliffS commented 6 years ago

@lukas2511 Fixing it to IPv6 appears to have solved the problem for me.

smortex commented 6 years ago

We encountered the same problem today. It turned out that a customer had reverted the DNS configuration of one of the domains on the failing certificate to an earlier setup, in which the A record pointed at the IP address of an old OVH shared hosting and no AAAA record was set.

Because OVH can do TLS for shared hosting through letsencrypt, my guess is that when the letsencrypt validation server tries to fetch a token it gets one from OVH (maybe an old one, and of course it's not what the validation server expects, so the renewal fails… and the "invalid anti-replay nonce" message makes sense).

@CliffS Maybe it's worth double-checking that both IPv4 and IPv6 resolve to the same server for your domain: that would explain why renewing over IPv4 would work if accessing through IPv6 brings you somewhere else?

CliffS commented 6 years ago

@smortex Interestingly there was no reverse DNS for the IPv6 address, though IPv4 reverse was correct. I have fixed the IPv6 reverse and I will retest without forcing the IPv6.

smortex commented 6 years ago

@CliffS I don't think reverse DNS has an impact here. I was thinking about the IPv4 address and the IPv6 address not being served by the same machine.

Just like for example http://www.kame.net/ is not the same site over IPv4 and IPv6… Static image in one case, animated gif otherwise 😉

AceSlash commented 6 years ago

I had this issue today on a certificate with 6 alternative names: it was failing randomly on one of them.

After talking about it on irc with @lukas2511 and reading this thread, setting IP_VERSION=6 did indeed fix the issue for me. The server in question has 2 IPv4 addresses and 1 IPv6 address, but never had the issue before.

Checking with curl https://my-ipv4.kurz.pw, I always see the same IPv4 address, so I don't think it flickers.

I'll try to test that by creating a certificate with a lot of alternative names and running tcpdump to capture the result and see what exactly is going on.

major commented 6 years ago

I was having this same problem today and found that setting IP_VERSION=4 fixed the issue. My laptop has an IPv4 and IPv6 address.

lukavia commented 6 years ago

I had this issue today, but unfortunately the ISP won't change its behavior. @lukas2511 looking around the forums at LetsEncrypt there was a suggestion that the client retry the request with the nonce from the response a few (reasonable) times before giving up. Would it be hard to implement this in dehydrated?

lukas2511 commented 6 years ago

@lukavia i want to implement two things in the not-too-far future:

I want to try to find a way to resolve the api hostname only once, so that every further curl call uses the same server, this will solve this bug.

I also want to add retries, but those are a lower priority for me as use-cases with hundreds of domains per single certificate are rare and everything else can quickly be solved by just running the script again.

lukavia commented 6 years ago

@lukas2511 Unfortunately, even if you resolve the api hostname only once, the problem will persist, since the provider route would still be different every time. Here is an example: I have 2 internet providers behind a pfSense router. pfSense load-balances between them one-for-one, which means that consecutive requests to the same IP alternate between the providers, so the originating (your) IP is different each time.

So I think retry feature should be higher priority.

P.S. Since you use curl, you could run the "host" command once to get the IP, and then use that IP with a Host header for each request.
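One caveat on that idea: over HTTPS, connecting to a bare IP and sending a Host header tends to break the certificate and SNI checks. curl's `--resolve` option pins a hostname to an address while keeping the hostname in the TLS handshake. A rough sketch, assuming the address is looked up once by hand; `pin_arg` is a hypothetical helper and 192.0.2.10 is a placeholder address. This pins the destination only and does nothing about a changing source IP.

```shell
#!/usr/bin/env bash
# Build the HOST:PORT:ADDRESS value that curl's --resolve option expects.
pin_arg() {
  printf '%s:%s:%s' "$1" "$2" "$3"
}

# Example with a placeholder address (replace it with one looked up
# beforehand, e.g. via the host command):
#   curl --resolve "$(pin_arg acme-v02.api.letsencrypt.org 443 192.0.2.10)" \
#        https://acme-v02.api.letsencrypt.org/directory
```

The same option could presumably be passed to dehydrated through its CURL_OPTS config variable, though that is untested here.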

olivluca commented 6 years ago

@lukavia you can simply program a pfsense rule to route traffic to letsencrypt through one provider (i.e. do not load balance it). That's what I did when I had the same issue.

lukavia commented 6 years ago

This was an example of the problem. When the provider does the same you don't have access to those settings.

bohwaz commented 6 years ago

I also have the same issue, it seems to happen randomly, I have to launch dehydrated in a loop until it succeeds…

mcv21 commented 6 years ago

We're also seeing this, probably related to being behind NAT (so outgoing IP changes all the time). It's not clear to me why this should matter - is the source IP encoded in the Nonce somehow, or is it stateless at the server and you're just getting a different remote server each time? In any case, having dehydrated be able to retry with the new Nonce each time would be better, but perhaps this is a problem with Boulder itself?

yverry commented 6 years ago

On my side, once I added IP_VERSION=6 the nonce error disappeared.

neoKushan commented 5 years ago

Just to chime in, as I encountered this issue on a completely unrelated system and Googling brought me here.

This issue is essentially caused by LE being unable to get the ACME challenge from the specified domain name. It's clearly not as simple as DNS not being set up correctly, as it's more nuanced than this.

A lot of the people in this thread have found out that when you have multiple IP addresses, they don't always route to the same endpoint. Likewise if you're on a shared IP of any kind, there's no way to guarantee that you'll get the right host either. This is why setting IP_VERSION=6 or IP_VERSION=4 "fixes" the issue for a lot of people: it's simply removing the "other" IP addresses. Essentially, it boils down to your local configuration/network/setup and that's why there's no single thing that will "fix" it.

In my case, IP addresses weren't the issue but rather a redirect was redirecting .well-known incorrectly, causing it to return a 200 with content, just not the content of the ACME challenge - hence "bad nonce". Had it returned a 404, you'd have got the much more useful error that contains the link to the renewal failure report.

I was able to figure this out by simply trying to navigate to .com/.well-known/acme-challenge/ - it should return the nonce directly and not anything else.

To sum up, if you're getting this error:

znerol commented 5 years ago

I'm running dehydrated as part of an integration test on Travis. I ran into this issue since some of their test workers are behind a NAT. The first thing I tried was to find forward proxy software which implements connection pooling/reuse for HTTPS. The closest thing I've found is some adventurous nginx/lua approach.

I ended up tunneling all curl requests through tor when running the dehydrated test on Travis. This might not be acceptable in production though.

staples1347 commented 5 years ago

I am getting this error quite a lot with IPv6. My server is using the same static IP when sending, but I noticed with tcpdump curl seems to alternate between two destinations for acme-v02.api.letsencrypt.org: 2600:1415:8:185::3a8e , 2600:1415:8:192::3a8e which may be causing problems if the backend servers aren't synchronising properly. If I put in a single entry in my hosts file, I don't seem to get the error as often. IPv4 is reliable, but dns normally only returns one ip address for acme-v02.api.letsencrypt.org. Should I report this to Let's Encrypt since using the new nonce might be invalidated when curl connects to the other remote server again?

staples1347 commented 5 years ago

Actually my problem might just be with my connection as when I try on other Linux servers using the same two destinations I'm not getting the error.

alexzorin commented 4 years ago

ACME clients are supposed to transparently retry requests that fail due to an invalid nonce. This is explicitly mentioned in the spec (https://tools.ietf.org/html/rfc8555#section-6.5):

When a server rejects a request because its nonce value was unacceptable (or not present), it MUST provide HTTP status code 400 (Bad Request), and indicate the ACME error type "urn:ietf:params:acme:error:badNonce". An error response with the "badNonce" error type MUST include a Replay-Nonce header field with a fresh nonce that the server will accept in a retry of the original query (and possibly in other requests, according to the server's nonce scoping policy). On receiving such a response, a client SHOULD retry the request using the new nonce.

Whether or not this is caused by NAT, multiple IP addresses, or server-side goings on, users should not even notice that it is happening. You can look at clients like acme.sh or Certbot to see how they handle this.
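The retry behaviour described in that section of the spec could look roughly like the sketch below. This is not dehydrated's code, nor that of acme.sh or Certbot: every function name here is hypothetical, `send_signed_request` stands in for whatever signs and posts the JWS, `HDR_FILE` is an assumed location for the saved response headers, and the limit of three attempts is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Read raw response headers on stdin and print the Replay-Nonce value, if any.
# The header name is matched case-insensitively.
extract_replay_nonce() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "replay-nonce" { print $2; exit }'
}

# Succeed when a problem+json body carries the badNonce error type.
is_bad_nonce() {
  printf '%s' "$1" | grep -q ':error:badNonce'
}

# Send a request, retrying with the server-supplied nonce on badNonce.
# send_signed_request URL PAYLOAD NONCE is a hypothetical callback: it must
# sign the payload with the given nonce, print the response body, and save
# the response headers to "$HDR_FILE".
post_with_nonce_retry() {
  local url="$1" payload="$2" nonce="$3" max_tries="${4:-3}"
  local try=1 body=""
  while [ "$try" -le "$max_tries" ]; do
    body="$(send_signed_request "$url" "$payload" "$nonce")"
    if ! is_bad_nonce "$body"; then
      # Success or an unrelated error: hand the body back to the caller.
      printf '%s\n' "$body"
      return 0
    fi
    # The 400 response itself carries the fresh nonce to retry with.
    nonce="$(extract_replay_nonce < "$HDR_FILE")"
    [ -n "$nonce" ] || break
    try=$((try + 1))
  done
  printf '%s\n' "$body" >&2
  return 1
}
```

Because the nonce sits inside the signed part of the JWS, each retry has to re-sign the request rather than resend the same bytes, which is why the callback takes the nonce as an argument.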

m-a-v commented 4 years ago

I've had the same problem. After updating from 0.6.3 to 0.6.5 the problem disappeared. Probably this helps someone else.

altasnet commented 4 years ago

I'm getting an error when I try to register and accept the terms. I've already checked that I'm using just one IP address. I'm using version 0.6.5.

dehydrated configuration

INFO: Using main config file /shared/dehydrated/config

declare -- CA="https://acme-staging.api.letsencrypt.org/directory"
declare -- CERTDIR="/shared/dehydrated/certs"
declare -- ALPNCERTDIR="/shared/dehydrated/alpn-certs"
declare -- CHALLENGETYPE="http-01"
declare -- DOMAINS_D=""
declare -- DOMAINS_TXT="/shared/dehydrated/domains.txt"
declare -- HOOK="/shared/dehydrated/hook.sh"
declare -- HOOK_CHAIN="no"
declare -- RENEW_DAYS="30"
declare -- KEYSIZE="2048"
declare -- WELLKNOWN="/shared/dehydrated"
declare -- PRIVATE_KEY_RENEW="yes"
declare -- OPENSSL_CNF="/etc/pki/tls/openssl.cnf"
declare -- CONTACT_EMAIL=""
declare -- LOCKFILE="/shared/dehydrated/lock"

INFO: Using main config file /shared/dehydrated/config

Dehydrated by Lukas Schauer https://dehydrated.io

Dehydrated version: 0.6.5
GIT-Revision: unknown

OS: BIG-IP 14.1.0.3 Build 0.0.6
Used software:
 bash: 4.2.46(2)-release
 curl: curl 7.47.1
 awk: GNU Awk 4.0.2
 sed: sed (GNU sed) 4.2.2
 mktemp: mktemp (GNU coreutils) 8.22
 grep: grep (GNU grep) 2.20
 diff: diff (GNU diffutils) 3.3
 openssl: OpenSSL 1.0.2p-fips 14 Aug 2018

INFO: Using main config file /shared/dehydrated/config

Details:
HTTP/2.0 400
server:nginx
date:Wed, 02 Oct 2019 21:15:32 GMT
content-type:application/problem+json
content-length:100
cache-control:public, max-age=0, no-cache
replay-nonce:0002_xoIBQHkeneUKKhLjCGvLu2pNl-Me7aP-dTwuVkTtBU

{
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has no anti-replay nonce",
  "status": 400
}

Error registering account key. See message above for more information.

KamilKeski commented 4 years ago

@altasnet Noticed you aren't declaring "CA_TERMS"; that's going to be required for the correct environment (staging or prod) to register a new account. You are using the staging cert authority, so it may be defaulting to prod license terms if not defined and generating an invalid nonce for that reason. Just a thought.

declare -- CA_TERMS="https://acme-staging.api.letsencrypt.org/terms"

altasnet commented 4 years ago

Thank you for your time!

We get the same error in production:

Details:
HTTP/2.0 400
server:nginx
date:Thu, 03 Oct 2019 13:04:33 GMT
content-type:application/problem+json
content-length:112
cache-control:public, max-age=0, no-cache
link:https://acme-v02.api.letsencrypt.org/directory;rel="index"
replay-nonce:0002JNQGAJOKMNtHGIDom3Mth9pEqsTPh7C3_zivlpEyN2k

{
  "type": "urn:ietf:params:acme:error:badNonce",
  "detail": "JWS has no anti-replay nonce",
  "status": 400
}

Chupaka commented 4 years ago

@altasnet just a note: "JWS has no anti-replay nonce" and "JWS has invalid anti-replay nonce" are different errors.

altasnet commented 4 years ago

I didn't get "invalid anti-replay nonce"; it's always "no anti-replay nonce".

Do you have any idea what it could be?

javimox commented 4 years ago

Same here:

Details:
HTTP/1.1 200 Connection established

HTTP/1.1 400 Bad Request
Server: nginx
Date: Fri, 04 Oct 2019 11:23:43 GMT
Content-Type: application/problem+json
Content-Length: 112
Connection: keep-alive
Boulder-Requester: 46078664
Cache-Control: public, max-age=0, no-cache
Link: <https://acme-v02.api.letsencrypt.org/directory>;rel="index"
Replay-Nonce: 00029Vzf0xEXudqikduz93gYlcO2Dg-cWv9FsO32GN44xyA

{
  "type": "urn:ietf:params:acme:error:badNonce",
  "detail": "JWS has no anti-replay nonce",
  "status": 400
}

timdev commented 4 years ago

FWIW, I encountered "JWS has no anti-replay nonce" today. Eventually stumbled upon this thread, and solved the issue on my machine by adding CURL_OPTS="--http1.1" to my dehydrated config file.
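A hedged guess at why this helps, not something confirmed in this thread: the failing responses quoted earlier from altasnet arrive over HTTP/2 with lowercase header names (replay-nonce:), so a client that looks for Replay-Nonce case-sensitively would find no nonce and send none. That would not explain every report here, since at least one failing response above is HTTP/1.1. The sketch below shows the config line from this comment and illustrates the difference between the two lookups; both function names are hypothetical and this is not dehydrated's actual parsing code.

```shell
#!/usr/bin/env bash
# Workaround from the comment above: make curl speak HTTP/1.1.
CURL_OPTS="--http1.1"

# Case-sensitive lookup: misses the lowercase header names seen over HTTP/2.
nonce_strict() {
  tr -d '\r' | sed -n 's/^Replay-Nonce: *//p'
}

# Case-insensitive lookup: lowercases the header name before comparing.
nonce_any_case() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "replay-nonce" { print $2; exit }'
}
```

Fed the header line `replay-nonce: abc123`, the first function prints nothing while the second prints the nonce.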

lukas2511 commented 3 years ago

This may be magically "fixed" when dehydrated at some point gets retry logic. Until then please just fix your network configuration.