Closed mschirrmeister closed 3 years ago
New keys should be fetched when a 401
response is received: https://github.com/DNSCrypt/dnscrypt-proxy/blob/796a7f6d31737adf352a4711b9983acc90c72daa/dnscrypt-proxy/proxy.go#L705 but maybe this doesn't work as expected.
Did you get a Forcing key update
log entry?
/cc @chris-wood
I never get anything else than [WARNING] Failed to receive successful response from [odohtarget-ms]
. This is with loglevel 0
.
You can reproduce it faster by restarting the target, instead of waiting for the key rotate. As soon as the target is restarted, dnscrypt-proxy fails and the message above is logged.
Just a thought: does the relay forward the 401
response? What relay are you using?
I was able to reproduce this, thanks a lot for describe a quick way to do it!
When the key is wrong, responses received via a relay have a 200
return code and an empty content.
Something like this:
diff --git a/dnscrypt-proxy/proxy.go b/dnscrypt-proxy/proxy.go
index cd2e76a0..9a55da2a 100644
--- a/dnscrypt-proxy/proxy.go
+++ b/dnscrypt-proxy/proxy.go
@@ -702,7 +702,7 @@ func (proxy *Proxy) processIncomingQuery(clientProto string, serverProto string,
dlog.Warnf("Failed to decrypt response from [%v]", serverName)
response = nil
}
- } else if responseCode == 401 {
+ } else if responseCode == 401 || (len(responseBody) == 0 && responseCode == 200) {
dlog.Infof("Forcing key update for [%v]", serverInfo.Name)
for _, registeredServer := range proxy.serversInfo.registeredServers {
if registeredServer.name == serverInfo.Name {
works around the issue, effectively updating the key. But we're suppose get a 401
no matter what.
When using odoh-relay-crypto-sx
as a relay responses get a proper 401
code. When using odoh-relay-surf
and odohrelay-ms
, the code is 200
.
So my guess is that the service running on these relays doesn't properly forward the HTTP status code.
dnscrypt-proxy
also didn't properly store the HTTP status code on error. This is now fixed by cedd4f3b5462195930a52c574d8cab62499fd926
With that fix, along with a relay that properly forwards response codes, everything works as expected.
026c42424fac29f22fcc993e0f65f43aab525511 has a workaround for relays not correctly forwarding the status code.
Hope this is temporary.
I'll release the code of my ODoH relay soon (this is currently running as a WebAssembly module, a proper web server needs to be added in order to have it work as as standalone app).
@jedisct1 I am using Cloudflares odoh-server-go as a relay. It was the only relay code I found, that I can run myself for testing.
I did not check what status code I received from the relay. But looks like you have been able to test this now already. I have everything running in a container at the moment and capturing packets was not that easy. If needed again I should be able to do it, since I have everything behind a Traefik proxy, where it is plain text on the hosts itself.
Does dnscrypt-proxy has an option to dump TLS session keys for debugging? Similar to some browsers which support the variable SSLKEYLOGFILE
.
@chris-wood is this a known issue in doh-server-go? I can look at it if you don't have the time to.
There are no options to dump TLS keys. But looks like this is a very simple thing to do in Go. So maybe that can be added if that proves to be useful more often than once.
Thank you for reporting this, as well as explaining how to reproduce it. That was very useful! Thanks!
Hello @jedisct1,
first of all thank you very much for fixing it so quickly. Amazing. :-)
I did a test and it works fine. Now I can see a key update message.
[2021-08-01 18:57:56] [INFO] Forcing key update for [odohtarget-ms]
With what relay did you the test that you saw the 200
status code and not a 401
?
I do not see the message from commit 026c424. Which means the Cloudflare odoh-server-go forwards the status code correctly. I also wanted to see what it returns and captured traffic on my Traefik container to see what it gets back. The response from the odoh-server-go (relay) was a 401
.
To rule out that Traefik does not do anything with the packets, I configured a little debug environment with HTTP in between to capture the traffic that leaves Traefik.
I created the following locally. dnscrypt-proxy --443--> traefik1 --80--> traefik2 --4567(http)--> odoh-server-go-relay
.
Captured traffic on traefik1 and traefik2 shows the correct status code 401
, which I guess rules out that the odoh-server-go has a problem with relaying a 401.
Here is the captured data result.
traefik1
marco@dust /tmp> tshark -r t1.pcap -Y http
193 24.641678 172.17.0.3 → 172.17.0.4 HTTP 204 2 POST /proxy?body_hash=e93bf4bbdad086e927fcb3cc12f08c9b6dbb93c7bc7dc153047389755b4fe975&targethost=doh.marco.cx&targetpath=%2Fdns-query HTTP/1.1 (application/oblivious-dns-message)
199 24.678825 172.17.0.4 → 172.17.0.3 HTTP 239 2 HTTP/1.1 401 Unauthorized (text/plain)
207 28.044518 172.17.0.3 → 172.17.0.4 HTTP 203 2 POST /proxy?body_hash=9c9253fd605554000afc8214546d636013c9c982ad3aa941ed63aadc189198bd&targethost=doh.marco.cx&targetpath=%2Fdns-query HTTP/1.1 (application/oblivious-dns-message)
211 28.062317 172.17.0.4 → 172.17.0.3 HTTP 749 2 HTTP/1.1 200 OK (application/oblivious-dns-message)
219 28.065993 172.17.0.3 → 172.17.0.4 HTTP 234 2 POST /proxy?body_hash=847f9a632f690c10c4e757826aec0706037c31d3164339bf00be97989be530ca&targethost=doh.marco.cx&targetpath=%2Fdns-query HTTP/1.1 (application/oblivious-dns-message)
222 28.077862 172.17.0.4 → 172.17.0.3 HTTP 429 2 HTTP/1.1 200 OK (application/oblivious-dns-message)
traefik2
marco@dust /tmp> tshark -r t2.pcap -Y http
174 20.833539 172.17.0.3 → 172.17.0.4 HTTP 204 1 POST /proxy?body_hash=e93bf4bbdad086e927fcb3cc12f08c9b6dbb93c7bc7dc153047389755b4fe975&targethost=doh.marco.cx&targetpath=%2Fdns-query HTTP/1.1 (application/oblivious-dns-message)
178 20.834559 172.17.0.4 → 172.17.0.5 HTTP 204 0 POST /proxy?body_hash=e93bf4bbdad086e927fcb3cc12f08c9b6dbb93c7bc7dc153047389755b4fe975&targethost=doh.marco.cx&targetpath=%2Fdns-query HTTP/1.1 (application/oblivious-dns-message)
180 20.868819 172.17.0.5 → 172.17.0.4 HTTP 239 0 HTTP/1.1 401 Unauthorized (text/plain)
182 20.870549 172.17.0.4 → 172.17.0.3 HTTP 239 1 HTTP/1.1 401 Unauthorized (text/plain)
187 24.236501 172.17.0.3 → 172.17.0.4 HTTP 203 1 POST /proxy?body_hash=9c9253fd605554000afc8214546d636013c9c982ad3aa941ed63aadc189198bd&targethost=doh.marco.cx&targetpath=%2Fdns-query HTTP/1.1 (application/oblivious-dns-message)
190 24.237832 172.17.0.4 → 172.17.0.5 HTTP 203 0 POST /proxy?body_hash=9c9253fd605554000afc8214546d636013c9c982ad3aa941ed63aadc189198bd&targethost=doh.marco.cx&targetpath=%2Fdns-query HTTP/1.1 (application/oblivious-dns-message)
192 24.253582 172.17.0.5 → 172.17.0.4 HTTP 749 0 HTTP/1.1 200 OK (application/oblivious-dns-message)
194 24.254149 172.17.0.4 → 172.17.0.3 HTTP 749 1 HTTP/1.1 200 OK (application/oblivious-dns-message)
197 24.257860 172.17.0.3 → 172.17.0.4 HTTP 234 1 POST /proxy?body_hash=847f9a632f690c10c4e757826aec0706037c31d3164339bf00be97989be530ca&targethost=doh.marco.cx&targetpath=%2Fdns-query HTTP/1.1 (application/oblivious-dns-message)
200 24.258563 172.17.0.4 → 172.17.0.5 HTTP 234 0 POST /proxy?body_hash=847f9a632f690c10c4e757826aec0706037c31d3164339bf00be97989be530ca&targethost=doh.marco.cx&targetpath=%2Fdns-query HTTP/1.1 (application/oblivious-dns-message)
202 24.269319 172.17.0.5 → 172.17.0.4 HTTP 429 0 HTTP/1.1 200 OK (application/oblivious-dns-message)
204 24.269699 172.17.0.4 → 172.17.0.3 HTTP 429 1 HTTP/1.1 200 OK (application/oblivious-dns-message)
It is best visible on traefik2 capture. tcp.stream 1
is from t1 to t2 (frame 174 and 182) and tcp stream 0
(frame 178 and 180) is from t2 to odoh-server-go.
Frames 187+ is from when dnscrypt-proxy retries it after they key update, I guess. (Sent only one single dns request via dig)
I got the 200
response code with odohrelay-surf
and your relay.
Maybe there are specific scenarios or requests when it returns a 200. Good question why I see only the 401
.
Maybe I'm just confused and your relay was fine. The surf
one was returning 200
for sure.
There was another issue, addressed in that code change, that may have affected responses received from your relay.
Anyway, everything is pretty clear now, and I'm going to leave the workaround until it has been confirmed that all relays properly return 401
.
Thank you very much again. Ones your relay code is available I am happy to test and play around with it too.
Output of the following commands:
./dnscrypt-proxy -version
./dnscrypt-proxy -check
./dnscrypt-proxy -resolve example.com
What is affected by this bug?
The ODoH resolution is affected.
When does this occur?
I happens after dnscrypt-proxy is running for a while. I have no exact times yet, but since I think it is related to the key rotation of the ODoH target, it will happen after whatever time the ODoH target rotates the key.
Where does it happen?
It happens for every query that is not cached where a ODoH route will be used.
How do we replicate the issue?
Expected behavior (i.e. solution)
dnscrypt-proxy should be able refresh odoh keys to continue resolving queries via ODoH after servers rotated their keys.
Other Comments
I am testing ODoH and was building a POC environment. For this I am using the doh-server as an ODoH target and dnscrypt-proxy as a local client. The dnscrypt-proxy config has only an ODoH configuration. After some time name resolution stopped working. After debugging I think the problem is related to the key rotation of the ODoH target. In the doh-server code I noticed
ODOH_KEY_ROTATION_SECS = 86400
.This 24 hours matches the time after it stopped working today. While name resolution via dnscrypt-proxy was not working, I tested ODoH via a separate ODoH client which pulls for every query the config from the target server first and this name resolution worked just fine.
I did a packet capture on the odoh target and for a request via dnscrypt-proxy I can see the following HTTP response.
In the dnscrypt-proxy log I can see the following.
Locally where dnscrypt-proxy runs, I never see a connection attempt to the ODoH target. It always keeps only creating new connections to the ODoH relay. Thats why I think it is related to a no longer valid ODoH key.
My dnscrypt-proxy configuration looks like this.