Closed eroglyph closed 6 years ago
No :)
I am also running dnscrypt-proxy on a Raspi 3 (in a Docker container, though) and have not observed this issue.
What I observed, though, is a really hefty latency penalty of ~ 300 ms for the initial TLS handshake for DoH. Will open a separate issue for this.
@eroglyph Unfortunately, this is something I've never observed, and the strace dump doesn't convey any useful information.
Can you try to find a reliable way to reproduce this and then reopen this ticket? Or find if disabling a specific feature works around this?
I'll see what I can do. I coudn't find anything in the logs to indicate what was going on, but I restarted dnsmasq last night and haven been stable so far.
@jedisct1 I've got more log info and I'm starting to understand more of what's going on. My wife and I have work VPNs that we're required to use when we work from home. I can get dnscrypt-proxy to consistently fail when we connect to the VPN. I'm thinking that these connections are making more dns requests than dnscrypt-proxy can handle. But even after disconnecting those machines, the process remains locked-out. I've got the dnscrypt-proxy.log below (log level 0)
[2018-04-04 02:38:45] [NOTICE] Source [public-resolvers.md] loaded [2018-04-04 02:38:45] [NOTICE] dnscrypt-proxy 2.0.8 [2018-04-04 02:38:46] [NOTICE] Now listening to 127.0.0.1:40000 [UDP] [2018-04-04 02:38:46] [NOTICE] Now listening to 127.0.0.1:40000 [TCP] [2018-04-04 02:38:46] [NOTICE] Now listening to [::1]:40000 [UDP] [2018-04-04 02:38:46] [NOTICE] Now listening to [::1]:40000 [TCP] [2018-04-04 02:38:51] [ERROR] Get https://dns.cloudflare.com/dns-query?body=yv4BAAABAAAAAAABAAACAAEAACkQAAAAgAAAAA&ct=&dns=yv4BAAABAAAAAAABAAACAAEAACkQAAAAgAAAAA&random_padding=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) [2018-04-04 02:38:51] [NOTICE] dnscrypt-proxy is waiting for at least one server to be reachable [2018-04-04 02:39:02] [NOTICE] [cloudflare] OK (DoH) - rtt: 23ms [2018-04-04 02:39:02] [NOTICE] Server with the lowest initial latency: cloudflare (rtt: 23ms) [2018-04-04 06:39:03] [INFO] [cloudflare] OK (DoH) - rtt: 26ms [2018-04-04 06:39:03] [INFO] [cloudflare] OK (DoH) - rtt: 26ms [2018-04-04 06:39:03] [NOTICE] Server with the lowest initial latency: cloudflare (rtt: 26ms) [2018-04-04 10:39:03] [INFO] [cloudflare] OK (DoH) - rtt: 25ms [2018-04-04 10:39:04] [INFO] [cloudflare] OK (DoH) - rtt: 19ms [2018-04-04 10:39:04] [NOTICE] Server with the lowest initial latency: cloudflare (rtt: 19ms) [2018-04-04 13:40:40] [INFO] Server [cloudflare] returned error code [2] -- It might be experiencing upstream connectivity issues [2018-04-04 13:40:40] [INFO] Server [cloudflare] returned error code [2] -- It might be experiencing upstream connectivity issues [2018-04-04 13:40:40] [INFO] Server [cloudflare] returned error code [2] -- It might be experiencing upstream connectivity issues [2018-04-04 13:40:40] [INFO] Server [cloudflare] returned error code [2] -- It might be experiencing upstream connectivity issues [2018-04-04 13:44:16] [WARNING] Too many connections (max=250) [2018-04-04 13:44:16] [WARNING] Too many connections (max=250) [2018-04-04 13:44:16] [WARNING] Too many connections (max=250) [2018-04-04 13:44:16] [WARNING] Too many connections (max=250) [2018-04-04 13:47:27] [NOTICE] Stopped.
@jedisct1 It doesn't look like I can reopen this issue. Should I create a new thread instead?
[2018-04-04 13:40:40] [INFO] Server [cloudflare] returned error code [2] -- It might be experiencing upstream connectivity issues
This may be the root cause.
We got proper responses from the server, but 2
stands for SERVFAIL
, i.e. the server responded but with an error.
I can reliably reproduce this just by sending more than 3 queries in a row on the same connection.
The number of maximum connections configured in the proxy is unrelated.
I suspect Cloudflare to be heavily rate limiting, or to enforce a cap of 3 in-flight queries on a single connection.
Just reported this to a friend working on Cloudflare DNS.
dnscrypt-proxy
closes the connection and temporarily switches to other servers (if there are more configured) when it starts receiving this kind of error.
It should probably wait a little bit before overreacting to that kind of response code, especially since this doesn't necessarily implies that the server is overloaded.
Gonna try to improve this for the next version.
Meanwhile, in proxy.go
, you can replace
if rcode := Rcode(response); rcode == 2 || rcode == 5 { // SERVFAIL // REFUSED
with
if rcode := Rcode(response); rcode == 5 { // REFUSED
Modifying proxy.go
with your fix did help a little. It no longer spikes on my wife's VPN, but I can still get it to spike using mine. It will go back down to 0% usage a few minutes after I disconnect. I'm still unable to resolve queries during the CPU spike.
I wonder if it is related to #315 in that the HTTPS overhead is too high for the Pi. I don't have any CPU or resolving issues if I forgo using dnscrypt-proxy
@ginkel I just tested with the ev-us2
server which is connecting with crypto v2 and saw no CPU/latency spike so I do suspect something going on with either Cloudflare or DoH.
ev-us2
uses the DNSCrypt protocol, which is always faster and requires less CPU than DoH.
With Cloudflare, it will be mitigated once Go merges support for TLS 1.3, that supports 0RTT reconnections.
There is something new for you in the configuration file:
## DoH: Use a specific cipher suite
## 49199 = TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
## 49195 = TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
## 52392 = TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
## 52393 = TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
### On non-Intel systems such as Raspberry Pi and other ARM systems, the following is recommended
### Everybody else should not set this parameter, and just use the default suite.
# tls_cipher_suite = [52392, 49199]
Same here with dnscrypt-proxy 2.0.9b2
[INFO] Server [cloudflare] returned error code [2] -- It might be experiencing upstream connectivity issues [INFO] Server [cloudflare-ipv6] returned error code [2] -- It might be experiencing upstream connectivity issues
or
[INFO] Server [d0wn-fr-ns1] returned error code [2] -- It might be experiencing upstream connectivity issues
Running version 2.0.8
After about an hour or so of running perfectly on my raspberry pi with pihole, the dnscrypt-proxy process will spike to 100% cpu and refuse to resolve addresses until I restart the service. I ran an strace and see these messages scrolling rapidly:
Any idea why this might be happening?