cloudflare / cloudflared

Cloudflare Tunnel client (formerly Argo Tunnel)
https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/install-and-setup/tunnel-guide
Apache License 2.0

failure because TXT dns records are sometimes filtered #423

Open Zibri opened 3 years ago

Zibri commented 3 years ago

Yesterday it was working perfectly on Ubuntu 18.04; today it fails with this error:

2021-07-28T00:47:28Z INF Requesting new Quick Tunnel...
2021-07-28T00:47:32Z INF +------------------------------------------------------+
2021-07-28T00:47:32Z INF |  Your Quick Tunnel has been created! Visit it at:    |
2021-07-28T00:47:32Z INF |  cool-creativity-petersburg-makes.trycloudflare.com  |
2021-07-28T00:47:32Z INF +------------------------------------------------------+
2021-07-28T00:47:32Z INF Version 2021.7.3
2021-07-28T00:47:32Z INF GOOS: linux, GOVersion: devel +11087322f8 Fri Nov 13 03:04:52 2020 +0100, GoArch: amd64
2021-07-28T00:47:32Z INF Generated Connector ID: a02696fb-d996-4047-b4f5-e860be44bfce
2021-07-28T00:47:32Z INF cloudflared will not automatically update when run from the shell. To enable auto-updates, run cloudflared as a service: https://developers.cloudflare.com/argo-tunnel/reference/service/
2021-07-28T00:47:52Z ERR Couldn't start tunnel error="lookup protocol.argotunnel.com on 127.0.0.53:53: read udp 127.0.0.1:39905->127.0.0.53:53: i/o timeout"
lookup protocol.argotunnel.com on 127.0.0.53:53: read udp 127.0.0.1:39905->127.0.0.53:53: i/o timeout

The same happens if I change DNS servers. Note: the machine is a VM inside my main PC.

on my windows host pc I can do: cloudflared tunnel --url http://192.168.1.104:XXXX

Yesterday the same command worked on the guest machine (192.168.1.104); today it gives that error.

any clue?

benbalter commented 3 years ago

I receive the same error with 2021.7.3 (with both that protocol.argotunnel.com address and a cloudflare-gateway.com teams address). Downgrading to 2021.7.0 resolves the issue.

Could this be related to TUN-4699: Make quick tunnels the default in cloudflared from 2021.7.1?

I'm running proxy-dns on a Raspberry Pi, which has been running without issue for over a year, and then suddenly broke with ~2021.7.1. Happy to help diagnose.

nmldiegues commented 3 years ago

@benbalter can you show the cloudflared command and config that you are running with that broke with 2021.7.1 onwards?

nmldiegues commented 3 years ago

@Zibri and @benbalter can you run the following command in the environment where cloudflared is failing?

dig -t txt protocol.argotunnel.com

nmldiegues commented 3 years ago

This is the same as https://github.com/cloudflare/cloudflared/issues/388

Zibri commented 3 years ago

dig -t txt protocol.argotunnel.com

it does not return anything and times out. in egypt dns queries are very restricted. perhaps you should do the query using https dns
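
The suggestion to query over HTTPS can be checked by hand. A minimal sketch, assuming `curl` is installed and that Cloudflare's public resolver is reachable by IP at `https://1.1.1.1/dns-query` through its JSON interface (`accept: application/dns-json`). The function names are hypothetical, and this is not how cloudflared itself performs the lookup:

```shell
#!/bin/sh
# Sketch: fetch a TXT record over DNS over HTTPS instead of UDP port 53.
# Assumes curl and Cloudflare's JSON DoH interface; the function names
# below are illustrative and not part of cloudflared.

# Build the DoH JSON query URL for a TXT lookup of the given name.
doh_txt_url() {
    printf 'https://1.1.1.1/dns-query?name=%s&type=TXT\n' "$1"
}

# Perform the lookup and print the raw JSON answer (requires network access).
doh_txt_lookup() {
    curl --silent --max-time 5 \
        --header 'accept: application/dns-json' \
        "$(doh_txt_url "$1")"
}

# Usage:
#   doh_txt_lookup protocol.argotunnel.com
```

Because the resolver is addressed by IP, this request does not depend on port 53 being usable at all, which is the property the filtered network needs.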

Zibri commented 3 years ago

SRV queries are not blocked, and neither are a few other types. So you have two choices: either use DNS over HTTPS, or fall back to other query types such as SRV, SIG, or CAA.
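
To see which record types a given network filters, something like the following could be run. It is a minimal sketch, assuming BIND's `dig` is installed; the function names are hypothetical. It relies on dig's documented exit statuses, where 0 means the query completed and 9 means no reply from the server:

```shell
#!/bin/sh
# Sketch: check which DNS record types the local resolver answers.
# Assumes dig (BIND 9); function names are illustrative.

# Map a dig exit status to a short label. dig documents 0 as success and
# 9 as "no reply from server"; anything else is reported as an error.
# Note that "answered" only means a reply arrived, not that a record exists.
classify_dig_status() {
    case "$1" in
        0) echo "answered" ;;
        9) echo "timeout" ;;
        *) echo "error" ;;
    esac
}

# Query one name for several record types and report each outcome.
probe_record_types() {
    name=$1
    shift
    for rtype in "$@"; do
        dig +time=2 +tries=1 -t "$rtype" "$name" >/dev/null 2>&1
        status=$?
        printf '%s\t%s\n' "$rtype" "$(classify_dig_status "$status")"
    done
}

# Usage:
#   probe_record_types protocol.argotunnel.com TXT SRV CAA A
```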

Zibri commented 3 years ago

Downgrading to 2021.7.0 resolves the issue.

Thanks for pointing this out. Also, to avoid auto-updating, an easy trick is this:

# sed -i "s/2021.7.0/2025.7.0/" $(which cloudflared)

nmldiegues commented 3 years ago

About the TXT lookup problem: we haven't addressed it yet, but will soon.

As for the "quick tunnel" (i.e., a no-login tunnel) triggering the TXT lookup, which seems to fail in rare situations such as those described here: we have reverted that logic in 2021.7.4, so it will no longer cause that lookup.

benbalter commented 3 years ago

can you show the cloudflared command and config that you are running with that broke with 2021.7.1 onwards?

I have a service defined to run /usr/local/bin/cloudflared --config /etc/cloudflared/config.yml with the following config:

proxy-dns: true
proxy-dns-port: 5053
proxy-dns-upstream:
  - https://XXX.cloudflare-gateway.com/dns-query
proxy-dns-bootstrap:
  - https://1.1.1.2/dns-query

can you run the following command in the environment where cloudflared is failing?

With cloudflared running (2021.7.0), I get the "http2=100" response, presumably as expected.

Before cloudflared bootstraps, the dig query fails, because the system resolver (set to 127.0.0.1#53) uses cloudflared's proxy-dns as its upstream resolver (127.0.0.1#5053).

perhaps you should do the query using https dns

It seems 2021.7.1's quick tunnels default introduced a dependency on being able to query that TXT record during the bootstrap process, but does so in a way that uses the system resolver, rather than the designated bootstrap resolver / DNS over HTTPS.

Similar to the discussion in https://github.com/cloudflare/cloudflared/issues/388 and above, on my network, non-DoH DNS queries are blocked entirely, meaning as before 2021.7.1, in order to maintain backwards compatibility, the bootstrap process should allow use of DoH for its initial resolution, not the system resolver.

All that said, thank you for your quick response and for maintaining such a great project! 🎉

sudarshan-reddy commented 3 years ago

With cloudflared running (2021.7.0), I get the "http2=100" response, presumably as expected.

Hi @benbalter! Can you share the stdout logs when you run the same command with v2021.7.0 please?

non-DoH DNS queries are blocked entirely, meaning as before 2021.7.1, in order to maintain backwards compatibility, the bootstrap process should allow use of DoH for its initial resolution, not the system resolver.

This should still happen if you were to use the command cloudflared proxy-dns. Can you try it out with the latest version and let me know if that works for you?

benbalter commented 3 years ago

Can you share the stdout logs when you run the same command with v2021.7.0 please?

Of course. Thanks for the quick reply. Here's the output on 2021.7.0:

$ dig -t txt protocol.argotunnel.com

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> -t txt protocol.argotunnel.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63169
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;protocol.argotunnel.com.   IN  TXT

;; ANSWER SECTION:
protocol.argotunnel.com. 300    IN  TXT "http2=100"

;; Query time: 26 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Jul 28 19:24:24 UTC 2021
;; MSG SIZE  rcvd: 97

And if I were to query cloudflared directly (bypassing the downstream pi-hole DNS server), here's the result:

$ dig -t txt protocol.argotunnel.com @127.0.0.1 -p 5053
; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> -t txt protocol.argotunnel.com @127.0.0.1 -p 5053
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44008
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;protocol.argotunnel.com.   IN  TXT

;; ANSWER SECTION:
protocol.argotunnel.com. 277    IN  TXT "http2=100"

;; Query time: 30 msec
;; SERVER: 127.0.0.1#5053(127.0.0.1)
;; WHEN: Wed Jul 28 19:26:30 UTC 2021
;; MSG SIZE  rcvd: 97
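
The `"http2=100"` answer reads like a `name=value` pair. A minimal sketch of splitting such a record, assuming entries take the form `name=value` and that multiple entries would be separated by `;`. Only the single pair `http2=100` appears in this thread, so the multi-entry format is an assumption, as is the function name:

```shell
#!/bin/sh
# Sketch: split a TXT payload such as "http2=100" into "name value" lines.
# The ';' separator for multiple entries is an assumption.
parse_protocol_record() {
    # Strip the double quotes that dig prints around TXT data.
    record=$(printf '%s' "$1" | tr -d '"')
    # Put one entry per line, then turn "name=value" into "name value".
    printf '%s\n' "$record" | tr ';' '\n' | while IFS='=' read -r name value; do
        if [ -n "$name" ]; then
            printf '%s %s\n' "$name" "$value"
        fi
    done
}

# Usage:
#   parse_protocol_record "$(dig +short -t txt protocol.argotunnel.com)"
```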

benbalter commented 3 years ago

Can you try it out with the latest version and let me know if that works for you?

The 2021.7.4 bootstraps as expected, both via the cloudflared command + config file and with cloudflared proxy-dns directly

sudarshan-reddy commented 3 years ago

Oops. I misspoke. Can you also do me the favour of trying cloudflared proxy-dns out with 2021.7.3?

Of course. Thanks for the quick reply. Here's the output on 2021.7.0:

Thanks for this. Can you also share the output of your cloudflared command please?

nmldiegues commented 3 years ago

So I think we've understood this a bit better now.

This second case therefore starts a tunnel, besides starting the dns proxy. It's very likely that you are not even using that tunnel at all. So you can just run the first case above and therefore skip the tunnel logic.

The behaviour changed because we switched those "account-less tunnels" (where no --hostname is provided, and no tunnel is pre-created with a login) from our legacy tunnels infrastructure to the new one used for named tunnels. This new one looks up a TXT record, and that's what you noticed. We will make cloudflared more resilient to the TXT lookup.

nmldiegues commented 3 years ago

We've uncovered that this different behaviour (of running a tunnel next to the proxy-dns) was a regression/accidental recent change due to some bad argument handling. FYI, we will revert that.

benbalter commented 3 years ago

So you can just run the first case above and therefore skip the tunnel logic.

Came here to post the stdout requested above, and arrived at a similar conclusion.

That said, I may have found another bug (happy to move this to a new issue, if unrelated), in that either I don't believe cloudflared proxy-dns is using the bootstrap resolver (either specified or default), or I don't understand what the purpose of that setting is (probably more likely).

Output of cloudflared on 2021.7.3:

```
pi@raspberrypi:~ $ cloudflared
2021-07-28T22:36:45Z INF Requesting new Quick Tunnel...
failed to request quick tunnel: Post "https://api.trycloudflare.com/tunnel": dial tcp: lookup api.trycloudflare.com on 127.0.0.1:53: read udp 127.0.0.1:46291->127.0.0.1:53: i/o timeout
```

Output of cloudflared proxy-dns with a Cloudflare gateway upstream (duplicate log entries removed):

```
^Cpi@raspberrypi:~ $ cloudflared proxy-dns --port 5053 --upstream https://XXX.cloudflare-gateway.com/dns-query --bootstrap "https://1.1.1.1/dns-query"
2021-07-28T22:38:58Z INF Adding DNS upstream url=https://XXX.cloudflare-gateway.com/dns-query
2021-07-28T22:38:58Z INF Starting DNS over HTTPS proxy server address=dns://localhost:5053
2021-07-28T22:38:58Z INF Starting metrics server on 127.0.0.1:41525/metrics
2021-07-28T22:39:05Z ERR failed to connect to an HTTPS backend "https://XXX.cloudflare-gateway.com/dns-query" error="failed to perform an HTTPS request: Post \"https://XXX.cloudflare-gateway.com/dns-query\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
2021-07-28T22:39:10Z ERR failed to connect to an HTTPS backend "https://XXX.cloudflare-gateway.com/dns-query" error="failed to perform an HTTPS request: Post \"https://XXX.cloudflare-gateway.com/dns-query\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
```

I get similar output for cloudflared proxy-dns on 2021.7.4. As you can see above, in both versions, cloudflared is attempting to resolve the XXX.cloudflare-gateway.com subdomain via the 127.0.0.1#53 resolver, even though the bootstrap resolver is specified in the config (and the default resolver should be Cloudflare's IP). I can also see the XXX.cloudflare-gateway.com requests in my #53 resolver's logs (which uses cloudflared as upstream, resulting in a timeout). cloudflared proxy-dns with no arguments works, as it uses 1.1.1.1 as its upstream.

Is my understanding incorrect in that cloudflared proxy-dns should use the bootstrap resolver to resolve the upstream resolver's domain at startup?
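
For comparison, the behaviour being asked about (resolve the upstream's hostname through a resolver addressed by IP, then connect) can be reproduced outside cloudflared. A minimal sketch, assuming curl 7.62.0 or newer, whose `--doh-url` option makes curl resolve the target hostname over DoH rather than through the system resolver. The helper names are hypothetical, and `XXX` remains a placeholder as in the config:

```shell
#!/bin/sh
# Sketch: reach a DoH upstream without touching the system resolver.
# Assumes curl >= 7.62.0 (--doh-url); function names are illustrative.

# Extract the hostname from an https:// URL using parameter expansion.
upstream_host() {
    rest=${1#https://}             # drop the scheme
    printf '%s\n' "${rest%%/*}"    # drop the path
}

# Connect to the upstream, resolving its hostname via the bootstrap DoH
# server. Exit status 0 means resolution and the connection succeeded,
# regardless of the HTTP status the upstream returns.
check_upstream_via_bootstrap() {
    upstream=$1
    bootstrap=$2
    echo "resolving $(upstream_host "$upstream") via $bootstrap" >&2
    curl --silent --show-error --max-time 5 --output /dev/null \
        --doh-url "$bootstrap" "$upstream"
}

# Usage:
#   check_upstream_via_bootstrap \
#       https://XXX.cloudflare-gateway.com/dns-query https://1.1.1.1/dns-query
```

If this succeeds while cloudflared proxy-dns times out, that would support the reading that the upstream hostname is being resolved through 127.0.0.1#53 rather than the bootstrap address.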

If instead I use the following config (moving 1.1.1.2 to a second upstream), when the first DNS lookup fails, it falls back to 1.1.1.2 (I believe only for that request, since the first resolver could then be used), and resolves/proxies requests as expected:

proxy-dns: true
proxy-dns-port: 5053
proxy-dns-upstream:
  - https://XXX.cloudflare-gateway.com/dns-query
  - https://1.1.1.2/dns-query
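
The fallback described here, trying upstreams in order and using the first one that responds, can be sketched as follows. This only illustrates the idea and is not cloudflared's implementation; the function names are hypothetical, and the probe is passed in so it can be swapped out:

```shell
#!/bin/sh
# Sketch: ordered fallback across upstreams. Not cloudflared's implementation.

# Print the first upstream for which the probe command succeeds.
# $1 is the name of a probe function or command; the rest are upstream URLs.
first_working_upstream() {
    probe=$1
    shift
    for upstream in "$@"; do
        if "$probe" "$upstream"; then
            printf '%s\n' "$upstream"
            return 0
        fi
    done
    return 1
}

# Example probe: succeeds if an HTTPS connection completes within 5 seconds.
probe_with_curl() {
    curl --silent --max-time 5 --output /dev/null "$1"
}

# Usage:
#   first_working_upstream probe_with_curl \
#       https://XXX.cloudflare-gateway.com/dns-query https://1.1.1.2/dns-query
```

An upstream addressed by IP, such as the second entry, needs no name resolution, which is why it can serve as the way out when the first entry's hostname cannot yet be resolved.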

Again, very grateful for your time and thoughtfulness here, and glad to hear that I found at least one bug, and it wasn't entirely my fault. Eager to hear your thoughts on the bootstrap issue, and again, if unrelated, happy to move it to a new issue. Thanks again!

benbalter commented 3 years ago

if you run cloudflared proxy-dns --config ...

One minor note, in case it impacts the above, cloudflared takes a config argument, but it does not appear proxy-dns does.

Placing the --config argument after proxy-dns results in "Incorrect Usage: flag provided but not defined: -config", and placing it before results in the command succeeding, but with the config ignored.

To be clear, I'm not seeking to complain (easy enough to pass as command line vars), but wanted to share in case the change in behavior was helpful.

sudarshan-reddy commented 3 years ago

Sorry for getting back a bit late here guys. These issues should be fixed in the newest release. Give it a go and let us know what you think.