folbricht / routedns

DNS stub resolver, proxy and router with support for DoT, DoH, DoQ, and DTLS
BSD 3-Clause "New" or "Revised" License

Adding ecs is not suitable for public network vps servers #383

Closed liang-hiwin closed 3 months ago

liang-hiwin commented 4 months ago

According to the documentation, when ecs-op is add and ecs-address is not set, the client's IP address is used for the ECS option.

I need this changed so that, when ecs-address is not set, the ECS subnet sent by the client in its query is used instead.

add - Add an ECS option to a query. If there is one already it is replaced. If no ecs-address is provided, the address of the client is used (with ecs-prefix4 or ecs-prefix6 applied).
folbricht commented 4 months ago

Not sure I understand the ask. Can you provide more details or an example? Is this about using a specific interface to send the DNS requests, or about adding something to the ECS record?

liang-hiwin commented 4 months ago

For example, test with dig and an ECS option attached:

dig a +subnet=211.139.5.0/24 @127.0.0.1 -p 5500 www.taobao.com

Normally this command should return 117.187.7.190 and 117.187.7.189, but in my tests it returns addresses near my server.

liang-hiwin commented 4 months ago

The CLIENT-SUBNET value should be 211.139.5.0/24, not 127.0.0.0/24/24.

liang-hiwin commented 4 months ago

The reproduction is simple. Use https://dns.google/dns-query (8.8.8.8) as the upstream of routedns, then run dig a +subnet=211.139.5.0/24 @127.0.0.1 -p 5500 www.taobao.com, where 5500 is the listening port of routedns. The result differs from what you get by requesting https://dns.google/query?name=www.taobao.com&rr_type=A&ecs=211.139.5.0%2F24 directly.

folbricht commented 4 months ago

Can you show me your config as well? Or at least the ECS part of it.

When I test with a plain config I'm getting the expected results.

$ dig a +subnet=211.139.5.0/24 @127.0.0.1 -p 1153 www.taobao.com

; <<>> DiG 9.18.24 <<>> a +subnet @127.0.0.1 -p 1153 www.taobao.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20275
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; CLIENT-SUBNET: 211.139.5.0/24/24
;; QUESTION SECTION:
;www.taobao.com.            IN  A

;; ANSWER SECTION:
www.taobao.com.     600 IN  CNAME   www.taobao.com.danuoyi.tbcache.com.
www.taobao.com.danuoyi.tbcache.com. 60 IN A 117.187.7.190
www.taobao.com.danuoyi.tbcache.com. 60 IN A 117.187.7.189

;; Query time: 32 msec
;; SERVER: 127.0.0.1#1153(127.0.0.1) (UDP)
;; WHEN: Tue May 07 10:04:13 CEST 2024
;; MSG SIZE  rcvd: 216
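
For readers unfamiliar with the CLIENT-SUBNET line in the dig output above: dig prints the option as address/source prefix length/scope prefix length. The following is a minimal Python sketch of the option payload layout described in RFC 7871 (family, source prefix length, scope prefix length, truncated address). The function names encode_ecs and decode_ecs are made up for illustration and are not part of routedns or dig.

```python
import ipaddress
import struct


def encode_ecs(subnet: str, scope: int = 0) -> bytes:
    """Build the payload of an EDNS Client Subnet option (RFC 7871 layout)."""
    net = ipaddress.ip_network(subnet, strict=False)
    family = 1 if net.version == 4 else 2      # 1 = IPv4, 2 = IPv6
    n_bytes = (net.prefixlen + 7) // 8         # address is truncated to the prefix
    addr = net.network_address.packed[:n_bytes]
    return struct.pack("!HBB", family, net.prefixlen, scope) + addr


def decode_ecs(payload: bytes) -> str:
    """Render a payload the way dig does: address/source/scope."""
    family, source, scope = struct.unpack("!HBB", payload[:4])
    size = 4 if family == 1 else 16
    raw = payload[4:].ljust(size, b"\x00")     # pad the truncated address with zeros
    return f"{ipaddress.ip_address(raw)}/{source}/{scope}"
```

In a query the scope is normally 0; the server fills it in on the response, which is why the output above shows /24/24.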
liang-hiwin commented 4 months ago

Here is my config:

#### BOOTSTRAP ####
[bootstrap-resolver]
protocol = "doh"
address = "https://223.5.5.5:443/dns-query"
#bootstrap-address = "223.5.5.5"

#### LISTENERS ####
[listeners.dns-udp]
address = ":5500"
protocol = "udp"
resolver = "ecs"

[listeners.dns-tcp]
address = ":5500"
protocol = "tcp"
resolver = "ecs"

# ECS
[groups.ecs]
type = "ecs-modifier"
resolvers = ["rrl"]
ecs-op = "add"
ecs-prefix4 = 24
ecs-prefix6 = 64

[groups.rrl]
type = "rate-limiter"
resolvers = ["cache"]
limit-resolver = "static-refused"
requests = 360
window = 60
prefix4 = 24
prefix6 = 64

[groups.static-refused]
type = "static-responder"
#rcode = 5                                                           # REFUSED
edns0-ede = { code = 15, text = "The number of requests has exceeded 360 per minute and will be released automatically after 1 minute." } # Valid codes defined in https://datatracker.ietf.org/doc/html/rfc8914

# Cache
[groups.cache]
type = "cache"
resolvers = ["ttl"]
cache-size = 4096
cache-negative-ttl = 60
cache-prefetch-trigger = 10
cache-prefetch-eligible = 20
gc-period = 60
cache-answer-shuffle = "round-robin"
cache-harden-below-nxdomain = true
cache-flush-query = "flush.cache."
#backend = { type = "memory", size = 100000, filename = "/opt/routedns/cache.json", save-interval = 60 }
backend = { type = "redis", redis-address = "10.10.10.12:6379", redis-db = 0, redis-key-prefix = "routedns-" }

# TTL
[groups.ttl]
type = "ttl-modifier"
resolvers = ["concurrent-dns"]
ttl-min = 10                   # 10 s
ttl-max = 180                  # 3 Minutes

# Block IP-DATA
[groups.blocklist-ip]
type = "response-blocklist-ip"
resolvers = ["concurrent-dns"]
filter = true
blocklist-refresh = 600
blocklist-source = [
    { format = "cidr", source = "/opt/mosdns/rules/china_ip_list.txt" },
    { format = "cidr", source = "/opt/routedns/white-ip.txt" },
]

# Internet
[resolvers.internet-udp]
protocol = "udp"
address = "127.0.0.1:35354"

[resolvers.internet-tcp]
protocol = "tcp"
address = "127.0.0.1:35354"

[groups.concurrent-dns]
resolvers = ["internet-udp", "internet-tcp"]
type = "round-robin"

# [groups.dns-logged]
# type = "syslog"
# resolvers = ["concurrent-dns"]
# network = "tcp"
# address = "127.0.0.1:514"
# priority = "debug"
# tag = "routedns"
# log-request = true
# log-response = true
liang-hiwin commented 4 months ago

5500 is the routedns port. Port 35354 belongs to another DNS program, which works normally when tested directly.

folbricht commented 4 months ago

I think I understand now. So you want to add only if there isn't an ECS option in the query from the client already. If so, can you try out the issue-383 branch and modify your config to this?

[groups.ecs]
type = "ecs-modifier"
resolvers = ["rrl"]
ecs-op = "add-if-missing"
ecs-prefix4 = 24
ecs-prefix6 = 64
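
To make the difference between the two operations concrete, here is a small Python sketch of the semantics as described in this thread: add replaces any ECS option with the client's masked address, while add-if-missing keeps an option the client already sent. This is only an illustration of the described behavior, not routedns code; the names client_subnet and apply_ecs_op are hypothetical, and the default prefixes mirror the ecs-prefix4 and ecs-prefix6 values in the config above.

```python
import ipaddress
from typing import Optional


def client_subnet(client_ip: str, prefix4: int = 24, prefix6: int = 64) -> str:
    """Mask the client address with the configured prefix length."""
    ip = ipaddress.ip_address(client_ip)
    prefix = prefix4 if ip.version == 4 else prefix6
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def apply_ecs_op(op: str, client_ip: str, query_ecs: Optional[str] = None) -> str:
    """Return the ECS subnet that would be sent upstream (illustrative only)."""
    if op == "add":
        # Any ECS option already in the query is replaced.
        return client_subnet(client_ip)
    if op == "add-if-missing":
        # An ECS option sent by the client is kept as-is.
        return query_ecs if query_ecs is not None else client_subnet(client_ip)
    raise ValueError(f"unsupported ecs-op: {op}")
```

With a dig query from 127.0.0.1 that carries +subnet=211.139.5.0/24, apply_ecs_op("add", ...) yields 127.0.0.0/24, whereas apply_ecs_op("add-if-missing", ...) keeps 211.139.5.0/24.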
liang-hiwin commented 4 months ago

Thank you, wait a moment

liang-hiwin commented 4 months ago

There is still a problem. Running dig a +subnet=211.139.5.0/24 @127.0.0.1 -p 5500 www.taobao.com returns 117.187.7.190 and 117.187.7.189. But when I send a different ECS subnet, for example dig a +subnet=218.203.160.0/24 @127.0.0.1 -p 5500 www.taobao.com, I still get 117.187.7.190 and 117.187.7.189. Normally it should return 111.51.140.179 and 111.51.140.180.

liang-hiwin commented 4 months ago

The CLIENT-SUBNET value is randomly lost in requests.

folbricht commented 4 months ago

Not sure what happened in your case there. It looks like your dig didn't actually send the ECS option? If I run it here I get the IPs you mentioned.

$ dig a +subnet=218.203.160.0/24 @127.0.0.1 -p 5500 www.taobao.com

; <<>> DiG 9.18.24 <<>> a +subnet @127.0.0.1 -p 5500 www.taobao.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64245
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; CLIENT-SUBNET: 218.203.160.0/24/24
;; QUESTION SECTION:
;www.taobao.com.            IN  A

;; ANSWER SECTION:
www.taobao.com.     147 IN  CNAME   www.taobao.com.danuoyi.tbcache.com.
www.taobao.com.danuoyi.tbcache.com. 27 IN A 111.51.140.180
www.taobao.com.danuoyi.tbcache.com. 27 IN A 111.51.140.179

;; Query time: 1 msec
;; SERVER: 127.0.0.1#5500(127.0.0.1) (UDP)
;; WHEN: Fri May 10 10:19:27 CEST 2024
;; MSG SIZE  rcvd: 216
liang-hiwin commented 4 months ago

My upstream works normally when tested directly, but when testing through routedns the CLIENT-SUBNET value is occasionally lost.

folbricht commented 4 months ago

Does it only happen sometimes? Can you test without the cache? I wonder if that plays into this issue

liang-hiwin commented 4 months ago

I had the same thought. With the cache disabled, the ECS tests work normally.

liang-hiwin commented 4 months ago

I have now switched to another server that has no proxy environment, and with the cache enabled the tests seem normal. Could the proxy be affecting ECS?

liang-hiwin commented 4 months ago

Currently, empty responses occur both when testing plain DNS and when running an encrypted DNS server. The same happens with a configuration that has no filtering at all.

folbricht commented 4 months ago

Did you have a chance to test this a bit more and maybe come up with a way to reproduce the issue? Everything seems to function normally here.

liang-hiwin commented 3 months ago

It is difficult for me to reproduce. Tests with dig can also return empty responses.

liang-hiwin commented 3 months ago

Can you merge into the master branch?

folbricht commented 3 months ago

Merged in https://github.com/folbricht/routedns/pull/389

liang-hiwin commented 1 month ago

Hello, ECS loses the client's real IP. The debug log shows ecs=127.0.0.0, which is abnormal. @folbricht

INFO[0000] reading cache file                            filename=/opt/routedns/cache.json
INFO[0000] starting listener                             addr=":6000" id=dns-udp protocol=udp
INFO[0000] starting listener                             addr=":6000" id=dns-tcp protocol=tcp
INFO[0000] starting listener                             addr=":854" id=local-dot protocol=dot
DEBU[0004] received query                                addr=":854" client=111.0.0.1 id=local-dot protocol=dot qname=pull-f5-io.flive.douyincdn.com.
DEBU[0004] querying upstream resolver                    client=127.0.0.1 id=server-udp protocol=udp qname=pull-f5-io.flive.douyincdn.com. qtype=A resolver="127.0.0.1:6000"
DEBU[0004] received query                                addr=":6000" client=127.0.0.1 id=dns-udp protocol=udp qname=pull-f5-io.flive.douyincdn.com.
DEBU[0004] adding ecs option                             client=127.0.0.1 ecs=127.0.0.0 id=ecs mask=24 qname=pull-f5-io.flive.douyincdn.com. qtype=A
liang-hiwin commented 1 month ago

This bug affects encrypted DNS servers. Plain DNS tests are normal.

folbricht commented 1 month ago

This looks as expected though. As per the docs:

the address of the client is used (with ecs-prefix4 or ecs-prefix6 applied)

So if the address is 127.0.0.1 and the mask is /24 then the result will be 127.0.0.0. Were you expecting something else?
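
The arithmetic behind that statement can be checked with a few lines of Python. This is only a worked example of applying a prefix length to an address with a bit mask; mask_address is a hypothetical helper, not a routedns function.

```python
import ipaddress


def mask_address(address: str, prefix_len: int) -> str:
    """Zero out every bit of the address beyond the given prefix length."""
    ip = ipaddress.ip_address(address)
    total_bits = ip.max_prefixlen              # 32 for IPv4, 128 for IPv6
    mask = ((1 << prefix_len) - 1) << (total_bits - prefix_len)
    return str(type(ip)(int(ip) & mask))


print(mask_address("127.0.0.1", 24))           # 127.0.0.0
```

The same function covers the ecs-prefix6 case, for example mask_address("2001:db8::1234", 64).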

liang-hiwin commented 1 month ago

You misunderstood. The client is not 127.0.0.1. The real client should be the public network IP.

liang-hiwin commented 1 month ago

For example, if I query the DNS service on my cloud server from my computer, the ECS address should be my computer's public IP, not 127.0.0.1.

folbricht commented 1 month ago

What listener config are you using? In your logs I see id=dns-udp which suggests it's plain UDP

liang-hiwin commented 1 month ago

INFO[0000] reading cache file                            filename=/opt/routedns/cache.json
INFO[0000] starting listener                             addr=":6000" id=dns-udp protocol=udp
INFO[0000] starting listener                             addr=":6000" id=dns-tcp protocol=tcp
INFO[0000] starting listener                             addr=":854" id=local-dot protocol=dot
DEBU[0004] received query                                addr=":854" client=111.0.0.1 id=local-dot protocol=dot qname=pull-f5-io.flive.douyincdn.com.

Here you can see that I use the encrypted TLS port 854 and that the requested domain name is pull-f5-io.flive.douyincdn.com, yet the attached ECS is 127.0.0.1. The bug is that this ECS should be my real IP, that is, the public IP of the client I test from.

INFO[0000] reading cache file                            filename=/opt/routedns/cache.json
INFO[0000] starting listener                             addr=":6000" id=dns-udp protocol=udp
INFO[0000] starting listener                             addr=":6000" id=dns-tcp protocol=tcp
INFO[0000] starting listener                             addr=":854" id=local-dot protocol=dot
DEBU[0004] received query                                addr=":854" client=127.0.0.1 id=local-dot protocol=dot qname=pull-f5-io.flive.douyincdn.com.
liang-hiwin commented 1 month ago

Sorry, the log I gave before was not detailed enough. Now I will test each listener.

protocol=tcp/udp: If I test from my computer against the DNS cloud server, the client's real IP is obtained normally. However, if I test from the terminal of the DNS cloud server itself, the server's real IP is not obtained.

Test plan 1: testing from my computer against the DNS cloud server. The client's real IP is obtained normally. Debug log:

DEBU[0231] received query                                addr=":6000" client=110.0.0.1 id=dns-tcp protocol=tcp qname=pull-f5-io.flive.douyincdn.com.
DEBU[0231] adding ecs option                             client=110.0.0.1 ecs=110.0.0.1 id=ecs mask=24 qname=pull-f5-io.flive.douyincdn.com. qtype=A
DEBU[0231] forwarding query to resolver                  client=110.0.0.1 id=rrl qname=pull-f5-io.flive.douyincdn.com. qtype=A resolver=cache
DEBU[0231] cache-miss, forwarding                        client=110.0.0.1 id=cache qname=pull-f5-io.flive.douyincdn.com. qtype=A resolver=ttl
DEBU[0231] forwarding query to resolver                  client=110.0.0.1 id=concurrent-dns qname=pull-f5-io.flive.douyincdn.com. qtype=A resolver=internet-tcp
DEBU[0231] querying upstream resolver                    client=110.0.0.1 id=internet-tcp protocol=tcp qname=pull-f5-io.flive.douyincdn.com. qtype=A resolver="127.0.0.1:35354"
DEBU[0231] modified response ttl                         client=110.0.0.1 id=ttl qname=pull-f5-io.flive.douyincdn.com. qtype=A

Test plan 2: testing with dig from the terminal of the DNS cloud server itself. Debug log:

DEBU[0403] received query                                addr=":6000" client=127.0.0.1 id=dns-udp protocol=udp qname=www.pixiv.com.
DEBU[0403] adding ecs option                             client=127.0.0.1 ecs=127.0.0.0 id=ecs mask=24 qname=www.pixiv.com. qtype=A
DEBU[0403] forwarding query to resolver                  client=127.0.0.1 id=rrl qname=www.pixiv.com. qtype=A resolver=cache
DEBU[0403] cache-miss, forwarding                        client=127.0.0.1 id=cache qname=www.pixiv.com. qtype=A resolver=ttl
DEBU[0403] forwarding query to resolver                  client=127.0.0.1 id=concurrent-dns qname=www.pixiv.com. qtype=A resolver=internet-udp
DEBU[0403] querying upstream resolver                    client=127.0.0.1 id=internet-udp protocol=udp qname=www.pixiv.com. qtype=A resolver="127.0.0.1:35354"

liang-hiwin commented 1 month ago

protocol=quic, test plan 3: testing from my PC against the DNS cloud server. The listener receives the real IP of my PC. Debug log:

DEBU[1449] received query                                addr=":8600" client="110.0.0.1:58302" id=local-doq protocol=doq qname=pull-f5-io.flive.douyincdn.com.
DEBU[1449] querying upstream resolver                    client=110.0.0.1 id=server-udp protocol=udp qname=pull-f5-io.flive.douyincdn.com. qtype=A resolver="127.0.0.1:6000"
DEBU[1449] received query                                addr=":6000" client=127.0.0.1 id=dns-udp protocol=udp qname=pull-f5-io.flive.douyincdn.com.
DEBU[1449] adding ecs option                             client=127.0.0.1 ecs=127.0.0.0 id=ecs mask=24 qname=pull-f5-io.flive.douyincdn.com. qtype=A
DEBU[1449] forwarding query to resolver                  client=127.0.0.1 id=rrl qname=pull-f5-io.flive.douyincdn.com. qtype=A resolver=cache
DEBU[1449] cache-miss, forwarding                        client=127.0.0.1 id=cache qname=pull-f5-io.flive.douyincdn.com. qtype=A resolver=ttl
DEBU[1449] forwarding query to resolver                  client=127.0.0.1 id=concurrent-dns qname=pull-f5-io.flive.douyincdn.com. qtype=A resolver=internet-tcp
DEBU[1449] querying upstream resolver                    client=127.0.0.1 id=internet-tcp protocol=tcp qname=pull-f5-io.flive.douyincdn.com. qtype=A resolver="127.0.0.1:35354"
DEBU[1449] modified response ttl                         client=127.0.0.1 id=ttl qname=pull-f5-io.flive.douyincdn.com. qtype=A
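
One way to read a log like the one above is to list the client value recorded at each pipeline element. The Python sketch below does that for logrus-style key=value lines; hop_clients is a hypothetical helper written for this thread, not a routedns tool. Applied to the first three lines above, it appears to show the DoQ listener seeing 110.0.0.1, then a forward to 127.0.0.1:6000, after which the dns-udp listener sees 127.0.0.1.

```python
import shlex
from typing import List, Tuple


def hop_clients(log_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (id, client) pairs in the order they appear in the log."""
    hops = []
    for line in log_lines:
        # shlex handles quoted values such as client="110.0.0.1:58302".
        tokens = shlex.split(line)
        fields = dict(tok.split("=", 1) for tok in tokens if "=" in tok)
        if "id" in fields and "client" in fields:
            hops.append((fields["id"], fields["client"]))
    return hops
```

Printing the result for a full debug log makes it easy to spot the element at which the client address changes.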
liang-hiwin commented 1 month ago

protocol=dot: cannot be tested because the DoT server does not work properly, see https://github.com/folbricht/routedns/issues/397

folbricht commented 1 month ago

Which of the 3 tests above do you feel shows the issue? It all looks correct so far. There are two situations where you would get 127.0.0.1 as client IP:

  1. You have some kind of reverse proxy (nginx or similar) in front of your server.
  2. You're initiating the query on the server itself. This looks like scenario 2 in your tests.

To get the client IP, routedns looks at the source address of the incoming connection. So if you're on the server, the source IP would be 127.0.0.1, and the same would happen if there's a proxy that opens a new connection on the server.
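
The point about the source address can be demonstrated outside of routedns. The Python sketch below opens a UDP socket on the loopback interface, sends a datagram to it from the same machine, and returns the source address the receiver observes. It is a generic socket example under the assumption that the loopback interface is available; observed_source is a hypothetical name.

```python
import socket


def observed_source(listen_host: str = "127.0.0.1") -> str:
    """Send one UDP datagram locally and return the source IP the receiver sees."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind((listen_host, 0))          # port 0: let the OS pick a free port
        server.settimeout(2)
        client.sendto(b"query", server.getsockname())
        _, (src_ip, _src_port) = server.recvfrom(512)
        return src_ip


print(observed_source())
```

Any process on the same host that opens a new connection to the listener, whether dig, a reverse proxy, or another resolver, shows up with a local source address in this way.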
liang-hiwin commented 1 month ago

I tested from my computer, so why does the DNS server identify the client as 127.0.0.1? I was using QUIC, so I ruled out nginx as the cause.