erebe / wstunnel

Tunnel all your traffic over Websocket or HTTP2 - Bypass firewalls/DPI - Static binary available
BSD 3-Clause "New" or "Revised" License
4.32k stars 370 forks source link

SO_MARK not applied to DNS requests to resolve remote WS server #295

Closed r-vdp closed 4 months ago

r-vdp commented 5 months ago

Describe the bug When specifying an so_mark to mark outbound packets with, this mark does not get applied to the DNS request that's made to lookup the address of the remote WS server. Because of this, the DNS request never succeeds when the mark is used for policy routing to avoid loops in a VPN setup.

To Reproduce Steps to reproduce the behavior:

  1. Run wstunnel with the so_mark option
  2. Configure policy routing to route marked packets on to one interface and non-marked onto another one
  3. Observe that the packets from the initial DNS request get routed to the wrong interface

Expected behavior The so_mark should be applied to the initial DNS request as well.

Screenshots If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

Additional context Add any other context about the problem here.

erebe commented 5 months ago

Hello,

Would you mind trying this pre-release and let me know if it fixes your issue https://github.com/erebe/wstunnel/releases/tag/v9.7.1-rc1

r-vdp commented 5 months ago

Great, thanks a lot for implementing this so fast! I will test it and I'll let you know!

r-vdp commented 5 months ago

This seems to work correctly now for UDP-based DNS, I didn't test DoH or DoT yet.

From the diff I see only changes related to UDP sockets, does this mean that there might be more changes needed for DoH/DoT?

erebe commented 5 months ago

Perfect then:)

It should work also for DoT/DoT, i made the change in the badly named commit lint https://github.com/erebe/wstunnel/commit/e44b8fa124517ad3e7dc5cd8009dfd2a39f0a384

(the rc1 release contrains the above commit)

r-vdp commented 5 months ago

Ah, perfect then, thanks a lot!

r-vdp commented 5 months ago

I just gave both DoH and DoT a try, but neither seems to be able to successfully resolve addresses. I passed --dns-resolver=dns+https://9.9.9.9 and --dns-resolver=dns+tls://9.9.9.9 respectively (this is the quad9 DNS resolver), similar to the examples in the manpage.

This is what I get with DoT (I redacted the server's domain name):

jun 15 14:59:43 framework systemd[1]: Started wstunnel client - wg-tunnel-armless.
jun 15 14:59:43 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:43.740104Z TRACE hickory_resolver::async_resolver: handle passed back
jun 15 14:59:43 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:43.740127Z  INFO wstunnel::udp: Starting UDP server listening cnx on [::1]:51820 with cnx timeout of 0s
jun 15 14:59:43 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:43.740142Z TRACE mio::poll: registering event source with poller: token=Token(94470805100288), interests=READABLE | WRITABLE
jun 15 14:59:43 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:43.740154Z  INFO wstunnel::udp: Starting UDP server listening cnx on [::1]:51821 with cnx timeout of 0s
jun 15 14:59:43 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:43.740161Z TRACE mio::poll: registering event source with poller: token=Token(94470804665472), interests=READABLE | WRITABLE
jun 15 14:59:44 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:44.311352Z  INFO wstunnel::udp: New UDP connection from [::1]:37783
jun 15 14:59:44 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:44.311442Z  INFO cnx_server: wstunnel::tcp: Opening TCP connection to XXXX.dev:443
jun 15 14:59:44 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:44.311486Z DEBUG cnx_server: hickory_proto::xfer::dns_handle: querying: XXXX.dev A
jun 15 14:59:44 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:44.311500Z DEBUG cnx_server: hickory_resolver::name_server::name_server_pool: sending request: [Query { name: Name("XXXX.dev"), query_type: A, query_class: IN }]
jun 15 14:59:44 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:44.311511Z DEBUG cnx_server: hickory_resolver::name_server::name_server_pool: error from UDP, retrying over TCP: No connections available
jun 15 14:59:44 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:44.311524Z DEBUG cnx_server: hickory_resolver::name_server::name_server: reconnecting: NameServerConfig { socket_addr: 9.9.9.9:853, protocol: Tls, tls_dns_name: None, trust_negative_responses: true, tls_config: None, bind_addr: None }
jun 15 14:59:44 framework wstunnel-client-wg-tunnel-armless-start[643422]: 2024-06-15T11:59:44.311550Z DEBUG cnx_server: hickory_proto::xfer::dns_exchange: stream errored while connecting error=io error: bad dns_name

and the errors keep on repeating.

With DoH, I only get this:

jun 15 15:15:21 framework systemd[1]: Started wstunnel client - wg-tunnel-armless.
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.051942Z TRACE hickory_resolver::async_resolver: handle passed back
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.051967Z  INFO wstunnel::udp: Starting UDP server listening cnx on [::1]:51820 with cnx timeout of 0s
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.051983Z TRACE mio::poll: registering event source with poller: token=Token(94268524139264), interests=READABLE | WRITABLE
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.051995Z  INFO wstunnel::udp: Starting UDP server listening cnx on [::1]:51821 with cnx timeout of 0s
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.052002Z TRACE mio::poll: registering event source with poller: token=Token(94268524097280), interests=READABLE | WRITABLE
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.102489Z  INFO wstunnel::udp: New UDP connection from [::1]:37783
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.102591Z  INFO cnx_server: wstunnel::tcp: Opening TCP connection to XXXX.dev:443
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.102649Z DEBUG cnx_server: hickory_proto::xfer::dns_handle: querying: XXXX.dev A
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.102667Z DEBUG cnx_server: hickory_resolver::name_server::name_server_pool: sending request: [Query { name: Name("XXXX.dev"), query_type: A, query_class: IN }]
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.102681Z DEBUG cnx_server: hickory_resolver::name_server::name_server_pool: error from UDP, retrying over TCP: No connections available
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.102699Z DEBUG cnx_server: hickory_resolver::name_server::name_server: reconnecting: NameServerConfig { socket_addr: 9.9.9.9:443, protocol: Https, tls_dns_name: None, trust_negative_responses: true, tls_config: None, bind_addr: None }
jun 15 15:15:22 framework wstunnel-client-wg-tunnel-armless-start[652167]: 2024-06-15T12:15:22.102788Z TRACE cnx_server: mio::poll: registering event source with poller: token=Token(140124516046592), interests=READABLE | WRITABLE

and then nothing anymore.

erebe commented 5 months ago

ha indeed, i just tried and DoT/DoH are broken. it seems an update of the lib silently broke it.

I am going to fix it, hope fully before end of next week, as i don't have a lot of spare time this week.

erebe commented 5 months ago

Hi back,

I have updated the v9.7.1-rc1 release which should fix DoT/DoH. The only thinf that change is that you need to specify the sni in the resolver url. --dns-resolver=dns+tls://9.9.9.9?sni=dns.quad9.net

Let me know if it works for you

r-vdp commented 5 months ago

Hi back,

I have updated the v9.7.1-rc1 release which should fix DoT/DoH. The only thinf that change is that you need to specify the sni in the resolver url. --dns-resolver=dns+tls://9.9.9.9?sni=dns.quad9.net

Let me know if it works for you

That makes sense! I was already thinking that it's strange that we don't need to specify the sni since without it, we can't verify the certificate.

We could consider using the same syntax as systemd-resolved does, which uses # as a separator, see here: https://www.freedesktop.org/software/systemd/man/latest/resolved.conf.html#DNS=

I will try this over the weekend and get back to you!

erebe commented 5 months ago

Yeah not sure how it was supposed to be working before, if ever... If it works for you too, I am going to make a new release.

Regarding using #, I am not a big fan as in url it represent fragment and there can only be one. If later we want to add other options (i.e: timeout) it will not be possible to add. While with query params it is extensible.

r-vdp commented 5 months ago

Unfortunately, it seems to still not work, below the output for DoH, and I get the same result with DoT.

The logs also suggest that the first trial is over UDP still, which looks a bit suspicious. But I don't see any of the debug output that I otherwise see when I use plain UDP.

jun 20 17:11:27 framework systemd[1]: Starting wstunnel client - wg-tunnel-armless...
jun 20 17:11:27 framework systemd[1]: Started wstunnel client - wg-tunnel-armless.
jun 20 17:11:27 framework wstunnel-client-wg-tunnel-armless-start[1202905]: 2024-06-20T14:11:27.677416Z  INFO wstunnel::udp: Starting UDP server listening cnx on [::1]:51820 with cnx timeout of 10s
jun 20 17:11:27 framework wstunnel-client-wg-tunnel-armless-start[1202905]: 2024-06-20T14:11:27.677513Z  INFO wstunnel::udp: Starting UDP server listening cnx on [::1]:51821 with cnx timeout of 30s
jun 20 17:11:27 framework wstunnel-client-wg-tunnel-armless-start[1202905]: 2024-06-20T14:11:27.847099Z  INFO wstunnel::udp: New UDP connection from [::1]:37073
jun 20 17:11:27 framework wstunnel-client-wg-tunnel-armless-start[1202905]: 2024-06-20T14:11:27.847244Z  INFO wstunnel::tcp: Opening TCP connection to armless.XXX:443
jun 20 17:11:27 framework wstunnel-client-wg-tunnel-armless-start[1202905]: 2024-06-20T14:11:27.847331Z DEBUG hickory_proto::xfer::dns_handle: querying: armless.XXX A
jun 20 17:11:27 framework wstunnel-client-wg-tunnel-armless-start[1202905]: 2024-06-20T14:11:27.847363Z DEBUG hickory_resolver::name_server::name_server_pool: sending request: [Query { name: Name("armless.XXX"), query_type: A, query_class: IN }]
jun 20 17:11:27 framework wstunnel-client-wg-tunnel-armless-start[1202905]: 2024-06-20T14:11:27.847383Z DEBUG hickory_resolver::name_server::name_server_pool: error from UDP, retrying over TCP: No connections available
jun 20 17:11:27 framework wstunnel-client-wg-tunnel-armless-start[1202905]: 2024-06-20T14:11:27.847402Z DEBUG hickory_resolver::name_server::name_server: reconnecting: NameServerConfig { socket_addr:9.9.9.11:443, protocol: Https, tls_dns_name: Some("dns11.quad9.net"), trust_negative_responses: true, tls_config: None, bind_addr: None }
jun 20 17:11:27 framework wstunnel-client-wg-tunnel-armless-start[1202905]: 2024-06-20T14:11:27.850276Z DEBUG rustls::anchors: add_parsable_certificates processed 147 valid and 0 invalid certs
erebe commented 4 months ago

Are you sure you re-downloaded the release ? With your example everything is working fine for me.

2024-06-21T15:19:01.418794Z  INFO wstunnel::tcp: Opening TCP connection to ws.erebe.eu:443
2024-06-21T15:19:01.418922Z DEBUG hickory_proto::xfer::dns_handle: querying: ws.erebe.eu A
2024-06-21T15:19:01.418964Z DEBUG hickory_resolver::name_server::name_server_pool: sending request: [Query { name: Name("ws.erebe.eu"), query_type: A, query_class: IN }]
2024-06-21T15:19:01.418984Z DEBUG hickory_resolver::name_server::name_server_pool: error from UDP, retrying over TCP: No connections available
2024-06-21T15:19:01.419010Z DEBUG hickory_resolver::name_server::name_server: reconnecting: NameServerConfig { socket_addr: 9.9.9.11:443, protocol: Https, tls_dns_name: Some("dns11.quad9.net"), trust_negative_responses: true, tls_config: None, bind_addr: None }
2024-06-21T15:19:01.421923Z DEBUG rustls::anchors: add_parsable_certificates processed 148 valid and 0 invalid certs
2024-06-21T15:19:01.462581Z DEBUG hickory_proto::h2::h2_client_stream: tcp connection established to: 9.9.9.11:443
2024-06-21T15:19:01.462642Z DEBUG rustls::client::hs: No cached session for DnsName("dns11.quad9.net")
2024-06-21T15:19:01.462783Z DEBUG rustls::client::hs: Not resuming any session
2024-06-21T15:19:01.501288Z DEBUG rustls::client::hs: Using ciphersuite TLS13_AES_256_GCM_SHA384
2024-06-21T15:19:01.501340Z DEBUG rustls::client::tls13: Not resuming
2024-06-21T15:19:01.501524Z DEBUG rustls::client::tls13: TLS1.3 encrypted extensions: [Protocols([ProtocolName(6832)])]
2024-06-21T15:19:01.501543Z DEBUG rustls::client::hs: ALPN protocol is Some(b"h2")
2024-06-21T15:19:01.504557Z DEBUG hickory_proto::h2::h2_client_stream: tls connection established to: 9.9.9.11:443
2024-06-21T15:19:01.504576Z DEBUG h2::client: binding client connection
2024-06-21T15:19:01.504595Z DEBUG h2::client: client connection bound
2024-06-21T15:19:01.504669Z DEBUG hickory_proto::h2::h2_client_stream: h2 connection established to: 9.9.9.11:443
2024-06-21T15:19:01.504702Z DEBUG hickory_proto::xfer: enqueueing message:QUERY:[Query { name: Name("ws.erebe.eu"), query_type: A, query_class: IN }]
2024-06-21T15:19:01.504930Z DEBUG hickory_proto::h2::h2_client_stream: request: Request {
    method: POST,
    uri: https://dns11.quad9.net/dns-query,
    version: HTTP/2.0,
    headers: {
        "content-type": "application/dns-message",
        "accept": "application/dns-message",
        "content-length": "29",
    },
    body: (),
}
2024-06-21T15:19:01.607430Z DEBUG Connection{peer=Client}: h2::proto::settings: received settings ACK; applying Settings { flags: (0x0), enable_push: 0 }
2024-06-21T15:19:01.607566Z DEBUG hickory_proto::h2::h2_client_stream: got response: Response {
    status: 200,
    version: HTTP/2.0,
    headers: {
        "server": "h2o/dnsdist",
        "date": "Fri, 21 Jun 2024 15:19:01 GMT",
        "content-type": "application/dns-message",
        "cache-control": "max-age=10643",
        "content-length": "45",
    },
    body: RecvStream {
        inner: FlowControl {
            inner: OpaqueStreamRef {
                stream_id: StreamId(
                    1,
                ),
                ref_count: 2,
r-vdp commented 4 months ago

Yeah, I just double checked, I rebuilt from the 9.7.1-rc1 tag, which includes your latest commits. I also got different output before.

This is quite strange, I'm not sure what's going on then, I use DoH in firefox and DoT for everything else via systemd-resolved, so I'm pretty sure that both are working fine on the network. I also tested on different networks.

I increased the logging level to TRACE, but I got nothing more useful. Maybe we could include more trace statements in order to figure out what's going on.

erebe commented 4 months ago

Indeed, that's super strange, will try to investigate it a bit more and see if i can add more trace as you suggest

erebe commented 4 months ago

Would you mind pasting all the logs you get using DoH/DoT until the end of the program ? So I can get a better sense regarding where the flow is stopping for you ? Also try to wait for 30seconds before killing wstunnel, if nothing happen when doing the dns resolution.

Also try doing DoH by hand with

curl -H 'accept: application/dns-json' 'https://cloudflare-dns.com/dns-query?name=yourdomain.com&type=A' | jq .
erebe commented 4 months ago

(I made a new release with the related change, as it cannot be worse in any case, and it will allow more people to test/provide feedback)

r-vdp commented 4 months ago

Ok, I have been a bit busy, but I will provide you those logs asap!

r-vdp commented 4 months ago

After two minutes, I didn't get any additional output, this is all I got, with trace enabled:

jun 25 11:34:04 framework systemd[1]: Starting wstunnel client - wg-tunnel-beelink...
jun 25 11:34:04 framework systemd[1]: Started wstunnel client - wg-tunnel-beelink.
jun 25 11:34:04 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:04.431620Z TRACE hickory_resolver::async_resolver: handle passed back
jun 25 11:34:04 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:04.431676Z  INFO wstunnel::udp: Starting UDP server listening cnx on [::1]:51820 with cnx timeout of 10s
jun 25 11:34:04 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:04.431702Z TRACE mio::poll: registering event source with poller: token=Token(94442687297280), interests=READABLE | WRITABLE
jun 25 11:34:04 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:04.431726Z  INFO wstunnel::udp: Starting UDP server listening cnx on [::1]:51821 with cnx timeout of 30s
jun 25 11:34:04 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:04.431735Z TRACE mio::poll: registering event source with poller: token=Token(94442687220096), interests=READABLE | WRITABLE
jun 25 11:34:12 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:12.071777Z  INFO wstunnel::udp: New UDP connection from [::1]:34797
jun 25 11:34:12 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:12.071849Z  INFO cnx_server: wstunnel::tcp: Opening TCP connection to beelink.well-founded.dev:443
jun 25 11:34:12 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:12.071893Z DEBUG cnx_server: hickory_proto::xfer::dns_handle: querying: beelink.well-founded.dev A
jun 25 11:34:12 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:12.071903Z DEBUG cnx_server: hickory_resolver::name_server::name_server_pool: sending request: [Query { name: Name("beelink.well-founded.dev"), query_type: A, query_class: IN }]
jun 25 11:34:12 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:12.071912Z DEBUG cnx_server: hickory_resolver::name_server::name_server_pool: error from UDP, retrying over TCP: No connections available
jun 25 11:34:12 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:12.071920Z DEBUG cnx_server: hickory_resolver::name_server::name_server: reconnecting: NameServerConfig { socket_addr: 9.9.9.11:443, protocol: Https, tls_dns_name: Some("dns11.quad9.net"), trust_negative_responses: true, tls_config: None, bind_addr: None }
jun 25 11:34:12 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:12.072880Z DEBUG cnx_server: rustls::anchors: add_parsable_certificates processed 147 valid and 0 invalid certs
jun 25 11:34:12 framework wstunnel-client-wg-tunnel-beelink-start[840597]: 2024-06-25T09:34:12.072929Z TRACE cnx_server: mio::poll: registering event source with poller: token=Token(140056870500480), interests=READABLE | WRITABLE
erebe commented 4 months ago

I would say that it is something with your network config. You never reach the tcp connection established to log point, meaning it fails to do the TCP handshake with the DNS server.

Are you using the SO_MARK ? if yes, can you provide your iptables/nftables/ip route configuration ?

P.s: if you use this pre-release you should have a timeout/error after 10sec

https://github.com/erebe/wstunnel/releases/tag/v9.7.2-rc1

r-vdp commented 4 months ago

So, I just tried the new pre-release, and now it seems to work correctly. I'm wondering if the refactoring around the connection setup fixed some edge-case bug with the so_mark that I was running into.

My setup is that I have a wireguard vpn with AllowedIPs = [ "0.0.0.0/0" "::/0" ] and the wireguard endpoint is the local wstunnel instance listening on [::1]. I then have policy routing rules to have all packets routed to a dedicated routing table with only the wireguard default route, except packets with a specific so_mark set, which is the so_mark that I configure wstunnel with, those get routed to the main table that has the routes from whatever physical network I'm connected to.

So it's crucial that all packets emited by wstunnel have the so_mark set, since otherwise they will get routed to the wireguard interface which re-injects them into wstunnel, effectively black-hole-ing them. And I think this is probably what was happening to the DoH and DoT packets before.

I just did a check with 9.7.1 as follows:

  1. Make sure my system resolver is not set to 9.9.9.11
  2. Bring the wireguard interface up
  3. Run tcpdump -s 1500 -i wg -nn -vv dst host 9.9.9.11 to capture any packets destined for the dns resolver sent to the wireguard interface
  4. Start wstunnel

And indeed I get a bunch of requests from wstunnel to 9.9.9.11 that are routed to the wireguard interface, so they must not have the so_mark set.

If I do the same with the latest rc, I see no such requests and wstunnel resolves the domain name, so in that version the so_mark seems to be correctly set on the dns packets and they get routed to the physical interface.

erebe commented 4 months ago

ha fuck, you are right, I am so dumb for not having seen this earlier, sorry for wasting your time :pray:

I was applying the so_mark after the tcp connect, so the tcp handshake was not marked with SO_MARK, thus failing to establish the connection. The refacto to have a timeout on the TCP, use the code that is doing it correctly, so it fixes the mistake ....

I am going to add support for http proxy, if configured, for DoH/T and make a new release.

Sorry about that :x

r-vdp commented 4 months ago

No worries at all, I should have tested this in more detail earlier which would've led us to the actual issue quicker, but I'm happy that it's working now!

Thanks a lot for your efforts on this project! :)

erebe commented 4 months ago

Hopefully this is your last release https://github.com/erebe/wstunnel/releases/tag/v9.7.2 ;)

r-vdp commented 4 months ago

Great, let's close this issue then!