cloudflare / boringtun

Userspace WireGuard® Implementation in Rust
BSD 3-Clause "New" or "Revised" License
5.93k stars 397 forks source link

Getting few error logs, and my internet stops working for a while. #258

Open netwetstat opened 2 years ago

netwetstat commented 2 years ago

Hi, I hope everyone is good. Using boringtun on ubuntu lease-web server and WireGuard client on IOS. I am getting these error logs quite often. Usually after first handshake I get these logs Decapsulate error, NoCurrentSession While connected to tunnel once browsing gets stop and on checking the boringtun logs usually these logs are there Decapsulate error, InvalidAeadTag Decapsulate error, InvalidCounter Decapsulate error, UnexpectedPacket Decapsulate error, WrongIndex After browsing gets stopped, sometimes I have to restart WireGuard client, or it gets recovered at its own. In case you need more details I will be happy to provide. Thanks :)

Noah-Kennedy commented 2 years ago

@jeff-hiner

jeff-hiner commented 2 years ago

aead or counter errors are consistent with someone tampering with traffic in flight or otherwise corrupting the traffic. I'd check MTU settings first, then make sure the raw data packets are arriving exactly as sent. You can use tools like Wireshark on the client and server end to verify this.

Noah-Kennedy commented 2 years ago

@netwetstat

netwetstat commented 2 years ago

Thanks for replying, I would like to share my configs as well with you At Server, Boringtun start script looks like

boringtun-cli wg0 --disable-drop-privileges -f
ip address add dev wg0 10.0.0.0/8
ip address add dev wg0 2000::/8
wg setconf wg0 /etc/wireguard/wg0.conf
ifconfig wg0 up

Config at server

[Interface]
PrivateKey =<Private Key>
ListenPort = 443
[Peer]
PublicKey = <pk of peer>
AllowedIPs = 10.0.0.1/32,  

Iptables rules at server

interface=`ip route get 1.1.1.1 |  awk '{print $5}'`
iptables -A FORWARD -i $interface -j ACCEPT;
iptables -t nat -A POSTROUTING -o $interface -j MASQUERADE;
ip6tables -A FORWARD -i $interface -j ACCEPT;
ip6tables -t nat -A POSTROUTING -o $interface -j MASQUERADE;

Config at wiregurd ios client

[Interface]
PrivateKey = <Private Key>
Address = 10.0.0.1/32
DNS = 1.1.1.1

[Peer]
PublicKey = <pk of peer>
AllowedIPs = 0.0.0.0/0
Endpoint = <PublicIP and Port>

I am not giving any value of MTU by myself I believe its set to 1400. Do you see anything wrong here? @jeff-hiner

jeff-hiner commented 2 years ago

You should double check the MTU of your capturing tun device and the path MTU of the UDP channel between server and client. If your ethernet/wifi is MTU 1500, then the tun's MTU should be that minus the wireguard overhead, which in worst case is I believe 80 bytes.

Wireguard adds overhead and doesn't do dynamic path MTU discovery, so if the MTU of the tun device is the same or close to the path MTU then packets can get fragmented/dropped causing all sorts of problems. That's about all the advice I can give here.

netwetstat commented 2 years ago

Thanks for all the advice I'll play around as advised. These address Aead-related issues, can you please guide a bit about Decapsulate error, NoCurrentSession as well, I have mentioned in the issue Whenever I start the connection from ios WireGuard client for the very first time I can see these logs on server Decapsulate error, NoCurrentSession I have added a few more logs in boringtun to dig more, at first handshake initiation, boringtun creates a session and sends handshake response to peer. Immediately after that session array gets empty and it starts logging Decapsulate error, NoCurrentSession. After KEEPALIVE_TIMEOUT + REKEY_TIMEOUT as per protocol WireguardClient sends a handshake initiation again, after which a new session is created and the tunnel starts. PS this issue is not persistent i.e not faced every-time.
I am attaching client logs as well here. You can see the logs between 17:41:33.785651: 17:41:49.464690

2022-06-30 17:41:27.240124: [APP] App version: 1.0.15 (26) 2022-06-30 17:41:33.513849: [APP] startActivation: Entering (tunnel: 91327982191430004bor) 2022-06-30 17:41:33.516336: [APP] startActivation: Tunnel is disabled. Re-enabling and saving 2022-06-30 17:41:33.545517: [APP] startActivation: Tunnel saved after re-enabling, invoking startActivation 2022-06-30 17:41:33.545631: [APP] startActivation: Entering (tunnel: 91327982191430004bor) 2022-06-30 17:41:33.545969: [APP] startActivation: Starting tunnel 2022-06-30 17:41:33.546143: [APP] startActivation: Success 2022-06-30 17:41:33.561970: [APP] Tunnel '91327982191430004bor' connection status changed to 'connecting' 2022-06-30 17:41:33.675715: [NET] App version: 1.0.15 (26) 2022-06-30 17:41:33.675809: [NET] Starting tunnel from the app 2022-06-30 17:41:33.780472: [NET] DNS64: mapped 192.241.199.169 to itself. 2022-06-30 17:41:33.781142: [NET] Attaching to interface 2022-06-30 17:41:33.781507: [NET] UAPI: Updating private key 2022-06-30 17:41:33.781729: [NET] Routine: encryption worker 4 - started 2022-06-30 17:41:33.781768: [NET] Routine: decryption worker 2 - started 2022-06-30 17:41:33.781804: [NET] Routine: event worker - started 2022-06-30 17:41:33.781812: [NET] UAPI: Removing all peers 2022-06-30 17:41:33.781863: [NET] Routine: handshake worker 2 - started 2022-06-30 17:41:33.781915: [NET] Routine: encryption worker 3 - started 2022-06-30 17:41:33.781943: [NET] Routine: handshake worker 1 - started 2022-06-30 17:41:33.782028: [NET] Routine: decryption worker 1 - started 2022-06-30 17:41:33.782072: [NET] Routine: decryption worker 3 - started 2022-06-30 17:41:33.782108: [NET] Routine: encryption worker 1 - started 2022-06-30 17:41:33.782128: [NET] Routine: decryption worker 4 - started 2022-06-30 17:41:33.782140: [NET] Routine: encryption worker 2 - started 2022-06-30 17:41:33.782169: [NET] Routine: handshake worker 3 - started 2022-06-30 17:41:33.782228: [NET] Routine: handshake worker 4 - started 2022-06-30 17:41:33.782244: [NET] Routine: encryption worker 6 - started 2022-06-30 17:41:33.782298: [NET] Routine: decryption worker 5 - started 2022-06-30 17:41:33.782401: [NET] Routine: handshake worker 5 - started 2022-06-30 17:41:33.782461: [NET] Routine: decryption worker 6 - started 2022-06-30 17:41:33.782503: [NET] Routine: encryption worker 5 - started 2022-06-30 17:41:33.782516: [NET] Routine: handshake worker 6 - started 2022-06-30 17:41:33.782551: [NET] Routine: TUN reader - started 2022-06-30 17:41:33.782568: [NET] peer(1lgJ…491U) - UAPI: Created 2022-06-30 17:41:33.782711: [NET] peer(1lgJ…491U) - UAPI: Updating endpoint 2022-06-30 17:41:33.782844: [NET] peer(1lgJ…491U) - UAPI: Updating persistent keepalive interval 2022-06-30 17:41:33.782883: [NET] peer(1lgJ…491U) - UAPI: Removing all allowedips 2022-06-30 17:41:33.783004: [NET] peer(1lgJ…491U) - UAPI: Adding allowedip 2022-06-30 17:41:33.783335: [NET] UDP bind has been updated 2022-06-30 17:41:33.783384: [NET] peer(1lgJ…491U) - Starting 2022-06-30 17:41:33.783428: [NET] Routine: receive incoming v6 - started 2022-06-30 17:41:33.783458: [NET] Routine: receive incoming v4 - started 2022-06-30 17:41:33.783537: [NET] Interface state was Down, requested Up, now Up 2022-06-30 17:41:33.783600: [NET] Device started 2022-06-30 17:41:33.783621: [NET] peer(1lgJ…491U) - Routine: sequential receiver - started 2022-06-30 17:41:33.783853: [NET] Tunnel interface is utun4 2022-06-30 17:41:33.784136: [NET] peer(1lgJ…491U) - Routine: sequential sender - started 2022-06-30 17:41:33.784379: [NET] Network change detected with satisfied route and interface order [en0] 2022-06-30 17:41:33.784554: [NET] DNS64: mapped 192.241.199.169 to itself. 2022-06-30 17:41:33.784607: [NET] peer(1lgJ…491U) - UAPI: Updating endpoint 2022-06-30 17:41:33.784865: [NET] Routine: receive incoming v6 - stopped 2022-06-30 17:41:33.784877: [NET] Routine: receive incoming v4 - stopped 2022-06-30 17:41:33.785652: [NET] UDP bind has been updated 2022-06-30 17:41:33.785651: [NET] Routine: receive incoming v4 - started 2022-06-30 17:41:33.785665: [NET] Routine: receive incoming v6 - started 2022-06-30 17:41:33.785714: [APP] Tunnel '91327982191430004bor' connection status changed to 'connected' 2022-06-30 17:41:33.980266: [NET] peer(1lgJ…491U) - Sending handshake initiation 2022-06-30 17:41:34.284779: [NET] Network change detected with satisfied route and interface order [en0, utun4] 2022-06-30 17:41:34.286204: [NET] DNS64: mapped 192.241.199.169 to itself. 2022-06-30 17:41:34.286509: [NET] peer(1lgJ…491U) - UAPI: Updating endpoint 2022-06-30 17:41:34.286912: [NET] Routine: receive incoming v4 - stopped 2022-06-30 17:41:34.287005: [NET] Routine: receive incoming v6 - stopped 2022-06-30 17:41:34.287670: [NET] UDP bind has been updated 2022-06-30 17:41:34.287694: [NET] Routine: receive incoming v4 - started 2022-06-30 17:41:34.287751: [NET] Routine: receive incoming v6 - started 2022-06-30 17:41:34.301457: [NET] peer(1lgJ…491U) - Received handshake response 2022-06-30 17:41:38.547426: [APP] Status update notification timeout for tunnel '91327982191430004bor'. Tunnel status is now 'connected'. 2022-06-30 17:41:49.464690: [NET] peer(1lgJ…491U) - Retrying handshake because we stopped hearing back after 15 seconds 2022-06-30 17:41:49.465055: [NET] peer(1lgJ…491U) - Sending handshake initiation 2022-06-30 17:41:49.764684: [NET] peer(1lgJ…491U) - Received handshake response 2022-06-30 17:41:49.765043: [NET] peer(1lgJ…491U) - Sending keepalive packet 2022-06-30 17:42:31.567019: [NET] peer(1lgJ…491U) - Sending keepalive packet 2022-06-30 17:42:37.688404: [NET] peer(1lgJ…491U) - Receiving keepalive packet 2022-06-30 17:43:00.379811: [NET] peer(1lgJ…491U) - Sending keepalive packet 2022-06-30 17:43:33.327198: [NET] peer(1lgJ…491U) - Sending keepalive packet 2022-06-30 17:43:49.365331: [NET] peer(1lgJ…491U) - Sending keepalive packet 2022-06-30 17:43:51.976065: [NET] peer(1lgJ…491U) - Sending handshake initiation 2022-06-30 17:43:52.297636: [NET] peer(1lgJ…491U) - Received handshake response 2022-06-30 17:43:52.297885: [NET] peer(1lgJ…491U) - Sending keepalive packet

jeff-hiner commented 2 years ago

That decapsulate error sounds like a bug, definitely. When you say "session array gets empty" are you seeing SESSION_EXPIRED(REJECT_AFTER_TIME) on either end immediately after handshake?

netwetstat commented 2 years ago

@jeff-hiner yes you are right, I am seeing this log as well when the session array gets empty, it says

SESSION_EXPIRED(REJECT_AFTER_TIME)

netwetstat commented 2 years ago

Confirmed this SESSION_EXPIRED(REJECT_AFTER_TIME) happens right after handshake server(boringtun)

netwetstat commented 2 years ago

@jeff-hiner can you please guide me further about this?

jeff-hiner commented 2 years ago

There is a timing bug somewhere, but I don't have cycles right now to look into it. Sending a data packet implies a handshake has been done, which means the receiver should have already stored the handshake and its expiry timer properly. That's apparently not happening as expected, probably because of some misuse of synchronization primitives.

jeff-hiner commented 2 years ago

https://github.com/cloudflare/boringtun/pull/286

jeff-hiner commented 1 year ago

Hey, can you see if this issue repros with the latest master branch?

netwetstat commented 1 year ago

It got reproduced with the latest master branch, however, the frequency of repro has become significantly low. I'll share the logs once it gets reproduced

Noah-Kennedy commented 1 year ago

Good to know.