OpenVPN / openvpn

OpenVPN is an open source VPN daemon
http://openvpn.net

write to TUN/TAP : Invalid argument (fd=-1,code=22) #461

Open Glaeken opened 7 months ago

Glaeken commented 7 months ago

Linux, kernel 6.2 (but the error is OLD, for sure starting from 5.x kernels and OpenVPN 2.5). Multiple Linux clients. TUN mode, UDP, client side. OpenVPN version: any, all of them. OpenVPN throws this error every time it receives a MULTICAST packet from another client:

ovpn-client[2175382]: write to TUN/TAP : Invalid argument (fd=-1,code=22)

Example packet from strace (internal packet after decryption, being sent to TUN interface): write(6, "\xfb\x00\x00\x43\x50\xc0\x00\x00\x01\x11\x7a\xee\x--\x--\x--\x--\xe0\x00\x00\xfb\x14\xe9\x14\xe9\x00\x2f\x14\x14\x00\x00\x00\x00"..., 67) = -1 EINVAL (Invalid argument)

OR:

write(6, "\xfb\x00\x00\x3d\x5b\x4f\x00\x00\x01\x11\x70\x64\x--\x--\x--\x--\xe0\x00\x00\xfc\xd6\x57\x14\xeb\x00\x29\x06\xb3\x22\x37\x00\x00"..., 61) = -1 EINVAL (Invalid argument)

Those are UDP multicast packets sent to addresses like 224.0.0.251 or similar. The packets never reach the TUN interface, so they are not visible in any network analysis. Removing/adding the MULTICAST option on the tun interface does not change anything. No iptables involved. OpenVPN logs grow to enormous sizes.

IMHO OpenVPN should not route multicast packets by default. (The packets above came from Win10 machines with OpenVPN installed.)

\x--\x--\x--\x--: the internal VPN IP, removed from the packet data for security reasons.
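For reference, here is a quick decode of the visible fields of the first dump (a sketch; the redacted source address is replaced with zeros here):

```python
# Decode the visible IPv4/UDP fields of the strace payload above.
# The redacted source IP (\x--\x--\x--\x--) is substituted with 0.0.0.0.
pkt = bytes.fromhex(
    "fb000043" "50c00000" "0111" "7aee"
    "00000000"             # redacted internal VPN source IP
    "e00000fb"             # destination: 224.0.0.251 (mDNS multicast group)
    "14e9" "14e9"          # UDP src/dst ports: 5353/5353 (mDNS)
    "002f" "1414"          # UDP length and checksum
)

version = pkt[0] >> 4      # IP version nibble: 15 -- neither 4 nor 6!
dst = ".".join(str(b) for b in pkt[16:20])
sport = int.from_bytes(pkt[20:22], "big")
dport = int.from_bytes(pkt[22:24], "big")
print(version, dst, sport, dport)   # 15 224.0.0.251 5353 5353
```

So if the rest of the packet is read as IPv4, this is mDNS traffic to 224.0.0.251:5353, but the first byte's version nibble is 15, which is not a valid IP version.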

Ask if any more info is needed.

[edit] The current workaround is to remove the client-to-client directive on the server side, so packets go through the TUN interface instead of OpenVPN's internal forwarding mechanism.

cron2 commented 7 months ago

Client side OpenVPN is basically a dumb pipe - whatever the server sends, OpenVPN forwards to the tun/tap interface and vice versa - there is no routing, filtering, or interpretation of packets done in the client. Unicast, multicast, broadcast: all of it is just piped through.

So there are two issues here:

Glaeken commented 7 months ago
  • multicast packets need to be stopped on the server side ("find out who is sending them and make them stop / iptable them") for a short term "make these errors go away" fix

Those packets are client-to-client, so they don't go through iptables. Unable to filter. They were sent from multiple Win10 systems.

  • find out why your client OS does not like the multicast packets - from the log it's a bit unclear whether this is "Linux" or "Windows" on the client? On Windows, you could try using win-dco, where we have more control on what is happening inside.

As I stated: "Linux, kernel 6.2 (but the error is OLD, for sure starting from 5.x kernels and OpenVPN 2.5). Multiple Linux clients."

So at least 10 Linux clients, with kernels between 5.x and 6.2. Also, I noticed two Android phones doing the same, throwing this error (implementation: "OpenVPN for Android").

Maybe the question should be: why does Win10 with OpenVPN send multicast traffic to the OpenVPN interface at all?

There is also another downside. Devices like phones/tablets constantly receive traffic, because this multicast is being sent as a broadcast (why? the phones are not in this multicast group). This drains the battery.

Glaeken commented 7 months ago

Android LOG example: session-attachment-2023-11-24-191509

Linux logs:
ovpn-client[2175382]: write to TUN/TAP : Invalid argument (fd=-1,code=22)
ovpn-client[2175382]: write to TUN/TAP : Invalid argument (fd=-1,code=22)
ovpn-client[2175382]: write to TUN/TAP : Invalid argument (fd=-1,code=22)
ovpn-client[2175382]: write to TUN/TAP : Invalid argument (fd=-1,code=22)
ovpn-client[2175382]: write to TUN/TAP : Invalid argument (fd=-1,code=22)
... 60000 more messages

cron2 commented 7 months ago

Those packets are client-to-client, so they don't go through iptables. Unable to filter. They were sent from multiple Win10 systems.

If you remove client-to-client from the server config, client-to-client unicast(!) goes via server side tun. Not sure right now if this applies to client-to-client multicast too.

  • find out why your client OS does not like the multicast packets - from the log it's a bit unclear whether this is "Linux" or "Windows" on the client? On Windows, you could try using win-dco, where we have more control on what is happening inside.

As I stated: "Linux, kernel 6.2 (but the error is OLD, for sure starting from 5.x kernels and OpenVPN 2.5). Multiple Linux clients."

This was not so obvious to me, now it is. Thanks.

Maybe the question should be: why does Win10 with OpenVPN send multicast traffic to the OpenVPN interface at all?

This is a valid question. My Win10 machines don't do this, so I've never experienced the problem (nor has it been reported before).

There is also another downside. Devices like phones/tablets constantly receive traffic, because this multicast is being sent as a broadcast (why? the phones are not in this multicast group). This drains the battery.

OpenVPN has no multicast handling beyond "yes, this is not unicast". So there are no IGMP joins etc., and multicast is treated as broadcast.
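The behaviour described above ("not unicast, so flood it like a broadcast") can be illustrated with a toy classifier; this is only a sketch of the described logic, not OpenVPN's actual forwarding code:

```python
import ipaddress

def classify(dst: str) -> str:
    """Toy classifier mirroring the behaviour described above: anything
    that is not plain unicast gets flooded to all clients, since there
    is no IGMP group state to narrow the recipient set."""
    addr = ipaddress.ip_address(dst)
    if addr.is_multicast:          # 224.0.0.0/4 for IPv4
        return "flood-to-all"      # treated exactly like broadcast
    if addr == ipaddress.ip_address("255.255.255.255"):
        return "flood-to-all"
    return "unicast-forward"

print(classify("224.0.0.251"))     # flood-to-all (the mDNS group)
print(classify("10.8.0.2"))        # unicast-forward
```

This also explains the battery complaint above: every client receives the mDNS traffic whether or not it joined the group.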

"Real" multicast routing - at least with join handling - has been requested here and there in the past, but nobody was interested enough to actually produce code to improve things. I used to do WAN multicast, but have stopped, so I wasn't too interested either.

cron2 commented 7 months ago

@ordex any idea what is happening inside Linux tun here?

Glaeken commented 7 months ago

If you remove client-to-client from the server config, client-to-client unicast(!) goes via server side tun. Not sure right now if this applies to client-to-client multicast too.

I wonder how much it will increase the CPU load on the server side, but I will give it a try.

@ordex any idea what is happening inside Linux tun here?

Not yet. I would have to disassemble the incoming packet completely and somehow simulate it being written to the tun interface, but I have no idea what to do next, because I will only receive "Invalid argument". Maybe fiddling with the packet type or content would say more.

Glaeken commented 7 months ago

I can CONFIRM that the current workaround is to remove the client-to-client directive from the config file on the server side. The current impact on the CPU is unknown, as this is a single-instance testing phase.

[edit] This does not solve the problem of multicast packets arriving at the server itself (but being blocked), which is not a big deal, but again: it serves no function (wasted bandwidth).

cron2 commented 7 months ago

If you want the multicast packets to not reach the server, find out what is sending them, and turn it off. Do you have a wireshark capture on the client? It might help in tracing the traffic back to its origin.

The OpenVPN client is a dumb pipe, so if a packet hits the tun/tap interface, it will hit the server - and this is not trivially changed.

cron2 commented 7 months ago

@ordex any idea what is happening inside Linux tun here?

Not yet. I would have to disassemble the incoming packet completely and somehow simulate it being written to the tun interface, but I have no idea what to do next, because I will only receive "Invalid argument". Maybe fiddling with the packet type or content would say more.

This was addressed to @ordex, who wrote the linux DCO driver ("kernel side OpenVPN"), so hopefully he has much deeper insights into "what is kernel land doing there, and why is it refusing the packet?"

I have, at times, caused this error myself when sending invalid packets (checksum errors, etc.) to the tun interface - so the kernel returning EINVAL if it doesn't like the packet makes sense, but for multicast I wonder why that would be. Tun interfaces are as dumb as OpenVPN clients, so the packet should just hit the routing table and then be dropped if there is no multicast routing (or a link-local 224.0.0.x anyway). I do know that people run OSPF over two p2p OpenVPN instances across tun, so "sometimes it works", or so...
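For what it's worth, the EINVAL matches my reading of the Linux tun driver: for a tun interface opened with IFF_NO_PI (which I believe OpenVPN uses on Linux by default), tun_get_user() in drivers/net/tun.c derives the protocol from the version nibble of the first payload byte and rejects anything that is neither 4 nor 6. A rough Python paraphrase of that check (an approximation for illustration, not the kernel code):

```python
import errno

def tun_guess_proto(first_byte: int) -> str:
    """Paraphrase of the version-nibble check the Linux tun driver applies
    to packets written to an IFF_NO_PI tun interface (my reading of
    drivers/net/tun.c, tun_get_user(); not the actual kernel code)."""
    version = first_byte >> 4
    if version == 4:
        return "ETH_P_IP"
    if version == 6:
        return "ETH_P_IPV6"
    # Anything else is rejected; the writer sees this as EINVAL.
    raise OSError(errno.EINVAL, "Invalid argument")

print(tun_guess_proto(0x45))   # ETH_P_IP
# tun_guess_proto(0xFB) would raise OSError(EINVAL), matching the log lines above.
```

So any packet whose first byte is 0xFB would be refused by the kernel regardless of being multicast: the version nibble is 15.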

ordex commented 7 months ago

@Glaeken do you have a saner dump of the multicast packet being written to tun? I see you posted write(6, "\xfb\x00\x00\x43\x50\xc0\x00\x00\x01\x11\x7a\xee\x--\x--\x--\x--\xe0\x00\x00\xfb\x14\xe9\x14\xe9\x00\x2f\x14\x14\x00\x00\x00\x00"..., 67), but either I am missing something or this packet starts with 0xFB, which doesn't seem right. Anybody want to correct me?

ordex commented 7 months ago

@Glaeken I tested this scenario locally and I was able to happily deliver a multicast packet from one client to the other via client-to-client without any error.

It truly seems that something else in your setup is mangling that packet; otherwise I can't explain why an IP packet would start with 0xFB.

Can you please share your config file? This way we can see what options are at play in your environment.

Glaeken commented 7 months ago

@Glaeken do you have a saner dump of the multicast packet being written to tun? I see you posted write(6, "\xfb\x00\x00\x43\x50\xc0\x00\x00\x01\x11\x7a\xee\x--\x--\x--\x--\xe0\x00\x00\xfb\x14\xe9\x14\xe9\x00\x2f\x14\x14\x00\x00\x00\x00"..., 67), but either I am missing something or this packet starts with 0xFB, which doesn't seem right. Anybody want to correct me?

It seems you are right. When I dump the packet at the server TUN interface, it looks like this:

4500 0043 6d18 0000 0111 5eda ---- ---- e000 00fb 14e9 14e9 002f 3eaf 0000 0000 0001 0000 0000 0000 0f42 5257 4334 3845 3846 3937 4146 4144 056c 6f63 616c 0000 0100 01

But at the receiving client side, it actually starts with 0xFB instead of 0x45.

My config is pretty ordinary, based on certs, except:

multihome
mssfix xxxx
tls-crypt /xxxxx/xxx.xx 0
compress lz4
comp-noadapt
sndbuf 393216
rcvbuf 393216
push "sndbuf 393216"
push "rcvbuf 393216"
fast-io

Some values are removed for security reasons; I can send those, or the full config, in a PM.

[edit] PS: the packet in this reply is a different but similar multicast packet (mDNS here), not the exact one I reported at the beginning.
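Decoding the server-side dump above confirms it is a well-formed mDNS packet before it reaches the client (a quick sketch; the redacted source address is replaced with zeros):

```python
# Decode the server-side tun dump quoted above (redacted source IP -> 0.0.0.0).
pkt = bytes.fromhex(
    "45000043" "6d180000" "0111" "5eda"
    "00000000"             # redacted internal VPN source IP
    "e00000fb"             # destination: 224.0.0.251
    "14e9" "14e9"          # UDP 5353 -> 5353 (mDNS)
    "002f" "3eaf"          # UDP length and checksum
)

version, ihl = pkt[0] >> 4, pkt[0] & 0x0F
total_len = int.from_bytes(pkt[2:4], "big")
proto = pkt[9]
dst = ".".join(str(b) for b in pkt[16:20])
print(version, ihl, total_len, proto, dst)   # 4 5 67 17 224.0.0.251
```

So server-side this is a valid 67-byte IPv4/UDP packet (version 4, IHL 5, protocol 17); only the client-side copy has the first byte replaced by 0xFB.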

ordex commented 7 months ago

Our wild guess is that the compression is doing something wrong.

Question: is "at the server TUN interface" the entry point/sender side? Or is the server also receiving the packet via VPN from another client?

ordex commented 7 months ago

@Glaeken can you confirm if both compression settings appear on all clients configs?

ordex commented 7 months ago

I am seeing the issue here as well. I am not yet sure which packet triggers it, but the error shows up every now and then, and when it does I also have a packet starting with 0xFB. It seems to be the result of the decompression, but I am not yet sure why that happens only with this packet.

Glaeken commented 7 months ago

@Glaeken can you confirm if both compression settings appear on all clients configs?

Yes, each client has:

compress lz4
comp-noadapt

Also, "allow-compression" is NOT SET on the clients, so the sending client does not compress the packet (compression is for receiving only).

Question: is "at the server TUN interface" the entry point/sender side? or is the server also receiving the packet via VPN from another client?

The packet is from another client. Packet route: client1 -> TUN (or the openvpn process when client-to-client is enabled) -> client2.

[edit] It turns out some clients CAN HAVE allow-compression set to yes, so I have to say: unsure.

schwabe commented 7 months ago

Even without allowing compression, you still enable compression framing, which adds an extra header. Does the server config also have a compress option in it?

Glaeken commented 7 months ago

Even without allowing compression, you still enable compression framing, which adds an extra header. Does the server config also have a compress option in it?

Exactly. Every config, including the server's, has "compress lz4" and "comp-noadapt" in it.
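One possible explanation for the 0xFB, if I read src/openvpn/comp.h correctly: the v1 "compress" framing marks an uncompressed packet by swapping its first byte to the end of the packet and writing NO_COMPRESS_BYTE_SWAP (0xFB) in its place, so the payload stays aligned. If one side applies that framing and the receiving side writes the payload to tun without stripping it, the packet starts with 0xFB exactly as observed. A sketch of the scheme (my reading of the header, not OpenVPN's actual code):

```python
NO_COMPRESS_BYTE_SWAP = 0xFB  # marker byte from OpenVPN's src/openvpn/comp.h

def frame_uncompressed(pkt: bytes) -> bytes:
    """V1 'compress' stub framing for an uncompressed packet: the first
    byte is swapped to the end and the marker byte takes its place, so
    the payload start stays aligned (sketch of the scheme)."""
    return bytes([NO_COMPRESS_BYTE_SWAP]) + pkt[1:] + pkt[:1]

def unframe(framed: bytes) -> bytes:
    """Reverse the swap on the receiving side."""
    assert framed[0] == NO_COMPRESS_BYTE_SWAP
    return framed[-1:] + framed[1:-1]

ip_pkt = bytes.fromhex("45000043")       # starts with 0x45: IPv4, IHL 5
framed = frame_uncompressed(ip_pkt)
print(framed.hex())                       # fb00004345 -- now starts with 0xfb
assert unframe(framed) == ip_pkt
```

A receiver that skips unframe() would hand the kernel a packet whose version nibble is 15, which is neither IPv4 nor IPv6, matching the EINVAL in the logs above. Whether that is what happens in this mixed allow-compression setup is an assumption to be verified.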