OpenVPN / ovpn-dco

OpenVPN Data Channel Offload in the linux kernel

errors from ovpn_queue_skb and ovpn_udp_encap_recv #4

Closed rahulmb closed 2 years ago

rahulmb commented 2 years ago

dmesg has a lot (>1000) of these 2 messages. What information can I provide to help figure out the issue?

ovpn_queue_skb: cannot queue packet to TX ring
ovpn_udp_encap_recv: cannot handle incoming packet: -28

dsommers commented 2 years ago

Which Linux distribution are you on? Which kernel version?

The first log line (ovpn_queue_skb: cannot queue packet to TX ring) is seen a lot on RHEL-8 with the 4.18.0-348.12.2.el8_5 kernel when there is a lot of data traffic. This is a known issue.

The second log line is not something seen that often.

It would help if you could describe your test environment (client and server side, OpenVPN versions/git references, ovpn-dco version/git references, distribution, link speed between client and server, etc.) and what kind of traffic you pass over the tunnel. Can you trigger the same behavior with a tool like iperf, or with some other way of generating tunneled traffic that consistently reproduces it? That would help us narrow it down.
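For reference, a reproduction attempt with iperf3 might look like this (assuming iperf3 is installed on both ends and that 10.8.0.1 is the server's tunnel-side address; adjust to your setup):

```
# on the server
iperf3 -s

# on the client, targeting the server's VPN-internal address
iperf3 -c 10.8.0.1 -t 30          # TCP throughput
iperf3 -c 10.8.0.1 -u -b 0 -t 30  # unthrottled UDP, more likely to fill the rings
```

Watching dmesg while this runs should show whether the messages correlate with the traffic load.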

rahulmb commented 2 years ago

Kernel is 5.16.1-1-MANJARO-ARM-ODROID. It's an arm64 single-board computer running Manjaro. I'm using the ovpn-dco linux-client-v15 version tag.

It's a headless NAS serving HTTPS, FTP, and SSH. I was downloading the GPT-J model weights from EleutherAI and seeding Manjaro images over BitTorrent when I noticed it. I was seeing speeds around 30 MB/s.

Things got worse. I ran "openvpn3-admin netcfg-service --config-set systemd-resolved 1" so it would play nicely with systemd-resolved and stop overwriting resolv.conf. Unfortunately, the service is now restarting continuously.

dsommers commented 2 years ago

Enabling the systemd-resolved integration requires that systemd-resolved is already running and is properly configured on the system. That said, this is a different issue than the initial issue.
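As a sketch, getting the prerequisites in place might look like this (assuming a systemd-based distribution; commands are illustrative, not a verified recipe):

```
# make sure systemd-resolved is running now and enabled at boot
systemctl enable --now systemd-resolved

# then enable the openvpn3 netcfg integration
openvpn3-admin netcfg-service --config-set systemd-resolved 1
```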

Regarding this issue ... it looks like ovpn-dco hits congestion when it tries to hand processed packets back to the kernel network stack. We'll need to wait until @ordex is available to dig into this.

huangya90 commented 2 years ago

@rahulmb please adjust link-mtu to a lower value. This issue was discussed here [1].

[1] https://sourceforge.net/p/openvpn/mailman/openvpn-devel/thread/CAAUX2SVQRn4zuDC947HE9M38nDNK1oNqyUv0GKeAw2dasHo62Q%40mail.gmail.com/#msg37252118

ordex commented 2 years ago

Hi @rahulmb , thanks a lot for testing out ovpn-dco!

Those messages are not real "errors", but rather messages informing you about what is going on. ovpn_queue_skb: cannot queue packet to TX ring is about being unable to process all the packets being sent across the tunnel. The TX queue is full and new packets are being dropped. Basically traffic is being pumped faster than what ovpn-dco can consume.

ovpn_udp_encap_recv: cannot handle incoming packet: -28 is the counterpart on RX: the queue is full and new received packets cannot be staged for decryption. So traffic is being received too fast and ovpn-dco cannot keep up.
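For context, the -28 here looks like a negated Linux errno: on Linux, 28 is ENOSPC ("No space left on device"), which fits a full ring. A quick way to decode such codes (a hypothetical helper, not part of ovpn-dco):

```python
import errno
import os

def decode_kernel_err(code: int) -> str:
    """Decode a negative kernel-style error code into its errno name and message."""
    n = abs(code)
    name = errno.errorcode.get(n, "UNKNOWN")
    return f"{name}: {os.strerror(n)}"

print(decode_kernel_err(-28))  # ENOSPC: No space left on device
```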

What @huangya90 is suggesting might be an interesting idea. By reducing the MTU you avoid IP fragmentation, thus reduce the number of packets flying around. You could go with tun-mtu 1420.
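A minimal client-side config fragment along those lines (a sketch; the surrounding options depend on your existing setup):

```
# in the OpenVPN client config: cap the tunnel MTU as suggested above
tun-mtu 1420
```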

Please give it a try and let us know.

In any case, these messages are simply there to let you know that there might be some performance drop due to excessive packets to process.

ordex commented 2 years ago

Closing due to inactivity

ogolovanov commented 1 year ago

Hi.

I have the same "problem" (not really a problem, but still). Switching from MTU 1500 to MTU 1400 did not help. The transfer rate over the VPN with DCO (client only) is roughly 3-4 GB/s (TX + RX).

Still see messages like:

ovpn_queue_skb: cannot queue packet to TX ring
ovpn_udp_encap_recv: 1255 callbacks suppressed
ovpn_udp_encap_recv: cannot handle incoming packet: -28

This is just FYI.