OpenVPN / ovpn-dco

OpenVPN Data Channel Offload in the linux kernel
101 stars 23 forks source link

ovpn-dco: kernel errors about use after free, memleak on Linux 6.1.12 #23

Closed sebidotorg closed 1 year ago

sebidotorg commented 1 year ago

Describe the bug After many dozens of messages about “deleting peer”, mainly due to ”reason 1” and “reason 4”, some due to “reason 2” (said peer being a mobile phone during a train journey), with the last 18 of these messages being 12 times “reason 4”, once “reason 2”, and again five times “reason 4”, the kernel suddenly reported errors about “use after free” and “leaking memory”. dmesg output follows:

[540333.750165] ------------[ cut here ]------------
[540333.750184] refcount_t: underflow; use-after-free.
[540333.750337] WARNING: CPU: 2 PID: 246859 at lib/refcount.c:28 0xffffffff93084052
[540333.750346] Modules linked in: ovpn_dco(O)
[540333.750357] CPU: 2 PID: 246859 Comm: kworker/2:1 Tainted: G           O    T  6.1.12 OpenVPN/openvpn#1 
[540333.750365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1~nc9+4 04/01/2014
[540333.750367] Workqueue: ovpn-event-wq-vpn1 0xffffffffc0604dc0
[540333.750381] RIP: 0010:0xffffffff93084052
[540333.750383] Code: 0b 31 f6 31 ff c3 cc cc cc cc 80 3d ee 00 c2 00 00 0f 85 71 ff ff ff 48 c7 c7 50 68 b5 93 c6 05 da 00 c2 00 01 e8 72 a3 43 00 <0f> 0b 31 f6 31 ff c3 cc cc cc cc 80 3d c1 00 c2 00 00 0f 85 46 ff
[540333.750386] RSP: 0018:ffffa8c9c8abfdb0 EFLAGS: 00010246
[540333.750402] RAX: 0000000000000000 RBX: ffff9007eeb70800 RCX: 0000000000000000
[540333.750421] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[540333.750423] RBP: ffff900753fc07c0 R08: 0000000000000000 R09: 0000000000000000
[540333.750425] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9007eeb70984
[540333.750427] R13: ffff9007fb38ae00 R14: ffff900973d21c80 R15: ffff9007eeb70da8
[540333.750438] FS:  0000000000000000(0000) GS:ffff900973d00000(0000) knlGS:0000000000000000
[540333.750443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[540333.750445] CR2: 00007f4bcaff4dc8 CR3: 000000010bd96003 CR4: 0000000000370ea0
[540333.750447] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[540333.750448] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[540333.750450] Call Trace:
[540333.750488]  <TASK>
[540333.750492]  0xffffffffc05fefb7
[540333.750528]  0xffffffffc0604d76
[540333.750530]  0xffffffffc0604ddd
[540333.750532]  0xffffffff92ccbd77
[540333.750575]  0xffffffff92ccbf0f
[540333.750577]  ? 0xffffffff92ccbec0
[540333.750588]  0xffffffff92cd2d6b
[540333.750610]  ? 0xffffffff92cd2cb0
[540333.750611]  0xffffffff92c01d02
[540333.750647]  </TASK>
[540333.750648] ---[ end trace 0000000000000000 ]---
[540333.750665] ------------[ cut here ]------------
[540333.750666] refcount_t: saturated; leaking memory.
[540333.750676] WARNING: CPU: 2 PID: 246859 at lib/refcount.c:19 0xffffffff93084000
[540333.750679] Modules linked in: ovpn_dco(O)
[540333.750682] CPU: 2 PID: 246859 Comm: kworker/2:1 Tainted: G        W  O    T  6.1.12 OpenVPN/openvpn#1 
[540333.750685] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1~nc9+4 04/01/2014
[540333.750687] Workqueue: ovpn-event-wq-vpn1 0xffffffffc0604dc0
[540333.750689] RIP: 0010:0xffffffff93084000
[540333.750690] Code: a3 43 00 0f 0b 31 f6 31 ff c3 cc cc cc cc 80 3d 3f 01 c2 00 00 75 c3 48 c7 c7 f8 67 b5 93 c6 05 2f 01 c2 00 01 e8 c4 a3 43 00 <0f> 0b 31 f6 31 ff c3 cc cc cc cc 80 3d 16 01 c2 00 00 75 9c 48 c7
[540333.750692] RSP: 0018:ffffa8c9c8abfdd8 EFLAGS: 00010246
[540333.750694] RAX: 0000000000000000 RBX: ffff9007eeb70800 RCX: 0000000000000000
[540333.750696] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[540333.750697] RBP: 000000000000006c R08: 0000000000000000 R09: 0000000000000000
[540333.750699] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc8c9bfd26600
[540333.750700] R13: 0000000000000000 R14: ffff900973d21c80 R15: ffff9007eeb70da8
[540333.750702] FS:  0000000000000000(0000) GS:ffff900973d00000(0000) knlGS:0000000000000000
[540333.750715] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[540333.750717] CR2: 00007f4bcaff4dc8 CR3: 000000010bd96003 CR4: 0000000000370ea0
[540333.750718] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[540333.750720] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[540333.750721] Call Trace:
[540333.750722]  <TASK>
[540333.750723]  0xffffffffc0604d9b
[540333.750725]  0xffffffffc0604ddd
[540333.750727]  0xffffffff92ccbd77
[540333.750728]  0xffffffff92ccbf0f
[540333.750729]  ? 0xffffffff92ccbec0
[540333.750731]  0xffffffff92cd2d6b
[540333.750732]  ? 0xffffffff92cd2cb0
[540333.750733]  0xffffffff92c01d02
[540333.750735]  </TASK>
[540333.750736] ---[ end trace 0000000000000000 ]---

Version information (please complete the following information):

I hope this helps you find the problem. I can give you access to my kernel config and headers on request.

ordex commented 1 year ago

[I moved the issue to the proper project]

Thanks for the report @sebidotorg ! Too bad that there is no proper stacktrace, however, your description along with the warning message gives us already a rough direction.

Will update the issue as I discover something more concrete!

sebidotorg commented 1 year ago

Too bad that there is no proper stacktrace, however, your description along with the warning message gives us already a rough direction.

If you do not manage to reproduce it, just send me a list of the debugging options I should compile into my kernel, so I can send a better report if and when it happens again! I’ll gladly do that (even things that make the kernel noticeably bigger and slower), since the DCO module helps a lot with my VPN’s bandwidth. Collecting proper debug data is the least I can do.

ordex commented 1 year ago

Hi, sorry for the delay but I Was busy with other major features which had a strict deadline. now we can move on! Options depend on the exact kernel version. Can you have a look at what is suggested here: https://kernelnewbies.org/FAQ/StackTrace ?

However, is this something that was easy to reproduce?

(and please use the latest ovpn-dco master branch and openvpn 2.6.2)

sebidotorg commented 1 year ago

[Cc to @., as the problem is present in the latest Debian packages.]Hi, no worries, I already assumed you were quite busy.I always run the latest kernel available in Debian testing or unstable. I’ll put my current kernel config on https://cloud.sebi.org/s/NdbwfzSeSPo4Twy, in case you’re interested. I got the same errors with kernels 6.1.11, 6.1.12 and 6.1.15, whenever I tried using the VPN from the WiFi in a moving train, after a number of connection losses and reconnects. I did not yet get a chance to see if it also happens with kernel 6.1.20.My client is version 3.3.3 on iPadOS/iOS, as currently available in Apple’s app store. I use the latest Debian package of openvpn on the server, which unfortunately is only at version 2.6.0 from 2023-01-25 in testing and unstable. For ovpn-dco, I use the Debian package openvpn-dco-dkms, which also is from 2023-01-25 (so it is probably outdated, too).Maybe it would be best to first get the current versions of openvpn and ovpn-dco into Debian unstable, in case that already fixes the problem? If it still persists, I would only have to change my kernel config to get proper stacktraces next time. If I have to install all the source dependencies and compile everything from current git sources, it will probably be a while before I can try to trace the errors. I am not a developer, after all, just a server admin who uses binary packages for everything except the kernel (and with no more qualifications than having used Debian for about the past 25 years).Berni, is there a chance you could update the packages?If not, I’ll try to migrate the server from the Debian packages to the latest upstream versions, to get you a proper stacktrace during the next train journey. That could take a few weeks, though.Am 24.03.2023 um 22:28 schrieb Antonio Quartulli @.>:Hi, sorry for the delay but I Was busy with other major features which had a strict deadline. now we can move on! Options depend on the exact kernel version. Can you have a look at what is suggested here: https://kernelnewbies.org/FAQ/StackTrace ? However, is this something that was easy to reproduce? (and please use the latest ovpn-dco master branch and openvpn 2.6.2)

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

bernhardschmidt commented 1 year ago

Hi Sebastian,

openvpn 2.6.2 and an updated ovpn-dco have been uploaded to Debian experimental yesterday.

https://packages.debian.org/experimental/openvpn https://packages.debian.org/experimental/openvpn-dco-dkms

Can you try those? I have to apply for a release team exception because we are already in freeze, fixing confirmed bugs would certainly help.

sebidotorg commented 1 year ago

Thank you, Bernhard. I immediately updated to these packages.I got the kernel warnings about use after free and memleak during nearly every ICE train journey in the last two months, right after updating to openvpn/2.6.0-1, openvpn-dco-dkms/0.0+git-20230125-1 and kernel 6.1. (I had to keep using kernel 6.0.12 until you updated the packages to these versions, since the openvpn from the previous package just completely froze on kernel 6.1, as soon as it received a second UDP connection.)In most cases, the warnings only appeared after dozens of reconnection attempts due to Deutsche Bahn’s spotty WiFi service, though. I’ll use the VPN as much as possible at the next opportunity and keep my eyes peeled.Am 27.03.2023 um 09:20 schrieb Bernhard Schmidt @.***>:Hi Sebastian, openvpn 2.6.2 and an updated ovpn-dco have been uploaded to Debian experimental yesterday. https://packages.debian.org/experimental/openvpn https://packages.debian.org/experimental/openvpn-dco-dkms Can you try those? I have to apply for a release team exception because we are already in freeze, fixing confirmed bugs would certainly help.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

ordex commented 1 year ago

Thanks @sebidotorg ! Looking forward to your update!

sebidotorg commented 1 year ago

On 2023-03-28 06:34-0700, Antonio Quartulli wrote:

Thanks @sebidotorg ! Looking forward to your update!

I have now had the error again with kernel 6.1.20 and the versions from Debian testing. It did not yet occur when using the versions from Debian experimental, but it is of course difficult to prove the absence of a phenomenon.

I compiled the kernel with CONFIG_STACKTRACE and CONFIG_UNWINDER_ORC, but it doesn't look that much different from the original error message. Did I chose the wrong unwinder? Here is the latest error, which occured thirteen to fifteen minutes after the last five "reason 4" disconnects:

[1395518.502280] ------------[ cut here ]------------ [1395518.502307] refcount_t: underflow; use-after-free. [1395518.502378] WARNING: CPU: 0 PID: 622045 at lib/refcount.c:28 0xffffffff9ae88072 [1395518.502391] Modules linked in: ovpn_dco(O) ip6_udp_tunnel udp_tunnel [last unloaded: ovpn_dco_v2(O)] [1395518.502420] CPU: 0 PID: 622045 Comm: kworker/0:0 Tainted: G O T 6.1.20 #1 [1395518.502430] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1~nc9+4 04/01/2014 [1395518.502440] Workqueue: ovpn-crypto-wq-vpn1 0xffffffffc0598020 [1395518.502469] RIP: 0010:0xffffffff9ae88072 [1395518.502472] Code: 0b 31 f6 31 ff c3 cc cc cc cc 80 3d 0e c1 c1 00 00 0f 85 71 ff ff ff 48 c7 c7 b0 1f 95 9b c6 05 fa c0 c1 00 01 e8 1e 8b c2 ff <0f> 0b 31 f6 31 ff c3 cc cc cc cc 80 3d e1 c0 c1 00 00 0f 85 46 ff [1395518.502475] RSP: 0018:ffffa25d88a87e90 EFLAGS: 00010246 [1395518.502481] RAX: 0000000000000000 RBX: ffff8e36e1a67080 RCX: 0000000000000000 [1395518.502483] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [1395518.502485] RBP: ffff8e362ab9a900 R08: 0000000000000000 R09: 0000000000000000 [1395518.502486] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc25d7fc11e00 [1395518.502488] R13: 0000000000000020 R14: ffff8e3833c21c00 R15: ffff8e36e1a67088 [1395518.502490] FS: 0000000000000000(0000) GS:ffff8e3833c00000(0000) knlGS:0000000000000000 [1395518.502495] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1395518.502496] CR2: 00005565714e2790 CR3: 000000023e6fa005 CR4: 0000000000370eb0 [1395518.502498] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1395518.502499] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1395518.502501] Call Trace: [1395518.502516] [1395518.502518] 0xffffffff9aacbe77 [1395518.502543] 0xffffffff9aacc00f [1395518.502545] ? 0xffffffff9aacbfc0 [1395518.502546] 0xffffffff9aad2e8b [1395518.502553] ? 0xffffffff9aad2dd0 [1395518.502555] 0xffffffff9aa01cf2 [1395518.502566] [1395518.502567] ---[ end trace 0000000000000000 ]---

cron2 commented 1 year ago

Hi,

On Wed, Apr 05, 2023 at 08:26:54AM -0700, sebidotorg wrote:

On 2023-03-28 06:34-0700, Antonio Quartulli wrote:

Thanks @sebidotorg ! Looking forward to your update!

I have now had the error again with kernel 6.1.20 and the versions from Debian testing. It did not yet occur when using the versions from Debian experimental, but it is of course difficult to prove the absence of a phenomenon.

I think "testing" is still on OpenVPN 2.6.1 and "v1" kernel DCO module, so that is expected to be still broken. "experimental" with v2 DCO is the really interesting one.

gert -- "If was one thing all people took for granted, was conviction that if you feed honest figures into a computer, honest figures come out. Never doubted it myself till I met a computer with a sense of humor." Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany @.***

sebidotorg commented 1 year ago

Worse, it is still using 2.6.0. To get 2.6.2 and v2 of the module into the release after the freeze, an exception is needed. If this bug in 2.6.0/v1 could be confirmed, it would help, according to the package maintainer (if it has been fixed in the meantime, that is).I downgraded to the obsolete versions on purpose, to at least once catch the bug with kernel 6.1.20 and CONFIG_STACKTRACE enabled. Afterwards, I immediately switched back to openvpn 2.6.2 and the newer module. However, since the bug only triggers when I am on longer train rides with bad connectivity, it is difficult to conclude its absence from the fact that it so far did not occur with the current versions.Am 05.04.2023 um 18:37 schrieb Gert Doering @.***>: Hi,

On Wed, Apr 05, 2023 at 08:26:54AM -0700, sebidotorg wrote:

On 2023-03-28 06:34-0700, Antonio Quartulli wrote:

Thanks @sebidotorg ! Looking forward to your update!

I have now had the error again with kernel 6.1.20 and the versions from Debian testing. It did not yet occur when using the versions from Debian experimental, but it is of course difficult to prove the absence of a phenomenon.

I think "testing" is still on OpenVPN 2.6.1 and "v1" kernel DCO module, so that is expected to be still broken. "experimental" with v2 DCO is the really interesting one.

gert -- "If was one thing all people took for granted, was conviction that if you feed honest figures into a computer, honest figures come out. Never doubted it myself till I met a computer with a sense of humor." Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany @.***

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

bernhardschmidt commented 1 year ago

OpenVPN 2.6.3 and DCOv2 have landed in Testing yesterday and are en-route for the Bookworm release. It looks like this bug has been fixed with DCOv2, so I propose to close it.

sebidotorg commented 1 year ago

Thanks! I have indeed not seen the problem once since switching to the current version. I am happy it made it into Bookworm. I think this bug can be closed, too.Am 10.05.2023 um 00:04 schrieb Bernhard Schmidt @.***>: OpenVPN 2.6.3 and DCOv2 have landed in Testing yesterday and are en-route for the Bookworm release. It looks like this bug has been fixed with DCOv2, so I propose to close it.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

cron2 commented 1 year ago

Happy to hear that this bug has been fixed with the v2 rewrite :-) - so, closing, as instructed!

ordex commented 1 year ago

Thank you very much for your feedback guys!