WireGuard / wireguard-vyatta-ubnt

WireGuard for Ubiquiti Devices
https://www.wireguard.com/
GNU General Public License v3.0

Unnecessary kvmalloc patch? #10

Closed: zx2c4 closed this issue 4 years ago

zx2c4 commented 4 years ago

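disable_kmalloc.patch (https://github.com/WireGuard/wireguard-vyatta-ubnt/blob/c5f2656854ea5b50d5a24d1ee11a9656907822f6/disable_kmalloc.patch) exists because of this old issue thread: https://github.com/Lochnair/vyatta-wireguard/issues/97, which reported a crash with this stack trace: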

Mar  2 14:33:13 USG3P kernel: CPU 1 Unable to handle kernel paging request at virtual address 0000000000000000, epc == ffffffffc012ced8, ra == ffffffffc0b9314c
Mar  2 14:33:13 USG3P kernel: Oops[#1]:
Mar  2 14:33:13 USG3P kernel: CPU: 1 PID: 4103 Comm: ip Tainted: P           O 3.10.107-UBNT #1
Mar  2 14:33:13 USG3P kernel: task: 800000041c20e0e0 ti: 800000000c030000 task.ti: 800000000c030000
Mar  2 14:33:13 USG3P kernel: $ 0   : 0000000000000000 0000000000000004 ffffffffc0660000 ffffffffc050b3e8
Mar  2 14:33:13 USG3P kernel: $ 4   : 0000000000000001 00000000000012d0 ffffffffc0b9314c 800000000c033670
Mar  2 14:33:13 USG3P kernel: $ 8   : ffffffffffffff9d 800000041d296cc0 ffffffffc050b3e8 000000001a5f4728
Mar  2 14:33:13 USG3P kernel: $12   : 0000000000000008 ffffffffc025c878 ffffffffd76c0898 0000000000000000
Mar  2 14:33:13 USG3P kernel: $16   : 800000041d296000 0000000000000000 00000000000012d0 ffffffffc0531980
Mar  2 14:33:13 USG3P kernel: $20   : 800000041db09e10 800000041d296000 0000000000000000 ffffffffc080a380
Mar  2 14:33:13 USG3P kernel: $24   : 0000000005733924 0000000027f2031c
Mar  2 14:33:13 USG3P kernel: $28   : 800000000c030000 800000000c033710 800000000c033780 ffffffffc0b9314c
Mar  2 14:33:13 USG3P kernel: Hi    : 0000000000000000
Mar  2 14:33:13 USG3P kernel: Lo    : 1dcbc89e99000000
Mar  2 14:33:13 USG3P kernel: epc   : ffffffffc012ced8 kmem_cache_alloc+0x30/0x150
Mar  2 14:33:13 USG3P kernel:    Tainted: P           O
Mar  2 14:33:13 USG3P kernel: ra    : ffffffffc0b9314c wg_pubkey_hashtable_alloc+0x1c/0xd8 [wireguard]
Mar  2 14:33:13 USG3P kernel: Status: 10008ce3  KX SX UX KERNEL EXL IE
Mar  2 14:33:13 USG3P kernel: Cause : 00800008
Mar  2 14:33:13 USG3P kernel: BadVA : 0000000000000000
Mar  2 14:33:13 USG3P kernel: PrId  : 000d0601 (Cavium Octeon+)
Mar  2 14:33:13 USG3P kernel: Modules linked in: wireguard(O) ip_tunnel xt_mark xt_nat 8021q garp stp llc ipt_MASQUERADE xt_set nf_conntrack_ipv6 nf_defrag_ipv6 xt_comment xt_conntrack ip_set_bitmap_port xt_TCPMSS xt_tcpudp ip6table_mangle ip6table_filter ip6table_raw ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_mangle xt_CT iptable_raw nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_nat_proto_gre nf_nat_tftp nf_nat_ftp nf_nat nf_conntrack_tftp nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables ip_set_hash_net ip_set nfnetlink configfs unifigpio(PO) unifihal(PO) cvm_ipsec_kame(O) ipv6 imq cavium_ip_offload(PO) ubnt_nf_app(PO) tdts(PO) octeon_rng rng_core octeon_ethernet mdio_octeon ethernet_mem octeon_common of_mdio ubnt_platform(PO) libphy [last unloaded: nf_conntrack_sip]
Mar  2 14:33:13 USG3P kernel: Process ip (pid: 4103, threadinfo=800000000c030000, task=800000041c20e0e0, tls=0000000077a5b490)
Mar  2 14:33:13 USG3P kernel: Stack : 800000041d296000 800000041d296680 800000000c033780 ffffffffc0b9314c
Mar  2 14:33:13 USG3P kernel:     800000041d296000 ffffffffc0b8d01c 800000041db09e00 800000041db09e00
Mar  2 14:33:13 USG3P kernel:     ffffffffc0531980 800000000c033780 ffffffffc0531980 ffffffffc0346a5c
Mar  2 14:33:13 USG3P kernel:     800000000c033780 ffffffffc0346768 0000000000000000 0000000000000000
Mar  2 14:33:13 USG3P kernel:     0000000000000000 800000041db09e20 0000000000000000 0000000000000000
Mar  2 14:33:13 USG3P kernel:     0000000000000000 0000000000000000 0000000000000000 0000000000000000
Mar  2 14:33:13 USG3P kernel: last message repeated 2 times
Mar  2 14:33:13 USG3P kernel:     800000041db09e28 0000000000000000 0000000000000000 0000000000000000
Mar  2 14:33:13 USG3P kernel:     0000000000000000 0000000000000000 0000000000000000 0000000000000000
Mar  2 14:33:13 USG3P kernel:     ...
Mar  2 14:33:13 USG3P kernel: Call Trace:
Mar  2 14:33:13 USG3P kernel: [<ffffffffc012ced8>] kmem_cache_alloc+0x30/0x150
Mar  2 14:33:13 USG3P kernel: [<ffffffffc0b9314c>] wg_pubkey_hashtable_alloc+0x1c/0xd8 [wireguard]
Mar  2 14:33:13 USG3P kernel: [<ffffffffc0b8d01c>] wg_newlink+0xac/0x3c8 [wireguard]
Mar  2 14:33:13 USG3P kernel: [<ffffffffc0346a5c>] rtnl_newlink+0x434/0x538
Mar  2 14:33:13 USG3P kernel:
Mar  2 14:33:13 USG3P kernel:
Mar  2 14:33:13 USG3P kernel: Code: 0080882d  ffb00000  9f840020 <de220000> 000420f8  0064202d  dc840000  0044382d  dcec0008
Mar  2 14:33:13 USG3P kernel: ---[ end trace 0588e2b9fdef1fd0 ]---
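For readers skimming the dump, the two lines that matter most are `epc` (where the fault happened) and `ra` (who called into it). A minimal Python sketch of pulling those out of a log like the one above follows; the helper name `parse_oops_symbols` and the regular expression are illustrative assumptions, not part of any WireGuard or kernel tooling.

```python
import re

# Matches register-dump lines of the form
#   "epc   : <16 hex digits> <symbol>+0x<offset>/0x<length> [module]"
# as they appear in the oops above. The pattern is an assumption tailored
# to this one log format.
LINE_RE = re.compile(
    r"\b(?P<reg>epc|ra)\s*:\s*(?P<addr>[0-9a-f]{16})\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<length>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

# Two lines copied from the stack trace in this issue.
SAMPLE = """\
Mar  2 14:33:13 USG3P kernel: epc   : ffffffffc012ced8 kmem_cache_alloc+0x30/0x150
Mar  2 14:33:13 USG3P kernel: ra    : ffffffffc0b9314c wg_pubkey_hashtable_alloc+0x1c/0xd8 [wireguard]
"""


def parse_oops_symbols(log_text):
    """Return {register: {symbol, offset, module}} for the epc and ra lines."""
    found = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            found[match["reg"]] = {
                "symbol": match["symbol"],
                "offset": int(match["offset"], 16),
                "module": match["module"],  # None for core-kernel symbols
            }
    return found


for reg, info in parse_oops_symbols(SAMPLE).items():
    print(reg, info)
```

Read this way, the fault is a NULL dereference inside `kmem_cache_alloc`, reached from `wg_pubkey_hashtable_alloc` in the wireguard module.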

I never did get around to dusting off the hardware to fix that or to see if there was a compiler bug or similar. But I wonder if it's no longer necessary.

It might be worthwhile to test out a build without the patch to see if things are now fixed.

cc @paulg1981 @Dr-Escher @phillipmcmahon @NimlothPL @Lochnair @evenfowler @aswild @dlpwx @coreyhines @dc361 @dampfklon @acejacek @jmturner @benklop who all participated on the old thread.

@FossoresLP - want to try this out?

phillipmcmahon commented 4 years ago

I have a sacrificial box I am willing to test on if someone can throw me the updated binary for the EdgeRouter X or the 6P.

zx2c4 commented 4 years ago

Here are some .debs to try out. You might want to uninstall the current package before installing the new package, since they share the same version number:

https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627806/artifacts/file/e50-v1-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627807/artifacts/file/e50-v2-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627808/artifacts/file/e100-v1-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627809/artifacts/file/e100-v2-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627810/artifacts/file/e200-v1-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627811/artifacts/file/e200-v2-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627812/artifacts/file/e300-v1-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627813/artifacts/file/e300-v2-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627814/artifacts/file/e1000-v1-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627815/artifacts/file/e1000-v2-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627817/artifacts/file/ugw3-v1-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627818/artifacts/file/ugw4-v1-v1.0.20200429-v1.0.20200319.deb
https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/-/jobs/540627819/artifacts/file/ugwxg-v1-v1.0.20200429-v1.0.20200319.deb

phillipmcmahon commented 4 years ago

Great, that is good information to know.

Will test this afternoon/evening on both boxes and get back to this thread.

On Wed, 6 May 2020 at 13:12, Jason A. Donenfeld notifications@github.com wrote:

To grab a .deb to try out, follow these instructions:

At some point https://gitlab.com/FossoresLP/wireguard-vyatta-ubnt/pipelines/143280230 will be complete. At that point, click on one of the packages:

Then press "Download" under "Job Artifacts":

phillipmcmahon commented 4 years ago

Apologies, I forgot to add: I will test on both the v1 and v2 firmware builds.

FossoresLP commented 4 years ago

I'm going to try this on my ER-8-Pro though I don't have time to test v1 firmware right now.

zx2c4 commented 4 years ago

Disassembling a few of these at random, things look better than last time:

Hopefully kmalloc_order won't jump to kmem_cache_alloc with the null argument, a result of reading the null out of kmalloc_caches+0x78, like it did before.
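The offset `kmalloc_caches+0x78` can be turned into a cache slot with a little arithmetic. The sketch below assumes 8-byte pointers (the register dump shows 64-bit values) and a power-of-two size-class layout in which slot n serves allocations of up to 2^n bytes; both are assumptions about the vendor kernel, and the function names are hypothetical.

```python
POINTER_SIZE = 8  # bytes; assumption based on the 64-bit registers in the oops


def cache_index_from_offset(byte_offset, pointer_size=POINTER_SIZE):
    """Convert a byte offset into an array of pointers to a slot index."""
    if byte_offset % pointer_size:
        raise ValueError("offset is not aligned to a pointer-sized slot")
    return byte_offset // pointer_size


def size_class_upper_bound(index):
    """Largest request served by slot `index`, assuming power-of-two classes."""
    return 1 << index


index = cache_index_from_offset(0x78)
print(index, size_class_upper_bound(index))  # 15 32768
```

Under those assumptions the NULL came from slot 15, the class for requests between 16 KiB and 32 KiB, which would be consistent with a large, fixed-size hashtable allocation.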

Last year's issue kernel on a v2 e300:

image

Current issue kernel on a v2 e300:

image

This is looking promising.

Interestingly, I don't see anything different in the kernel source related to this between then and now. However the assembly is clearly different. I wonder if this is a bug with an older compiler that @Lochnair was using on his build infra?

Waiting to hear back from you all (@phillipmcmahon @FossoresLP and others who are interested).

phillipmcmahon commented 4 years ago

Just finished up at work. Will crack out the 6p and get testing done.

Lochnair commented 4 years ago

@zx2c4 It shouldn't be. AFAICS the new CI uses the same script as I used to build the toolchain, so there shouldn't be any differences there.

zx2c4 commented 4 years ago

Huh, well that's very very very weird then. It looks like in slab.h, if (size > KMALLOC_MAX_CACHE_SIZE) was incorrectly evaluating to false at compile time on the old builds. On the new builds it evaluates correctly to true at compile time.
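To make the failure mode concrete, here is a toy Python model of that size check. It is not the kernel's code: the 8192-byte limit, the set of populated cache slots, and the 20000-byte request are all hypothetical values chosen for illustration, and the small non-power-of-two size classes are ignored.

```python
# Hypothetical values for illustration only.
MAX_CACHE_SIZE = 8192
POPULATED = {n: f"kmalloc-{1 << n}" for n in range(3, 14)}  # slots 3..13


def size_class_index(size):
    """Smallest n such that 2**n >= size."""
    return max(size - 1, 0).bit_length()


def kmalloc_dispatch(size, max_cache_size, populated, size_check_broken=False):
    """Describe which path a constant-size allocation takes in this model."""
    too_big = size > max_cache_size
    if size_check_broken:
        too_big = False  # models the comparison wrongly folding to false
    if too_big:
        return "kmalloc_large"
    index = size_class_index(size)
    cache = populated.get(index)  # None models a NULL slot in kmalloc_caches
    if cache is None:
        return f"kmem_cache_alloc(NULL) via kmalloc_caches[{index}]"
    return f"kmem_cache_alloc({cache})"


print(kmalloc_dispatch(20000, MAX_CACHE_SIZE, POPULATED))
print(kmalloc_dispatch(20000, MAX_CACHE_SIZE, POPULATED, size_check_broken=True))
```

With the check working, the oversized request goes to the large-allocation path. With the check forced to false, the same request indexes a slot that was never populated and hands a NULL cache to `kmem_cache_alloc`, which matches the shape of the crash reported in the old thread.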

zx2c4 commented 4 years ago

Actually, your old CI shows Copied 1 artifact from "Ubiquiti » E300 kernels » v2.0.0/master" build number 1. However, I'm only able to find v2.0.1 on Ubnt's main site and in your kernel archives. I wonder if this was some beta source?

Edit: found it by messing with the URLs. Downloading and investigating now.

zx2c4 commented 4 years ago

Between v2.0.0 and v2.0.1 there don't seem to be any relevant changes either. So, my best guess at the moment remains: compiler bug.

FossoresLP commented 4 years ago

@Lochnair You are correct, the CI is currently still using mostly the same build scripts you created.

@zx2c4 On the v2 firmware on my ER-8-Pro I am not experiencing any kernel panics so far. I might have to work out some issues with my testing environment (Windows is just not great for this stuff) before I can run a proper stress test.

acejacek commented 4 years ago

I'm running de-patched e100-v1 on my LITE-3 and all seems to work smoothly. No kernel panic so far.

zx2c4 commented 4 years ago

> I might have to work out some issues with my testing environment (Windows is just not great for this stuff) before I can run a proper stress test.

No "stress test" needed. If you can run `ip link add wg0 type wireguard` without crashing, it means we're all set.
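That check can be scripted. The following is a minimal sketch, assuming it runs as root on the router with Python and the `ip` tool available; the function names are hypothetical, and the cleanup step (`ip link del dev wg0`) is an addition not mentioned in the thread.

```python
import subprocess


def smoke_test_commands(ifname="wg0"):
    """Commands for the check: create a WireGuard interface, then remove it."""
    return [
        ["ip", "link", "add", ifname, "type", "wireguard"],  # the step that used to crash
        ["ip", "link", "del", "dev", ifname],                # cleanup (assumption, not from the thread)
    ]


def run_smoke_test(ifname="wg0"):
    """Run each command and raise CalledProcessError if one fails."""
    for cmd in smoke_test_commands(ifname):
        subprocess.run(cmd, check=True)


# Call run_smoke_test() manually on the device; it is not invoked here
# because it needs root and the wireguard module installed.
```

If the module still had the bug, the first command would be expected to trigger the oops rather than return normally, so checking the kernel log afterwards is also worthwhile.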

zx2c4 commented 4 years ago

> I'm running de-patched e100-v1 on my LITE-3 and all seems to work smoothly. No kernel panic so far.

Excellent, thanks for reporting.

FossoresLP commented 4 years ago

@Lochnair Any chance you still have an old Octeon toolkit lying around? Maybe Marvell fixed something there? Edit: Nvm, the last time anything in there changed was in mid 2018.

phillipmcmahon commented 4 years ago

Tested on an Edgerouter 6P (e300) device on both v1 and v2 firmware.

Working fine, no kernel panic and multiple wg interfaces up.

I have yet to test on the EdgeRouter X; I forgot that it has no console port, so recovering from a kernel panic there is a different story. Will tackle that tomorrow.

FossoresLP commented 4 years ago

@phillipmcmahon Thanks for testing. ER-X will still be interesting as it uses a different build infrastructure. Hoping it works well, too.

aswild commented 4 years ago

@FossoresLP If you need it, I have a copy of the pre-built Octeon toolchain hosted at https://vyatta-wireguard-build.s3.amazonaws.com/OCTEON-SDK-5.1-tools.tar.xz

After the Cavium -> Marvell switch, they stopped hosting the prebuilt toolchains and now only release the source tarballs.

This toolchain was able to reproduce the kvmalloc bug back when we were debugging the issue in the old repo. I haven't re-tested new wireguard versions without the kmalloc patch yet.

phillipmcmahon commented 4 years ago

Tested on the EdgeRouter X (e50) on both v1 and v2 firmware.

No issues, no kernel panic on multiple reboots and an active wg interface.

zx2c4 commented 4 years ago

That's excellent news. Thank you for reporting. I think we can close this issue.

I just released a new snapshot -- https://lists.zx2c4.com/pipermail/wireguard/2020-May/005408.html -- so when @FossoresLP bumps the binaries next, they won't have the patch.

dc361 commented 4 years ago

Great -- late to the game, but it works well on one of my ER-Xs as well. Thanks, all.