flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

Cilium with loadBalancer.acceleration native fails on 3139.2.0 #721

Closed: ysksuzuki closed this issue 1 year ago

ysksuzuki commented 2 years ago

Description

All worker nodes got stuck in a reboot loop caused by kernel panics when we ran Cilium v1.11.3 with loadBalancer.acceleration set to native on Flatcar stable 3139.2.0.

Impact

We can't use Cilium's LoadBalancer & NodePort XDP Acceleration feature. Enabling this feature would benefit us greatly, since the XDP-based load balancer performs very well.

Environment and steps to reproduce

  1. Set-up:
  2. Task:
  3. Action(s):
  4. Error:
[  419.309274] SELinux:  Context system_u:object_r:container_file_t:s0 is not valid (left unmapped).
[  516.835776] IPv6: ADDRCONF(NETDEV_CHANGE): cilium_net: link becomes ready
[  516.846413] IPv6: ADDRCONF(NETDEV_CHANGE): cilium_host: link becomes ready
[  520.006295] NET: Registered PF_ALG protocol family
[  520.293618] general protection fault, probably for non-canonical address 0xffff12784e218580: 0000 [#1] SMP PTI
[  520.307566] CPU: 10 PID: 0 Comm: swapper/10 Not tainted 5.15.32-flatcar #1
[  520.317475] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  520.332753] RIP: 0010:uncharge_page+0x11a/0x170
[  520.338345] Code: 8d e5 ff f6 45 54 01 75 bc 48 8b 45 10 a8 03 75 27 65 48 ff 08 5b 5d 41 5c 41 5d 41 5e e9 7e 8d e5 ff 48 8b 45 10 a8 03 75 49 <65> 48 ff 00 e8 6d 8d e5 ff e9 51 ff ff ff 48 8b 45 18 f0 488
[  520.363046] RSP: 0018:ffff9fcc00294c98 EFLAGS: 00010246
[  520.369842] RAX: ffff893a22598580 RBX: ffff9fcc00294cc8 RCX: 0000000000000000
[  520.378313] RDX: 0000000000000000 RSI: ffff9fcc00294cc8 RDI: ffffcd5b04db68c0
[  520.386854] RBP: ffff893a3505e000 R08: 000000000000000c R09: 0000000000001000
[  520.394920] R10: ffff9fcc00294df0 R11: ffff893ac85fc940 R12: ffffcd5b04db68c0
[  520.403387] R13: ffff893a36da3108 R14: 0000000000000000 R15: 0000000000001000
[  520.412159] FS:  0000000000000000(0000) GS:ffff893e2bc80000(0000) knlGS:0000000000000000
[  520.423741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  520.431023] CR2: 0000561b82ea2f00 CR3: 0000000136dea003 CR4: 0000000000370ee0
[  520.439861] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  520.448639] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  520.457505] Call Trace:
[  520.460841]  <IRQ>
[  520.463425]  __mem_cgroup_uncharge+0x66/0x90
[  520.468909]  __put_page+0x2d/0x40
[  520.473681]  0xffffffffc05b1807
[  520.478029]  0xffffffffc05b40a3
[  520.481641]  ? detach_buf_split+0x6c/0x130
[  520.486641]  0xffffffffc05b47a4
[  520.490499]  __napi_poll+0x2a/0x130
[  520.494837]  net_rx_action+0x24b/0x2a0
[  520.499612]  __do_softirq+0xe1/0x296
[  520.504069]  irq_exit_rcu+0x84/0xa0
[  520.508218]  common_interrupt+0x80/0xa0
[  520.513053]  </IRQ>
[  520.515977]  <TASK>
[  520.518777]  asm_common_interrupt+0x1e/0x40
[  520.524051] RIP: 0010:native_safe_halt+0xb/0x10
[  520.529823] Code: 80 48 02 20 48 8b 00 a8 08 74 83 eb c2 cc cc eb 07 0f 00 2d d9 6e 5d 00 f4 c3 0f 1f 44 00 00 eb 07 0f 00 2d c9 6e 5d 00 fb f4 <c3> cc cc cc cc 0f 1f 44 00 00 53 65 8b 15 2b 06 7e 78 66 90c
[  520.552596] RSP: 0018:ffff9fcc000cbec8 EFLAGS: 00000202
[  520.559248] RAX: ffffffff87834eb0 RBX: 000000000000000a RCX: ffff893e2bc9be40
[  520.567815] RDX: ffff893e2bcacfe0 RSI: 7fffff8708ca87a1 RDI: 00000000000c38a8
[  520.576742] RBP: ffff893a003dbf00 R08: 00000000000c38a7 R09: 00000078f7633f1e
[  520.585745] R10: 0000000000000001 R11: 0000000000000001 R12: ffff893a003dbf00
[  520.593716] R13: ffff893a003dbf00 R14: 0000000000000000 R15: 0000000000000000
[  520.601770]  ? __cpuidle_text_start+0x8/0x8
[  520.606708]  default_idle+0xa/0x10
[  520.611437]  default_idle_call+0x31/0xc0
[  520.616562]  do_idle+0x1ef/0x240
[  520.620859]  cpu_startup_entry+0x19/0x20
[  520.626026]  start_secondary+0x119/0x150
[  520.630714]  secondary_startup_64_no_verify+0xc2/0xcb
[  520.636711]  </TASK>
[  520.639427] Modules linked in: algif_hash af_alg veth xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_filter ip6table_raw ip6table_mangle ip6_tables iptable_raw iptable_mangle xt_MASQUERADE xt_conntrack xc
[  520.744070] BUG: unable to handle page fault for address: ffff893a1c054000
[  520.744377] ---[ end trace 6dc778dbb5c3d61e ]---
[  520.753093] #PF: supervisor write access in kernel mode
[  520.761981] RIP: 0010:uncharge_page+0x11a/0x170
[  520.768107] #PF: error_code(0x0003) - permissions violation
[  520.773417] Code: 8d e5 ff f6 45 54 01 75 bc 48 8b 45 10 a8 03 75 27 65 48 ff 08 5b 5d 41 5c 41 5d 41 5e e9 7e 8d e5 ff 48 8b 45 10 a8 03 75 49 <65> 48 ff 00 e8 6d 8d e5 ff e9 51 ff ff ff 48 8b 45 18 f0 488
[  520.779352] PGD 1c8e01067 P4D 1c8e01067 PUD 100b40063 PMD 1397c3063 PTE 800000011c054161
[  520.799732] RSP: 0018:ffff9fcc00294c98 EFLAGS: 00010246
[  520.808476] Oops: 0003 [#2] SMP PTI
[  520.808627]
[  520.814189] CPU: 0 PID: 7141 Comm: dockerd Tainted: G      D           5.15.32-flatcar #1
[  520.818272] RAX: ffff893a22598580 RBX: ffff9fcc00294cc8 RCX: 0000000000000000
[  520.819827] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  520.829165] RDX: 0000000000000000 RSI: ffff9fcc00294cc8 RDI: ffffcd5b04db68c0
[  520.836706] RIP: 0010:clear_page_erms+0x7/0x10
[  520.850425] RBP: ffff893a3505e000 R08: 000000000000000c R09: 0000000000001000
[  520.859183] Code: 48 89 47 18 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc cc cc 0f 1f 44 00 00 48 85 ff 0f 84 f20
[  520.865186] R10: ffff9fcc00294df0 R11: ffff893ac85fc940 R12: ffffcd5b04db68c0
[  520.874086] RSP: 0018:ffff9fcc0175faa8 EFLAGS: 00010246
[  520.874093] RAX: 0000000000000000 RBX: ffffcd5b04701500 RCX: 0000000000001000
[  520.874095] RDX: ffffcd5b04701500 RSI: ffff893a3cee9f80 RDI: ffff893a1c054000
[  520.874097] RBP: 0000000000500dc0 R08: ffffcd5b04701540 R09: 0000000000000000
[  520.896457] R13: ffff893a36da3108 R14: 0000000000000000 R15: 0000000000001000
[  520.903855] R10: 0000000000000293 R11: ffff893e2ba30c70 R12: 0000000000000000
[  520.910209] FS:  0000000000000000(0000) GS:ffff893e2bc80000(0000) knlGS:0000000000000000
[  520.918749] R13: 0000000000000901 R14: ffff893e2ba29688 R15: 0000000000000000
[  520.927353] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  520.935856] FS:  00007fda9affd640(0000) GS:ffff893e2ba00000(0000) knlGS:0000000000000000
[  520.944061] CR2: 0000561b82ea2f00 CR3: 0000000136dea003 CR4: 0000000000370ee0
[  520.952508] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  520.962629] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  520.971774] CR2: ffff893a1c054000 CR3: 0000000101032004 CR4: 0000000000370ef0
[  520.978576] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  520.989157] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  520.996976] Kernel panic - not syncing: Fatal exception in interrupt
[  521.004219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  521.054549] Call Trace:
[  521.057530]  <TASK>
[  521.060298]  kernel_init_free_pages.part.0+0x46/0x60
[  521.066664]  prep_new_page+0x6c/0x80
[  521.070909]  get_page_from_freelist+0xa6f/0xc70
[  521.076228]  __alloc_pages+0x181/0x310
[  521.081133]  __get_free_pages+0xd/0x30
[  521.085438]  __pud_alloc+0x2c/0x110
[  521.089573]  __handle_mm_fault+0x7de/0x1370
[  521.094367]  handle_mm_fault+0xcf/0x2a0
[  521.099230]  __get_user_pages+0x210/0x680
[  521.104299]  __get_user_pages_remote+0xda/0x330
[  521.109385]  get_arg_page+0x5f/0x100
[  521.113456]  copy_string_kernel+0x100/0x1e0
[  521.118206]  do_execveat_common.isra.0+0x109/0x1d0
[  521.123657]  __x64_sys_execve+0x33/0x40
[  521.127999]  do_syscall_64+0x3b/0x90
[  521.132146]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  521.137640] RIP: 0033:0x55ea4a3e74d6
[  521.141598] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 440
[  521.163987] RSP: 002b:000000c0017e6868 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
[  521.173370] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 000055ea4a3e74d6
[  521.181649] RDX: 000000c0018cc000 RSI: 000000c000952228 RDI: 000000c000160200
[  521.189774] RBP: 000000c0017e6a10 R08: 0000000000000052 R09: 0000000000000000
[  521.198755] R10: 0000000000000008 R11: 0000000000000206 R12: 000055ea4a3db931
[  521.207146] R13: 0000000000000001 R14: 000000c0012704e0 R15: ffffffffffffffff
[  521.214837]  </TASK>
[  521.217448] Modules linked in: algif_hash af_alg veth xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_filter ip6table_raw ip6table_mangle ip6_tables iptable_raw iptable_mangle xt_MASQUERADE xt_conntrack xc
[  521.315858] CR2: ffff893a1c054000
[  521.320959] ---[ end trace 6dc778dbb5c3d61f ]---
[  521.327623] RIP: 0010:uncharge_page+0x11a/0x170
[  521.333146] Code: 8d e5 ff f6 45 54 01 75 bc 48 8b 45 10 a8 03 75 27 65 48 ff 08 5b 5d 41 5c 41 5d 41 5e e9 7e 8d e5 ff 48 8b 45 10 a8 03 75 49 <65> 48 ff 00 e8 6d 8d e5 ff e9 51 ff ff ff 48 8b 45 18 f0 488
[  521.356648] RSP: 0018:ffff9fcc00294c98 EFLAGS: 00010246
[  521.362798] RAX: ffff893a22598580 RBX: ffff9fcc00294cc8 RCX: 0000000000000000
[  521.371761] RDX: 0000000000000000 RSI: ffff9fcc00294cc8 RDI: ffffcd5b04db68c0
[  521.380494] RBP: ffff893a3505e000 R08: 000000000000000c R09: 0000000000001000
[  521.389813] R10: ffff9fcc00294df0 R11: ffff893ac85fc940 R12: ffffcd5b04db68c0
[  521.398655] R13: ffff893a36da3108 R14: 0000000000000000 R15: 0000000000001000
[  521.407056] FS:  00007fda9affd640(0000) GS:ffff893e2ba00000(0000) knlGS:0000000000000000
[  521.417495] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  521.425198] CR2: ffff893a1c054000 CR3: 0000000101032004 CR4: 0000000000370ef0
[  521.434272] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  521.444319] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  522.132705] Shutting down cpus with NMI
[  522.137871] Kernel Offset: 0x6000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  522.149691] Rebooting in 10 seconds..
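
For readers triaging a log like the one above, here is a minimal sketch that lists the kernel-mode `RIP:` symbols in the order they were printed. The function name `kernel_rip_symbols` and the regular expression are illustrative assumptions, not part of any Flatcar or Cilium tooling. Note that not every `RIP:` line is a fault site: in this log, `native_safe_halt` is only the idle context that the interrupt arrived in.

```python
import re

# Matches kernel-mode RIP lines (code segment 0010), for example:
#   [  520.332753] RIP: 0010:uncharge_page+0x11a/0x170
# User-mode lines (segment 0033) carry a raw address and are skipped.
RIP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+RIP: 0010:"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def kernel_rip_symbols(log_text):
    """Return (timestamp, symbol) string pairs for each kernel-mode RIP line."""
    return [(m.group("ts"), m.group("sym")) for m in RIP_RE.finditer(log_text)]


# Four RIP lines copied from the log in this issue.
sample = """\
[  520.332753] RIP: 0010:uncharge_page+0x11a/0x170
[  520.524051] RIP: 0010:native_safe_halt+0xb/0x10
[  520.836706] RIP: 0010:clear_page_erms+0x7/0x10
[  521.137640] RIP: 0033:0x55ea4a3e74d6
"""

for ts, sym in kernel_rip_symbols(sample):
    print(ts, sym)
```

Timestamps are kept as strings so they can be matched back against the original log lines exactly.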

Expected behavior

Cilium with loadBalancer.acceleration set to native starts up fine.

NikAleksandrov commented 2 years ago

The fix I proposed [1] has been applied to the -net tree [2] and should also get backported to stable kernels (stable was CCed on it).

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20220425103703.3067292-1-razor@blackwall.org/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=acb16b395c3f3d7502443e0c799c2b42df645642

jepio commented 2 years ago

Thanks @NikAleksandrov, awesome work!

jepio commented 1 year ago

This fix landed in Linux 5.15.38, which is included in Flatcar stable 3139.2.2.

Thank you again @NikAleksandrov for fixing this issue.
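
As a rough way to check whether a node's kernel already contains the fix, the sketch below compares a kernel release string against 5.15.38, the version named in the comment above. The helper names `parse_kernel_version` and `has_fix_5_15` are hypothetical. The comparison is only meaningful for the 5.15 stable series, because this thread does not say which versions of other series carry the backport.

```python
FIXED_IN = (5, 15, 38)  # version quoted in this thread; 5.15 series only


def parse_kernel_version(release):
    """Turn a release string such as '5.15.32-flatcar' into (5, 15, 32)."""
    numeric = release.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def has_fix_5_15(release):
    """Return True if a 5.15.x kernel is at or above the fixed version."""
    version = parse_kernel_version(release)
    if version[:2] != FIXED_IN[:2]:
        # Other series may have the fix under different version numbers.
        raise ValueError(f"not a 5.15.x kernel: {release}")
    return version >= FIXED_IN


print(has_fix_5_15("5.15.32-flatcar"))  # kernel from the panic log above
print(has_fix_5_15("5.15.38-flatcar"))
```

On a running node, the release string could be taken from `platform.release()` in the Python standard library, or from `uname -r`.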