falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0

Kernel panic caused by a bug in the “val_to_ring” function, crashing the host machine #1359

Open Spartan-65 opened 1 year ago

Spartan-65 commented 1 year ago

Describe the bug

[1047486.856617] falco: deallocating consumer ffff9aba94a2a0e0
[1047486.938918] BUG: unable to handle kernel paging request at ffffac4fe972383e
[1047486.943701] falco: no more consumers, stopping capture
[1047486.943583] IP: [<ffffffffc0b6dd70>] val_to_ring+0x80/0x460 [falco]
[1047486.950758] PGD 179982067 PUD 179983067 PMD 4f0fee067 PTE 0
[1047486.955568] Oops: 0002 [#1] SMP
[1047486.973052] Modules linked in: udp_diag binfmt_misc falco(OE) veth ipt_rpfilter vxlan ip6_udp_tunnel udp_tunnel xt_set xt_multiport ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel ip6t_MASQUERADE nf_nat_masquerade_ipv6 xt_statistic xt_nat ipt_REJECT nf_reject_ipv4 ip6table_filter ip6table_mangle ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_tables iptable_raw xt_CT dummy rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ip6table_nat ip6_tables iptable_mangle xt_comment xt_mark tcp_diag inet_diag xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat br_netfilter bridge stp llc overlay(T) openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack
[1047487.021023]  ppdev cirrus ttm iosf_mbi drm_kms_helper crc32_pclmul syscopyarea sysfillrect sysimgblt ghash_clmulni_intel fb_sys_fops drm aesni_intel joydev lrw gf128mul drm_panel_orientation_quirks glue_helper ablk_helper virtio_balloon i2c_piix4 parport_pc cryptd parport pcspkr drbd_transport_tcp(OE) drbd(OE) ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk scsi_transport_iscsi ata_piix libata crct10dif_pclmul crct10dif_common virtio_pci virtio_ring crc32c_intel serio_raw virtio floppy sunrpc dm_mirror dm_region_hash dm_log dm_mod
[1047487.052673] CPU: 4 PID: 55365 Comm: find Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1062.9.1.el7.x86_64 #1
[1047487.073362] Hardware name: RDO OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
[1047487.077771] task: ffff9ab58f82d230 ti: ffff9ab949200000 task.ti: ffff9ab949200000
[1047487.082308] RIP: 0010:[<ffffffffc0b6dd70>]  [<ffffffffc0b6dd70>] val_to_ring+0x80/0x460 [falco]
[1047487.087522] RSP: 0018:ffff9ab949203b80  EFLAGS: 00010287
[1047487.091793] RAX: 000000000000001e RBX: ffff9ab949203d98 RCX: 0000000000000000
[1047487.096184] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffac4fe9723818
[1047487.100793] RBP: ffff9ab949203bc0 R08: 0000000000000000 R09: 0000000000000098
[1047487.105080] R10: 0000000000000001 R11: 0000000000000246 R12: 000000000000fde8
[1047487.109184] R13: 0000000000000000 R14: 0000000000000001 R15: ffffac4fe972383e
[1047487.113464] FS:  0000000000000000(0000) GS:ffff9abb3fd00000(0000) knlGS:0000000000000000
[1047487.118294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1047487.122034] CR2: ffffac4fe972383e CR3: 000000054ee06000 CR4: 00000000001606e0
[1047487.127158] Call Trace:
[1047487.129721]  [<ffffffffa2104262>] ? security_inode_permission+0x22/0x30
[1047487.134438]  [<ffffffffa20565b2>] ? __inode_permission+0x52/0xd0
[1047487.138980]  [<ffffffffc0b7058d>] f_proc_startupdate+0x77d/0x1250 [falco]
[1047487.144099]  [<ffffffffa2588a26>] ? trace_do_page_fault+0x56/0x150
[1047487.149122]  [<ffffffffc0b6b576>] record_event_consumer+0x4b6/0xdf0 [falco]
[1047487.154468]  [<ffffffffa204737a>] ? __check_object_size+0x1ca/0x250
[1047487.173196]  [<ffffffffa25789b1>] ? create_elf_tables+0x542/0x56d
[1047487.177178]  [<ffffffffc0b6bf24>] record_event_all_consumers+0x74/0xb0 [falco]
[1047487.181531]  [<ffffffffc0b6c27d>] syscall_exit_probe+0xed/0x120 [falco]
[1047487.185909]  [<ffffffffa1e3c22d>] syscall_trace_leave+0xfd/0x110
[1047487.189593]  [<ffffffffa258e220>] int_check_syscall_exit_work+0x13/0x1c
[1047487.193456] Code: 46 e2 8b 53 34 48 c1 e0 06 4c 29 c8 49 89 f6 48 69 d2 30 07 00 00 48 8d 94 10 40 1a b8 c0 8b 42 50 83 f8 1b 74 25 31 d2 83 f8 2e <66> 41 89 17 0f 87 8c 02 00 00 48 8b 04 c5 10 16 b8 c0 e9 89 4d
[1047487.207611] RIP  [<ffffffffc0b6dd70>] val_to_ring+0x80/0x460 [falco]
[1047487.213099]  RSP <ffff9ab949203b80>
[1047487.216526] CR2: ffffac4fe972383e
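As a reading aid, not a root-cause analysis: on x86 the `Oops: 0002` field is the page-fault error code. The bytes in angle brackets in the `Code:` line, `66 41 89 17`, encode `mov %dx,(%r15)`, a 16-bit store. `RDX` is 0 and `R15` equals `CR2`, so the oops records `val_to_ring` writing to an address whose page was not mapped (`PTE 0`). The Python sketch below decodes the error code using the architectural bit layout (P, W/R, U/S, RSVD, I/D). The helper name `decode_pf_error` is hypothetical and is not part of Falco or libs.

```python
# x86 page-fault error code bits (P, W/R, U/S, RSVD, I/D).
# Each entry: bit -> (meaning when set, meaning when clear or None).
PF_BITS = {
    0: ("protection violation", "page not present"),  # P
    1: ("write", "read"),                               # W/R
    2: ("user mode", "kernel mode"),                    # U/S
    3: ("reserved bit set", None),                      # RSVD
    4: ("instruction fetch", None),                     # I/D
}


def decode_pf_error(code: int) -> list[str]:
    """Translate a page-fault error code into human-readable flags."""
    flags = []
    for bit, (if_set, if_clear) in PF_BITS.items():
        if code & (1 << bit):
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


print(decode_pf_error(0x0002))  # "Oops: 0002" from val_to_ring
print(decode_pf_error(0x4))     # "segfault at 14 ... error 4" in the reload log
```

Applied to the `error 4` in the userspace `segfault at 14` lines of the reload log, the same helper reports a user-mode read of a non-present page. A fault address as small as 0x14 often points to a field access through a NULL pointer, though the log alone does not show which one.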

How to reproduce it

Repeatedly reload the Falco process (send it a SIGHUP signal).

There is no reliable reproduction method, but based on the dmesg output, the anomaly occurred right as the driver attempted to restart capture for the second time.
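A minimal sketch of the reproduction step (repeatedly sending SIGHUP to Falco). It assumes a disposable test host, since the report describes the crash taking down the whole machine, and assumes you already know the host-side PID of the Falco process. The function name `reload_repeatedly` and its `count`/`interval_s` parameters are hypothetical. The thread does not describe how the maintainers ran their own repro attempts.

```python
import os
import signal
import time


def reload_repeatedly(pid, count, interval_s, send=os.kill, sleep=time.sleep):
    """Send SIGHUP to `pid` `count` times, pausing `interval_s` seconds after each.

    `send` and `sleep` are injectable so the loop can be dry-run without a
    real process. Returns the number of signals sent.
    """
    sent = 0
    for _ in range(count):
        # Per the report, SIGHUP makes Falco reload, which tears down and
        # re-creates the driver consumer ("deallocating" / "adding new consumer").
        send(pid, signal.SIGHUP)
        sent += 1
        sleep(interval_s)
    return sent
```

For example, `reload_repeatedly(pid, count=200, interval_s=5)` would send 200 SIGHUPs five seconds apart. Passing stub `send` and `sleep` functions lets you check the loop without signalling anything.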

[744044.448687] falco: initializing ring buffer for CPU 0
[744044.650350] falco: CPU buffer initialized, size=134217728
[744044.664054] falco: initializing ring buffer for CPU 1
[744045.008622] falco: CPU buffer initialized, size=134217728
[744045.021499] falco: initializing ring buffer for CPU 2
[744045.208599] falco: CPU buffer initialized, size=134217728
[744045.225987] falco: initializing ring buffer for CPU 3
[744045.408549] falco: CPU buffer initialized, size=134217728
[744045.421837] falco: initializing ring buffer for CPU 4
[744045.646908] falco: CPU buffer initialized, size=134217728
[744045.659648] falco: initializing ring buffer for CPU 5
[744045.798287] falco: CPU buffer initialized, size=134217728
[744045.810377] falco: initializing ring buffer for CPU 6
[744046.151417] falco: CPU buffer initialized, size=134217728
[744046.162903] falco: initializing ring buffer for CPU 7
[744046.304424] falco: CPU buffer initialized, size=134217728
[744046.316198] falco: starting capture
[744398.712038] falco: deallocating consumer ffff9ab9fe618000
[744398.788622] falco: no more consumers, stopping capture
[744399.940525] falco: adding new consumer ffff9ab9fe618000
[744399.999193] falco: initializing ring buffer for CPU 0
[744400.199128] falco: CPU buffer initialized, size=134217728
[744400.211459] falco: initializing ring buffer for CPU 1
[744400.599133] falco: CPU buffer initialized, size=134217728
[744400.619842] falco: initializing ring buffer for CPU 2
[744400.899144] falco: CPU buffer initialized, size=134217728
[744400.913185] falco: initializing ring buffer for CPU 3
[744401.299113] falco: CPU buffer initialized, size=134217728
[744401.315185] falco: initializing ring buffer for CPU 4
[744401.599137] falco: CPU buffer initialized, size=134217728
[744401.611852] falco: initializing ring buffer for CPU 5
[744401.899065] falco: CPU buffer initialized, size=134217728
[744401.912112] falco: initializing ring buffer for CPU 6
[744402.159074] falco: CPU buffer initialized, size=134217728
[744402.176069] falco: initializing ring buffer for CPU 7
[744402.599085] falco: CPU buffer initialized, size=134217728
[744402.614843] falco: starting capture
[744606.011475] falco: deallocating consumer ffff9ab9fe618000
[744606.128370] falco: no more consumers, stopping capture
[744607.334996] falco: adding new consumer ffff9ab9fe618000
[744607.393689] falco: initializing ring buffer for CPU 0
[744607.593637] falco: CPU buffer initialized, size=134217728
[744607.613109] falco: initializing ring buffer for CPU 1
[744607.893716] falco: CPU buffer initialized, size=134217728
[744607.907875] falco: initializing ring buffer for CPU 2
[744608.293588] falco: CPU buffer initialized, size=134217728
[744608.307129] falco: initializing ring buffer for CPU 3
[744608.593622] falco: CPU buffer initialized, size=134217728
[744608.606678] falco: initializing ring buffer for CPU 4
[744608.793564] falco: CPU buffer initialized, size=134217728
[744608.810289] falco: initializing ring buffer for CPU 5
[744609.093557] falco: CPU buffer initialized, size=134217728
[744609.112077] falco: initializing ring buffer for CPU 6
[744609.397101] falco: CPU buffer initialized, size=134217728
[744609.414630] falco: initializing ring buffer for CPU 7
[744610.093546] falco: CPU buffer initialized, size=134217728
[744610.105775] falco: starting capture
[744613.235171] falco[977062]: segfault at 14 ip 0000000000dec8b0 sp 00007ffe8256f8d8 error 4 in falco[400000+f7e000]
.
.
.
[748207.518699] falco: initializing ring buffer for CPU 7
[748207.797835] falco: CPU buffer initialized, size=134217728
[748207.813321] falco: starting capture
[748211.141318] falco[1520458]: segfault at 14 ip 0000000000dec8b0 sp 00007ffcef650028 error 4 in falco[400000+f7e000]
.
.
.
[755863.333935] falco: CPU buffer initialized, size=134217728
[755863.348515] falco: initializing ring buffer for CPU 7
[755863.490289] falco: CPU buffer initialized, size=134217728
[755863.504481] falco: starting capture
[755866.915734] falco[1838351]: segfault at 14 ip 0000000000dec8b0 sp 00007ffcfddaff18 error 4 in falco[400000+f7e000]

[803688.923205] falco: CPU buffer initialized, size=134217728
[803688.926634] falco: initializing ring buffer for CPU 7
[803689.123176] falco: CPU buffer initialized, size=134217728
[803689.139546] falco: starting capture
[803692.449034] traps: falco[3458365] general protection ip:dec8b0 sp:7ffe6d4c4958 error:0 in falco[400000+f7e000]
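The reload log shows the driver re-initializing eight per-CPU ring buffers on every capture start. The worked check below uses only the numbers printed in the log to show how much buffer memory one restart sets up. By itself it says nothing about the cause of the crash.

```python
# Values copied from the dmesg lines above.
BUF_SIZE_BYTES = 134_217_728  # "CPU buffer initialized, size=134217728"
NUM_CPUS = 8                  # buffers are initialized for CPU 0 through CPU 7

per_cpu_mib = BUF_SIZE_BYTES // (1024 ** 2)
total_bytes = BUF_SIZE_BYTES * NUM_CPUS
total_gib = total_bytes / (1024 ** 3)

print(f"{per_cpu_mib} MiB per CPU, {total_gib:g} GiB per capture start")
# prints "128 MiB per CPU, 1 GiB per capture start"
```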

Expected behaviour

Screenshots

Environment

Kernel module

CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"

- Kernel:

Linux ecs-sit-0002 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

- Installation method:

Kubernetes


Additional context

Andreagit97 commented 1 year ago

Hey @Spartan-65, I'm sorry about that! Would you mind testing the latest Falco version, https://github.com/falcosecurity/falco/releases/tag/0.35.1? Just to see whether the issue is still there.

Spartan-65 commented 1 year ago

Sorry, our operations engineers are not allowed to redeploy Falco in this environment until we identify the root cause of the issue.

Andreagit97 commented 1 year ago

OK, that makes sense, don't worry!

We will try to reproduce the issue using the steps you suggested.

FedeDP commented 6 months ago

We weren't able to reproduce this :/ Moving it to the next milestone; hopefully we will be able to tackle this one.

/milestone 0.17.0

FedeDP commented 5 months ago

/milestone 0.18.0

We still haven't had any luck reproducing this.

poiana commented 2 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Andreagit97 commented 2 months ago

/remove-lifecycle stale