falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0

Falco 0.38 crashes node when package management process launched inside container #3276

Closed: chenliu1993 closed this issue 1 month ago

chenliu1993 commented 2 months ago

Describe the bug

I am running Falco 0.38 with -o engine.kind=kmod on Red Hat. Whenever a package management process such as apt-get or dnf is started inside another container, the node crashes and reboots.

How to reproduce it

First, run Falco with these arguments:

        - --cri
        - /run/containerd/containerd.sock
        - --cri
        - /run/crio/crio.sock
        - -o
        - engine.kind=kmod

Then run something like:

        docker run --name=sysstat -it public.ecr.aws/docker/library/rockylinux:9.3.20231119-minimal microdnf install net-tools

The process appears to get stuck at:

        ...
        Installing: systemd-pam;252-32.el9_4;x86_64;baseos
        Installing: systemd;252-32.el9_4;x86_64;baseos
        ....

At that point the node has actually rebooted. A crash report is generated:

...
[  570.081930] CPU: 5 PID: 31697 Comm: systemd-machine Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-427.16.1.el9_4.x86_64 #1
[  570.094553] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 07/20/2023
[  570.103146] RIP: 0010:ppm_is_upper_layer+0x3b/0x60 [falco]
[  570.108690] Code: d2 74 38 48 8b 4f 18 48 81 7a 60 30 76 4c 79 0f 94 c0 48 8b 71 78 48 85 f6 0f 95 c2 20 d0 74 1b 31 c0 48 81 79 30 70 fd ff ff <48> 8b 16 74 0c 8b 01 c1 e8 05 09 c2 89 d0 83 e0 01 c3 cc cc cc cc
[  570.127618] RSP: 0018:ffffb17b04897be8 EFLAGS: 00010283
[  570.132900] RAX: 0000000000000000 RBX: ffffb17b04897d78 RCX: ffff97010bdc6840
[  570.140096] RDX: ffff97011f174001 RSI: 0000000000000001 RDI: ffff970137b20900
[  570.147289] RBP: 0000000000000001 R08: ffff97015e682a80 R09: 000000000000001c
[  570.154484] R10: 6f642f6563696c73 R11: 2e6d65747379732f R12: ffff970304336000
[  570.161677] R13: ffff9746381f8000 R14: ffff970137b20900 R15: 0000000000000000
[  570.168874] FS:  0000000000000000(0000) GS:ffff973f3fd40000(0000) knlGS:0000000000000000
[  570.177032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  570.182826] CR2: 0000000000000001 CR3: 000000075e5da005 CR4: 00000000007706e0
[  570.190019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  570.197213] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  570.204405] PKRU: 55555554
[  570.207133] Call Trace:
[  570.209605]  <TASK>
[  570.211722]  ? show_trace_log_lvl+0x1c4/0x2df
[  570.216122]  ? show_trace_log_lvl+0x1c4/0x2df
[  570.220517]  ? f_proc_startupdate+0xbd8/0x14e0 [falco]
[  570.225705]  ? __die_body.cold+0x8/0xd
[  570.229484]  ? page_fault_oops+0x134/0x170
[  570.233616]  ? exc_page_fault+0x62/0x150
[  570.237573]  ? asm_exc_page_fault+0x22/0x30
[  570.241794]  ? ppm_is_upper_layer+0x3b/0x60 [falco]
[  570.246721]  f_proc_startupdate+0xbd8/0x14e0 [falco]
[  570.251754]  ? follow_page_pte+0x1fd/0x430
[  570.255905]  ? __get_user_pages+0x226/0x470
[  570.260123]  record_event_consumer+0x419/0x800 [falco]
[  570.265308]  ? __access_remote_vm+0x157/0x3d0
[  570.269702]  ? __access_remote_vm+0x33a/0x3d0
[  570.274097]  syscall_exit_probe+0x165/0x200 [falco]
[  570.279018]  syscall_exit_work+0xb0/0x130
[  570.283064]  syscall_exit_to_user_mode+0x9/0x40
[  570.288183]  do_syscall_64+0x69/0x90
[  570.292304]  ? exc_page_fault+0x62/0x150
[  570.296768]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  570.302349] RIP: 0033:0x7fd8b476fe70
[  570.306470] Code: Unable to access opcode bytes at RIP 0x7fd8b476fe46.
[  570.313533] RSP: 002b:00007ffd537b2360 EFLAGS: 00000200 ORIG_RAX: 000000000000003b
[  570.321642] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  570.329312] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  570.336970] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  570.344620] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  570.352266] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  570.359901]  </TASK>
[  570.362535] Modules linked in: vrf falco(OE) msdos 8021q garp mrp ip6table_raw iptable_raw tun vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat iptable_filter ip_tables veth xt_nat xt_statistic xt_mark ipt_REJECT nf_reject_ipv4 xt_comment xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay bonding rfkill ext4 mbcache jbd2 vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl intel_cstate ipmi_ssif mlx5_ib ib_uverbs mgag200 drm_shmem_helper acpi_ipmi drm_kms_helper intel_uncore mei_me syscopyarea pcspkr ib_core ipmi_si sysfillrect mei sysimgblt ipmi_devintf fb_sys_fops hpilo lpc_ich ioatdma intel_pch_thermal
[  570.362619]  ipmi_msghandler acpi_tad acpi_power_meter drm xfs libcrc32c dm_multipath sd_mod sg mlx5_core nvme_tcp nvme_fabrics nvme crct10dif_pclmul crc32_pclmul igb crc32c_intel nvme_core ghash_clmulni_intel uas usb_storage nvme_common mlxfw i2c_algo_bit t10_pi dca psample pci_hyperv_intf hpwdt wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse [last unloaded: falco]
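For triage, the key fields of a report like this can be pulled out with a short script. The following is a minimal sketch, not part of Falco or its tooling: the function name `summarize_oops` and the regular expressions are illustrative assumptions, and the sample text is four lines copied from the report above. It extracts the faulting kernel symbol and module from the `RIP` line, the faulting address from `CR2` (on x86 this register holds the address that triggered the page fault), and the process that was running.

```python
import re

# Four lines copied verbatim from the crash report above.
OOPS = """\
[  570.081930] CPU: 5 PID: 31697 Comm: systemd-machine Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-427.16.1.el9_4.x86_64 #1
[  570.103146] RIP: 0010:ppm_is_upper_layer+0x3b/0x60 [falco]
[  570.140096] RDX: ffff97011f174001 RSI: 0000000000000001 RDI: ffff970137b20900
[  570.182826] CR2: 0000000000000001 CR3: 000000075e5da005 CR4: 00000000007706e0
"""

# Hypothetical patterns written for this report's layout; other kernels may format lines differently.
# Segment 0010 selects the kernel-mode RIP line (the user-mode one uses 0033).
RIP_RE = re.compile(
    r"RIP: 0010:(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)
CR2_RE = re.compile(r"CR2: (?P<addr>[0-9a-f]{16})")
COMM_RE = re.compile(r"PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")


def summarize_oops(text):
    """Return the faulting symbol, module, fault address and process from an oops log."""
    rip = RIP_RE.search(text)
    cr2 = CR2_RE.search(text)
    comm = COMM_RE.search(text)
    if not (rip and cr2 and comm):
        raise ValueError("log does not look like an x86_64 kernel oops")
    return {
        "symbol": rip.group("symbol"),
        "offset": int(rip.group("offset"), 16),
        "module": rip.group("module"),  # None when the symbol is built into the kernel
        "fault_address": int(cr2.group("addr"), 16),
        "pid": int(comm.group("pid")),
        "comm": comm.group("comm"),
    }


if __name__ == "__main__":
    for key, value in summarize_oops(OOPS).items():
        print(f"{key}: {value}")
```

For this report the summary points at `ppm_is_upper_layer` in the `falco` module with a fault address of 0x1, which matches the value in `RSI`. That is consistent with a read through an invalid pointer inside the kernel module, although the report alone does not establish the root cause.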

Expected behaviour

Falco should work the same as it does in modern_ebpf mode.

Andreagit97 commented 2 months ago

Hi @chenliu1993, thank you for reporting! I will try to reproduce it!

chenliu1993 commented 2 months ago

Thank you so much!

Andreagit97 commented 2 months ago

I've reproduced the issue, and I can confirm that the repro you reported is valid! Thank you for this! I'm working on a solution!

chenliu1993 commented 1 month ago

Hi @Andreagit97, sorry for the ping, but is there a plan to include this fix in 0.38.2 (if Falco plans such a release) or in 0.39.x?

Andreagit97 commented 1 month ago

Hi @chenliu1993! Yes, this fix will be shipped in Falco 0.38.2 :)

chenliu1993 commented 1 month ago

That would be very helpful! We are using Falco for security monitoring. Thanks for the quick fix!

LucaGuerra commented 1 month ago

The new driver version is now released with Falco 0.38.2 🎉
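
Since the fix ships with Falco 0.38.2, a deployment script could guard against older releases before enabling the kmod driver. A minimal sketch, assuming plain MAJOR.MINOR.PATCH version strings; the helper names `parse_version` and `includes_driver_fix` are hypothetical and not part of Falco:

```python
def parse_version(version):
    """Turn a string such as '0.38.2' into a tuple of ints.

    Assumption: plain dotted numeric versions only; pre-release or build
    suffixes (for example '-rc1') are not handled by this sketch.
    """
    return tuple(int(part) for part in version.split("."))


# Release named in this thread as the one shipping the fixed driver.
FIXED_IN = parse_version("0.38.2")


def includes_driver_fix(version):
    """Return True when `version` is 0.38.2 or newer."""
    return parse_version(version) >= FIXED_IN


if __name__ == "__main__":
    for candidate in ("0.38.0", "0.38.2", "0.39.0"):
        print(candidate, includes_driver_fix(candidate))
```

The check only compares against the release mentioned above; it says nothing about which earlier versions are affected by the crash.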