falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0

Unexpected number of processors (on musl build) #857

Closed IanRobertson-wpe closed 7 months ago

IanRobertson-wpe commented 1 year ago

On a system with processors disabled, Falco 0.33.1 fails to properly detect and enumerate the correct number of online processors, if any but the last processor is disabled. This causes Falco to exit with a fatal error.

How to reproduce it

  1. Stop Falco.
  2. On a system with multiple processors, disable any processor except the last. For example, on a 4 processor system, disable cpu1 or cpu2. On an 8 processor system, disable cpu1, cpu2, cpu3, cpu4, cpu5 or cpu6. Disable the processor using the command echo 0 > /sys/devices/system/cpu/cpu3/online.
  3. Start Falco. Observe that it fails to start with the message Error: processors online: 6, expected: 7.
  4. Stop Falco. Repeat with another CPU.
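
Before starting Falco, the sysfs toggles from step 2 can be read back to confirm which CPUs are actually offline. The following is a minimal sketch, not code from libs: the helper names read_online_flag and print_cpu_states are hypothetical, and it assumes that a CPU without an online file (typically cpu0) should be treated as online.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the sysfs "online" file at `path`
 * starts with '1', 0 if it starts with '0', and -1 if it cannot be read
 * (cpu0 often has no "online" file because it cannot be taken offline). */
int read_online_flag(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int c = fgetc(f);
    fclose(f);
    if (c == '1')
        return 1;
    if (c == '0')
        return 0;
    return -1;
}

/* Prints the state of cpu0..cpu(ncpus-1) and returns how many are not
 * reported offline. A missing file is counted as online (assumption),
 * so pass the real number of configured CPUs as `ncpus`. */
int print_cpu_states(int ncpus)
{
    int online = 0;
    for (int i = 0; i < ncpus; i++) {
        char path[64];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/online", i);
        int state = read_online_flag(path);
        const char *label = "no online file (assumed online)";
        if (state == 1)
            label = "online";
        else if (state == 0)
            label = "offline";
        printf("cpu%d: %s\n", i, label);
        if (state != 0)
            online++;
    }
    return online;
}
```

Calling print_cpu_states(8) on the 8 processor example would list each CPU and return the number that are not offline, which can then be compared with the count in Falco's error message.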

Expected behaviour

Libs successfully detects and enumerates all processors and Falco starts successfully.

Screenshots

N/A

Environment

Falco version: 0.33.1
Libs version: 0.9.2
Plugin API: 2.0.0
Driver:
  API version: 2.0.0
  Schema version: 2.0.0
  Default driver: 3.0.1+driver

Wed Feb 1 14:45:50 2023: Falco version: 0.33.1 (x86_64)
Wed Feb 1 14:45:50 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Wed Feb 1 14:45:50 2023: Loading rules from file /etc/falco/rules.d/common.yaml
{ "machine": "x86_64", "nodename": "pod-170255", "release": "5.4.0-1086-gcp", "sysname": "Linux", "version": "#94~18.04.1-Ubuntu SMP Fri Aug 5 18:26:39 UTC 2022" }

FedeDP commented 1 year ago

Hi @IanRobertson-wpe ! Thanks for opening this issue! Does it happen with bpf alone? Or kmod too?

Thanks for detailed repro instructions btw!

IanRobertson-wpe commented 1 year ago

I haven't tried it with kmod, just bpf. I'm using only bpf in my environment.

FedeDP commented 1 year ago

On an 8 processor system, disable cpu1, cpu2, cpu3, cpu4, cpu5 or cpu6. Disable the processor using the command echo 0 > /sys/devices/system/cpu/cpu3/online. Start Falco. Observe that it fails to start with the message Error: processors online: 6, expected: 7.

I don't get this: if all but first and last proc are offline, why is it printing: Error: processors online: 6, expected: 7.? It should expect 2 online procs, right?

FedeDP commented 1 year ago

Cannot repro here (nor does the code seem to allow such a behavior :/ ):

cat /sys/devices/system/cpu/cpu*/online
0
0
0
0
0
0
1

But everything works fine.

IanRobertson-wpe commented 1 year ago

I first observed this issue when I had two processors offline: cpu3 and cpu7. Those are the prior logs you have. I tried re-enabling cpu7 while keeping cpu3 disabled, and it failed to start. I flipped it (cpu3 enabled, cpu7 disabled) and it started. I then tried disabling cpu1 through cpu7 one at a time. Disabling any of cpu1 through cpu6 makes Falco fail to start, but disabling cpu7 works fine.

FedeDP commented 1 year ago

This really seems like the https://github.com/falcosecurity/libs/pull/721 bug; note that the fix itself will be released with Falco 0.34, though. It seems like you are using Falco 0.33.1; have you manually backported the patch?

IanRobertson-wpe commented 1 year ago

That #721 was rolled into 0.33.1. Or at least it was supposed to have been. https://falco.org/blog/falco-0-33-1/

FedeDP commented 1 year ago

Damn you are right, i forgot (it was linked to the wrong milestone). Weird then; i cannot repro using libs master; care to test? You can easily do that by building scap-open example: https://github.com/falcosecurity/libs/tree/master/userspace/libscap/examples/01-open

FedeDP commented 1 year ago

Hi! Have you tried with latest Falco 0.34? Is the issue still present? Thank you!

IanRobertson-wpe commented 1 year ago

@FedeDP This issue persists with Falco 0.34.1.

Feb 22 16:57:47 myhost systemd[1]: Started Falco: Container Native Runtime Security.
Feb 22 16:57:47 myhost [31182]: Falco version: 0.34.1 (x86_64)
Feb 22 16:57:47 myhost [31182]: Falco initialized with configuration file: /etc/falco/falco.yaml
Feb 22 16:57:47 myhost falco[31182]: Wed Feb 22 16:57:47 2023: Falco version: 0.34.1 (x86_64)
Feb 22 16:57:47 myhost falco[31182]: Wed Feb 22 16:57:47 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Feb 22 16:57:47 myhost [31182]: Loading rules from file /etc/falco/rules.d/common.yaml
Feb 22 16:57:47 myhost falco[31182]: Wed Feb 22 16:57:47 2023: Loading rules from file /etc/falco/rules.d/common.yaml
Feb 22 16:57:47 myhost [31182]: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Feb 22 16:57:47 myhost [31182]: Enabled event sources: syscall
Feb 22 16:57:47 myhost falco[31182]: Wed Feb 22 16:57:47 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Feb 22 16:57:47 myhost falco[31182]: Wed Feb 22 16:57:47 2023: Enabled event sources: syscall
Feb 22 16:57:47 myhost falco[31182]: Wed Feb 22 16:57:47 2023: Opening capture with BPF probe. BPF probe path: /root/.falco/falco-bpf.o
Feb 22 16:57:47 myhost [31182]: Opening capture with BPF probe. BPF probe path: /root/.falco/falco-bpf.o
Feb 22 16:57:47 myhost [31182]: An error occurred in an event source, forcing termination...
Feb 22 16:57:47 myhost falco[31182]: Wed Feb 22 16:57:47 2023: An error occurred in an event source, forcing termination...
Feb 22 16:57:47 myhost falco[31182]: Events detected: 0
Feb 22 16:57:47 myhost falco[31182]: Rule counts by severity:
Feb 22 16:57:47 myhost falco[31182]: Triggered rules by rule name:
Feb 22 16:57:47 myhost falco[31182]: Error: processors online: 5, expected: 6
Feb 22 16:57:48 myhost systemd[1]: falco.service: Main process exited, code=exited, status=1/FAILURE
Feb 22 16:57:48 myhost systemd[1]: falco.service: Failed with result 'exit-code'.

root@myhost:~# falco --version
Wed Feb 22 17:04:45 2023: Falco version: 0.34.1 (x86_64)
Wed Feb 22 17:04:45 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
{"default_driver_version":"4.0.0+driver","driver_api_version":"3.0.0","driver_schema_version":"2.0.0","engine_version":"16","falco_version":"0.34.1","libs_version":"0.10.4","plugin_api_version":"2.0.0"}

I am pulling Falco from the deb package, in case that matters.

IanRobertson-wpe commented 1 year ago

I ran an strace and I found that it is erroring shortly after reading /sys/devices/system/cpu/cpu5/online. It is not getting far enough to also read cpu6 or cpu7. Here's the relevant snippet from the strace.

read(138, "1\n", 1024)                  = 2
lseek(138, -1, SEEK_CUR)                = 1
close(138)                              = 0
perf_event_open({type=PERF_TYPE_SOFTWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_SW_BPF_OUTPUT, ...}, -1, 1, -1, 0) = 138
bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7fff0633e1a0, value=0x7fff0633e1b0, flags=BPF_ANY}, 120) = 0
ioctl(138, PERF_EVENT_IOC_ENABLE, 0)    = 0
mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0x7fbe5814f000
mmap(0x7fbe5894f000, 8392704, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 138, 0) = 0x7fbe5894f000
mmap(0x7fbe5814f000, 8392704, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 138, 0) = 0x7fbe5814f000
open("/sys/devices/system/cpu/cpu2/online", O_RDONLY|O_LARGEFILE) = 139
read(139, "1\n", 1024)                  = 2
lseek(139, -1, SEEK_CUR)                = 1
close(139)                              = 0
perf_event_open({type=PERF_TYPE_SOFTWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_SW_BPF_OUTPUT, ...}, -1, 2, -1, 0) = 139
bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7fff0633e1a0, value=0x7fff0633e1b0, flags=BPF_ANY}, 120) = 0
ioctl(139, PERF_EVENT_IOC_ENABLE, 0)    = 0
mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0x7fbe5714e000
mmap(0x7fbe5794e000, 8392704, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 139, 0) = 0x7fbe5794e000
mmap(0x7fbe5714e000, 8392704, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 139, 0) = 0x7fbe5714e000
open("/sys/devices/system/cpu/cpu3/online", O_RDONLY|O_LARGEFILE) = 140
read(140, "0\n", 1024)                  = 2
lseek(140, -1, SEEK_CUR)                = 1
close(140)                              = 0
open("/sys/devices/system/cpu/cpu4/online", O_RDONLY|O_LARGEFILE) = 140
read(140, "1\n", 1024)                  = 2
lseek(140, -1, SEEK_CUR)                = 1
close(140)                              = 0
perf_event_open({type=PERF_TYPE_SOFTWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_SW_BPF_OUTPUT, ...}, -1, 4, -1, 0) = 140
bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7fff0633e1a0, value=0x7fff0633e1b0, flags=BPF_ANY}, 120) = 0
ioctl(140, PERF_EVENT_IOC_ENABLE, 0)    = 0
mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0x7fbe5614d000
mmap(0x7fbe5694d000, 8392704, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 140, 0) = 0x7fbe5694d000
mmap(0x7fbe5614d000, 8392704, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 140, 0) = 0x7fbe5614d000
open("/sys/devices/system/cpu/cpu5/online", O_RDONLY|O_LARGEFILE) = 141
read(141, "1\n", 1024)                  = 2
lseek(141, -1, SEEK_CUR)                = 1
close(141)                              = 0
perf_event_open({type=PERF_TYPE_SOFTWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_SW_BPF_OUTPUT, ...}, -1, 5, -1, 0) = 141
bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7fff0633e1a0, value=0x7fff0633e1b0, flags=BPF_ANY}, 120) = 0
ioctl(141, PERF_EVENT_IOC_ENABLE, 0)    = 0
mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0x7fbe5514c000
mmap(0x7fbe5594c000, 8392704, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 141, 0) = 0x7fbe5594c000
mmap(0x7fbe5514c000, 8392704, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 141, 0) = 0x7fbe5514c000
munmap(0x7fbe5a589000, 24576)           = 0
munmap(0x7fbe5a52b000, 126976)          = 0
munmap(0x7fbe5a584000, 20480)           = 0
munmap(0x7fbe59150000, 16781312)        = 0
close(137)                              = 0
munmap(0x7fbe5814f000, 16781312)        = 0
close(138)                              = 0
munmap(0x7fbe5714e000, 16781312)        = 0
close(139)                              = 0
munmap(0x7fbe5614d000, 16781312)        = 0
close(140)                              = 0
munmap(0x7fbe5514c000, 16781312)        = 0
close(141)                              = 0
close(16)                               = 0
close(17)                               = 0
close(18)                               = 0
close(19)                               = 0
close(20)                               = 0
close(21)                               = 0
close(22)                               = 0
close(23)                               = 0
close(24)                               = 0
close(25)                               = 0
close(26)                               = 0
close(27)                               = 0
close(28)                               = 0
close(29)                               = 0
close(30)                               = 0
close(31)                               = 0
close(32)                               = 0
close(33)                               = 0
close(34)                               = 0
close(35)                               = 0
close(36)                               = 0
close(37)                               = 0
close(38)                               = 0
close(39)                               = 0
close(40)                               = 0
close(41)                               = 0
close(42)                               = 0
close(43)                               = 0
close(44)                               = 0
close(45)                               = 0
close(46)                               = 0
close(47)                               = 0
close(48)                               = 0
close(49)                               = 0
close(50)                               = 0
close(51)                               = 0
close(52)                               = 0
close(53)                               = 0
close(54)                               = 0
close(55)                               = 0
close(56)                               = 0
close(57)                               = 0
close(58)                               = 0
close(59)                               = 0
close(60)                               = 0
close(61)                               = 0
close(62)                               = 0
close(63)                               = 0
close(64)                               = 0
close(65)                               = 0
close(66)                               = 0
close(67)                               = 0
close(68)                               = 0
close(69)                               = 0
close(70)                               = 0
close(71)                               = 0
close(72)                               = 0
close(73)                               = 0
close(74)                               = 0
close(75)                               = 0
close(76)                               = 0
close(77)                               = 0
close(78)                               = 0
close(79)                               = 0
close(80)                               = 0
close(81)                               = 0
close(82)                               = 0
close(83)                               = 0
close(84)                               = 0
close(85)                               = 0
close(86)                               = 0
close(87)                               = 0
close(88)                               = 0
close(89)                               = 0
close(90)                               = 0
close(91)                               = 0
close(92)                               = 0
close(93)                               = 0
close(94)                               = 0
close(95)                               = 0
close(96)                               = 0
close(97)                               = 0
close(98)                               = 0
close(99)                               = 0
close(100)                              = 0
close(101)                              = 0
close(102)                              = 0
close(103)                              = 0
close(104)                              = 0
close(105)                              = 0
close(106)                              = 0
close(107)                              = 0
close(108)                              = 0
close(109)                              = 0
close(110)                              = 0
close(111)                              = 0
close(112)                              = 0
close(113)                              = 0
close(114)                              = 0
close(115)                              = 0
close(116)                              = 0
close(117)                              = 0
close(118)                              = 0
close(119)                              = 0
close(120)                              = 0
close(121)                              = 0
close(122)                              = 0
close(123)                              = 0
close(124)                              = 0
close(125)                              = 0
close(126)                              = 0
close(127)                              = 0
close(128)                              = 0
close(129)                              = 0
close(130)                              = 0
close(131)                              = 0
close(132)                              = 0
close(133)                              = 0
close(134)                              = 0
close(135)                              = 0
close(136)                              = 0
close(6)                                = 0
close(7)                                = 0
close(8)                                = 0
close(9)                                = 0
close(10)                               = 0
close(11)                               = 0
close(12)                               = 0
close(13)                               = 0
close(14)                               = 0
close(15)                               = 0
munmap(0x7fbe5a151000, 4035136)         = 0
munmap(0x7fbe5a575000, 61440)           = 0
close(5)                                = 0
munmap(0x7fbe5a58f000, 49152)           = 0
mmap(NULL, 352276, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbe5a544000
mmap(NULL, 352276, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbe5a4ed000
munmap(0x7fbe5a4ed000, 356352)          = 0
sendto(3, "<14>Feb 22 20:15:11 : An error o"..., 83, 0, NULL, 0) = 83
writev(2, [{iov_base="Wed Feb 22 20:15:11 2023: ", iov_len=26}, {iov_base="An error occurred in an event so"..., iov_len=61}], 2Wed Feb 22 20:15:11 2023: An error occurred in an event source, forcing termination...
) = 87
writev(2, [{iov_base="", iov_len=0}, {iov_base=NULL, iov_len=0}], 2) = 0
ioctl(1, TIOCGWINSZ, {ws_row=72, ws_col=273, ws_xpixel=1911, ws_ypixel=1008}) = 0
writev(1, [{iov_base="", iov_len=0}, {iov_base="Events detected: 0\nRule counts b"..., iov_len=74}], 2) = 74
close(4)                                = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x1121378}, {sa_handler=0x4d9160, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x1121378}, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x1121378}, {sa_handler=0x4d9160, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x1121378}, 8) = 0
rt_sigaction(SIGUSR1, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x1121378}, {sa_handler=0x4d9150, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x1121378}, 8) = 0
rt_sigaction(SIGHUP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x1121378}, {sa_handler=0x4d9140, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x1121378}, 8) = 0
writev(2, [{iov_base="Error: processors online: 5, exp"..., iov_len=41}, {iov_base=NULL, iov_len=0}], 2Error: processors online: 5, expected: 6
) = 41

IanRobertson-wpe commented 1 year ago

@FedeDP I think what is happening here is this:

The loop is operating over handle->m_ncpus, which, given the strace, must be equal to 6, since we see the loop iterate from j=0 through j=5. This makes online_cpu equal to 5, because it ranged over 6 cpus (0 to 5) and one of those was disabled.

When it gets out of the loop, it's comparing online_cpu to handle->m_dev_set.m_ndevs, which is 6, and they don't match.

I think the problem here lies with using handle->m_ncpus in the loop. This seems to be based off of the online cpus instead of all cpus, so we're never parsing through all the cpus to determine which are actually online or not.

This also explains why the issue doesn't appear when only the last cpu is disabled. In that situation, handle->m_ncpus is one less than the total cpu count, so the loop never processes that last cpu, but it doesn't matter because the online_cpu count still matches handle->m_dev_set.m_ndevs in the comparison.
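
The arithmetic in this hypothesis can be checked with a small stand-alone simulation. It is only a sketch under the assumptions stated in the comment above, not the libscap code: count_online and check_online are hypothetical names, the online state is passed in as an array instead of being read from sysfs, and the message only mimics the shape of the reported error.

```c
#include <assert.h>
#include <stdio.h>

/* Counts online CPUs among indices [0, loop_bound), the way the thread
 * describes the loop: cpu0 is assumed online, the others are looked up. */
int count_online(const int *online_flags, int loop_bound)
{
    assert(online_flags != NULL && loop_bound >= 0);
    int online_cpu = 0;
    for (int j = 0; j < loop_bound; j++) {
        if (j == 0 || online_flags[j])
            online_cpu++;
    }
    return online_cpu;
}

/* Returns 0 when the count matches `expected` (standing in for m_ndevs);
 * otherwise prints a message shaped like the reported one and returns -1. */
int check_online(const int *online_flags, int loop_bound, int expected)
{
    int online_cpu = count_online(online_flags, loop_bound);
    if (online_cpu != expected) {
        printf("processors online: %d, expected: %d\n",
               online_cpu, expected);
        return -1;
    }
    return 0;
}
```

With cpu3 and cpu7 offline on an 8 processor system, a loop bound of 6 visits cpu0 through cpu5 and counts 5, which does not match the 6 online processors; a loop bound of 8 counts 6 and passes. With only cpu7 offline, a loop bound of 7 counts 7 and passes, which matches the behaviour described above.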

FedeDP commented 1 year ago

Hi @IanRobertson-wpe ! Thanks for the super detailed responses! I will try to understand the issue and provide a fix asap :) I am so sorry you are still hit by this :disappointed:

FedeDP commented 1 year ago

I think the problem here lies with using handle->m_ncpus in the loop.

Well, but handle->m_ncpus is 6 because it is the number of proc in the system: ssize_t num_cpus = sysconf(_SC_NPROCESSORS_CONF);

While instead devset->m_ndevs is 5 because it is the number of online procs: ssize_t num_devs = sysconf(_SC_NPROCESSORS_ONLN);

What we do is:

At the end of the loop, we expect online cpus counter to be equal to number of m_ndevs (that is sysconf(_SC_NPROCESSORS_ONLN);).

I will try to reproduce the issue! It will surely help me here :)

Feb 22 16:57:47 myhost falco[31182]: Error: processors online: 5, expected: 6

Can you share once again the number of procs and the number of online procs (i.e. _NPROCESSORS_CONF and _NPROCESSORS_ONLN)?

Given we loop from 0 to 5, i'd expect the former to be 6; but since cpu3 is offline, i'd expect the latter to be 5; but your Falco error is saying that it is expecting 6 online cpus.

FedeDP commented 1 year ago

Also, notice your strace output:

// cpu0
// cpu0 is assumed online, so online_cpu = 1

// cpu1
open("/sys/devices/system/cpu/cpu1/online", O_RDONLY|O_LARGEFILE) = 138
read(138, "1\n", 1024)                  = 2  // cpu1 is online ->  online_cpu = 2

// cpu2
open("/sys/devices/system/cpu/cpu2/online", O_RDONLY|O_LARGEFILE) = 139
read(139, "1\n", 1024)                  = 2 // cpu2 is online -> online_cpu = 3

// cpu3
open("/sys/devices/system/cpu/cpu3/online", O_RDONLY|O_LARGEFILE) = 140
read(140, "0\n", 1024)                  = 2 // cpu3 is offline -> online_cpu = 3

// cpu4
open("/sys/devices/system/cpu/cpu4/online", O_RDONLY|O_LARGEFILE) = 140
read(140, "1\n", 1024)                  = 2 // cpu4 is online -> online_cpu = 4

// cpu5
open("/sys/devices/system/cpu/cpu5/online", O_RDONLY|O_LARGEFILE) = 141
read(141, "1\n", 1024)                  = 2 // cpu5 is online -> online_cpu = 5

...

if(online_cpu != handle->m_dev_set.m_ndevs)
{
        return scap_errprintf(handle->m_lasterr, 0, "processors online: %d, expected: %d", online_cpu, handle->m_dev_set.m_ndevs);
}

So, online_cpu (5) is different from m_dev_set.m_ndevs that is _NPROCESSORS_ONLN. But why?

IanRobertson-wpe commented 1 year ago

@FedeDP This particular example is an 8 processor system, with cpus 0, 1, 2, 4, 5 and 6 online, and 3 and 7 offline. I have other systems with cpu disabled of various configurations, all have this issue, so long as at least one processor between 0 and the last is offline.

_NPROCESSORS_CONF 8 _NPROCESSORS_ONLN 6

The looping (var j) should be done over the total number of CPUs, 8 not 6. It seems that handle->m_ncpus is returning the number of online processors, not the total processors.

FedeDP commented 1 year ago

The looping (var j) should be done over the total number of CPUs, 8 not 6. It seems that handle->m_ncpus is returning the number of online processors, not the total processors.

That's not the case actually :/

ssize_t num_cpus = sysconf(_SC_NPROCESSORS_CONF);

This is handle->m_ncpus. I am not getting how this is not 8 then!

edit: see here:

ssize_t num_cpus = sysconf(_SC_NPROCESSORS_CONF);
if(num_cpus == -1)
{
    return scap_errprintf(engine.m_handle->m_lasterr, errno, "_SC_NPROCESSORS_CONF");
}

engine.m_handle->m_ncpus = num_cpus;

FedeDP commented 1 year ago

Perhaps the cpu affinity on which Falco process is bound is just 6 of them? https://linux.die.net/man/2/sched_getaffinity

Can you share output of

taskset -pc $FALCO_PID

(using real Falco pid instead of the placeholder :) )
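
The same information that taskset -pc prints can also be queried from C through sched_getaffinity(), the call documented at the man page linked above. This is a minimal sketch for glibc or musl on Linux; affinity_cpu_count is a hypothetical helper name, and the fixed-size cpu_set_t is assumed to be large enough for the machine.

```c
#define _GNU_SOURCE /* needed for the CPU_* macros; must precede all includes */
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Returns the number of CPUs in the affinity mask of `pid` (0 means the
 * calling thread), or -1 if sched_getaffinity() fails. */
int affinity_cpu_count(pid_t pid)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(pid, sizeof(set), &set) != 0)
        return -1;
    int count = CPU_COUNT(&set);
    /* A schedulable task is allowed on at least one CPU. */
    assert(count >= 1);
    return count;
}
```

Comparing affinity_cpu_count(0) with sysconf(_SC_NPROCESSORS_CONF) inside the same process shows whether the configured-CPU count merely mirrors the affinity mask.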

IanRobertson-wpe commented 1 year ago

Here you go:

root@myhost:~# FALCO_BPF_PROBE="" falco & taskset -pc $(pidof falco)
[1] 27482
pid 27482's current affinity list: 0-2,4-6
root@myhost:~# Thu Feb 23 16:50:14 2023: Falco version: 0.34.1 (x86_64)
Thu Feb 23 16:50:14 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Thu Feb 23 16:50:14 2023: Loading rules from file /etc/falco/rules.d/common.yaml
Thu Feb 23 16:50:14 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Thu Feb 23 16:50:14 2023: Enabled event sources: syscall
Thu Feb 23 16:50:14 2023: Opening capture with BPF probe. BPF probe path: /root/.falco/falco-bpf.o
Thu Feb 23 16:50:14 2023: An error occurred in an event source, forcing termination...
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:
Error: processors online: 5, expected: 6

FedeDP commented 1 year ago

current affinity list: 0-2,4-6

yay! Your Falco instance has affinity to only see 6 cpus! Don't know why though.

EDIT: example output on my system:

pid 3187's current affinity list: 0-7

FedeDP commented 1 year ago

Wait! I am not sure whether they are the online CPUs though. I will double check what's the output on my system if i disable a CPU.

IanRobertson-wpe commented 1 year ago

If I change the CPU affinity to restrict Falco to processors 0-2, it loads successfully. I believe this proves that the CPU affinity is impacting the value returned by the call to sysconf(_SC_NPROCESSORS_CONF).

root@myhost:~# FALCO_BPF_PROBE="" taskset --cpu-list 0-2 falco
pid 10036's current affinity list: 0-2
root@myhost:~# Thu Feb 23 17:36:31 2023: Falco version: 0.34.1 (x86_64)
Thu Feb 23 17:36:31 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Thu Feb 23 17:36:31 2023: Loading rules from file /etc/falco/rules.d/common.yaml
Thu Feb 23 17:36:31 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Thu Feb 23 17:36:31 2023: Enabled event sources: syscall
Thu Feb 23 17:36:31 2023: Opening capture with BPF probe. BPF probe path: /root/.falco/falco-bpf.o

If this is indeed the case, then it would seem like we don't really need that loop at all, since we can just assume all those processors are online, no? Otherwise, it seems that we would just need to use a different place to get the actual number of processors from (e.g. /proc/cpuinfo).

IanRobertson-wpe commented 1 year ago

OK, so this is wild, check this out.

I found this article about a potential bug with musl where the codepath for _SC_NPROCESSORS_CONF actually pulls data for ONLN instead. http://www.landley.net/notes-2022.html#26-07-2022

I created this sample code to test:

#include <stdlib.h>
#include <stdio.h>
#include <sys/sysinfo.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    printf("get_nprocs_conf(): %d\nget_nprocs(): %d\nsysconf(_SC_NPROCESSORS_CONF): %ld\nsysconf(_SC_NPROCESSORS_ONLN): %ld\n",
    get_nprocs_conf(), get_nprocs(), sysconf(_SC_NPROCESSORS_CONF), sysconf(_SC_NPROCESSORS_ONLN));
    exit(EXIT_SUCCESS);
}

When I use gcc to compile and run, I get this:

get_nprocs_conf(): 8
get_nprocs(): 6
sysconf(_SC_NPROCESSORS_CONF): 8
sysconf(_SC_NPROCESSORS_ONLN): 6

When I use musl-gcc to compile and run, I get this:

get_nprocs_conf(): 6
get_nprocs(): 6
sysconf(_SC_NPROCESSORS_CONF): 6
sysconf(_SC_NPROCESSORS_ONLN): 6

You can see the issue here, where the case for JT_NPROCESSORS_CONF simply flows into the case for JT_NPROCESSORS_ONLN: https://git.musl-libc.org/cgit/musl/tree/src/conf/sysconf.c#n202
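
The fall-through described here can be pictured with a toy model. This is not musl's source: toy_sysconf, the TOY_* selectors and the bitmask argument are made up for illustration, on the assumption (consistent with the taskset experiments earlier in this thread) that the returned count is derived from the process's affinity mask.

```c
#include <assert.h>

/* Hypothetical selector names, used by this toy model only. */
enum toy_name { TOY_NPROCESSORS_CONF, TOY_NPROCESSORS_ONLN };

/* Counts set bits in a small affinity bitmask (bit i set = cpu i allowed). */
static int popcount_mask(unsigned long mask)
{
    int cnt = 0;
    for (; mask != 0; mask &= mask - 1)
        cnt++;
    return cnt;
}

/* Toy sysconf: the CONF case has no body of its own and falls through to
 * the ONLN case, so both selectors return the same affinity-based count. */
long toy_sysconf(enum toy_name name, unsigned long affinity_mask)
{
    /* A schedulable task is allowed on at least one CPU. */
    assert(affinity_mask != 0);
    switch (name) {
    case TOY_NPROCESSORS_CONF: /* falls through */
    case TOY_NPROCESSORS_ONLN:
        return popcount_mask(affinity_mask);
    }
    return -1;
}
```

With the affinity list 0-2,4-6 from the taskset output (bitmask 0x77), both selectors return 6 in this model, mirroring the musl-gcc results above.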

Are you using musl? If so, then I think the solution here would be to gather this information by looking at /sys/devices/system/cpu/cpuX.
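
One way to gather the count from sysfs without depending on sysconf() is to parse the CPU list files that Linux exposes next to the cpuX directories, such as /sys/devices/system/cpu/possible or /sys/devices/system/cpu/present. The sketch below is only an illustration of that idea, not what libs does: count_cpus_in_list and count_cpus_in_file are hypothetical helpers, and which of the two files best matches "configured" CPUs on a given machine is left as an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Counts the CPUs named by a kernel-style CPU list such as "0-7" or
 * "0-2,4-6" (an optional trailing newline is accepted).
 * Returns -1 on malformed input. */
int count_cpus_in_list(const char *list)
{
    assert(list != NULL);
    int total = 0;
    const char *p = list;
    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        if (end == p || first < 0)
            return -1;
        long last = first;
        p = end;
        if (*p == '-') {            /* a range "first-last" */
            p++;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
            p = end;
        }
        total += (int)(last - first + 1);
        if (*p == ',')
            p++;                    /* move on to the next entry */
        else if (*p != '\0' && *p != '\n')
            return -1;              /* unexpected character */
    }
    return total;
}

/* Reads the first line of `path` and counts the CPUs it lists.
 * Returns -1 if the file cannot be read or parsed. */
int count_cpus_in_file(const char *path)
{
    char buf[256];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line != NULL ? count_cpus_in_list(line) : -1;
}
```

For example, count_cpus_in_list("0-7") yields 8 and count_cpus_in_list("0-2,4-6") yields 6, the two values that the glibc build reports for this system.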

FedeDP commented 1 year ago

This is getting more and more interesting! Can you give a try at using this: https://stackoverflow.com/a/62867839 ?

FedeDP commented 1 year ago

We are using musl btw but only for the static Falco build (so, not the one shipped in Deb/rpm packages but the tar.gz of Falco)

IanRobertson-wpe commented 1 year ago

I double-checked and confirmed that I am using the tar.gz, not the deb, so that makes sense.

I looked at the SO link, tried that code with a musl-gcc and it returns a 6, not an 8, so that method doesn't work.

FedeDP commented 1 year ago

So, are you using the static tar.gz? That is the one built with musl!

IanRobertson-wpe commented 1 year ago

Yes, that is the one that I am using.

FedeDP commented 1 year ago

Yes, it uses musl then. Are you able to try using a deb/rpm?

IanRobertson-wpe commented 1 year ago

I've re-factored my custom deb to use the glibc version out of the Falco deb instead. I've tested and this does function correctly with this glibc version.

I reported this issue to the musl-libc team on their IRC channel. They've known about this since 2019 but haven't taken action, and based on my conversation I don't see this happening anytime soon. Part of the issue is that they lack a bug tracking system, which I personally find to be a huge red flag.

I've got a path forward now. I'll leave it to you to determine what, if anything, you want to do here. My suggestion is that, at a minimum, a known issue is documented for this so anyone else who runs into it in the future doesn't waste their time. It's an edge case to be sure, but one we know about now.

Thanks for your help.

FedeDP commented 1 year ago

I've got a path forward now. I'll leave it to you to determine what, if anything, you want to do here. My suggestion is that, at a minimum, a known issue is documented for this so anyone else who runs into it in the future doesn't waste their time. It's an edge case to be sure, but one we know about now.

I fully agree. /cc @leogr any idea?

leogr commented 1 year ago

I agree that first we have to document this. Even if it's an edge case, I would write a note in the official documentation.

Also, I want to thank you all for the awesome job in debugging this issue. It's really impressive.

FedeDP commented 1 year ago

Agree with Leo! Thank you very much Ian, you dug into this relentlessly and provided lots of information to everyone else! Great job!

poiana commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

leogr commented 1 year ago

/remove-lifecycle stale

cc @vjjmiras

Andreagit97 commented 1 year ago

@FedeDP do we need to do something else here?

FedeDP commented 1 year ago

I think we should add proper documentation about this somewhere, as leo suggested.

Andreagit97 commented 1 year ago

/remove-lifecycle stale

leogr commented 10 months ago

I think we should add proper documentation about this somewhere, as leo suggested.

cc @falcosecurity/falco-website-maintainers can you help with this?

leogr commented 10 months ago

/remove-lifecycle stale
/kind documentation

leogr commented 7 months ago

Since 0.37 we have discontinued the musl build (to address licensing issues with libelf). Since we no longer have a static musl build, the documentation request no longer applies.

/close

poiana commented 7 months ago

@leogr: Closing this issue.
