falcosecurity / falcosidekick

Connect Falco to your ecosystem
Apache License 2.0

Sidekick Crashes After Triggering the Same Rule Multiple Times in a Short Window with Falco 0.38.2 #1011

Open cme-incom opened 2 weeks ago

cme-incom commented 2 weeks ago

Describe the bug

After running Aqua Security’s kube-bench, the Sidekick service crashes. This happens when the same Falco rule is triggered more than 15 times within a very short time window. Instead of handling the load gracefully, the process terminates with a fatal error (see the log below).

How to reproduce it

Run Aqua Security’s kube-bench to perform security checks. Ensure that a specific Falco rule is triggered more than 15 times in a very short window.

Expected behaviour

The Sidekick service should handle multiple rule triggers without crashing. It should remain stable and not be terminated.

Screenshots

No screenshots available.

Environment

The rule triggered:

    # Note that runsv is both in protected_shell_spawner and the
    # exclusions by pname. This means that runsv can itself spawn shells
    # (the ./run and ./finish scripts), but the processes runsv can not
    # spawn shells.
    #
    # Also, trivy uses this for vulnerability scanning and kyverno uses it to clean ephemeral reports
    # And we exclude the incom user
    - rule: Incom Run shell untrusted
      desc: > 
        An attempt to spawn a shell below a non-shell application. The non-shell applications that are monitored are 
        defined in the protected_shell_spawner macro, with protected_shell_spawning_binaries being the list you can 
        easily customize. For Java parent processes, please note that Java often has a custom process name. Therefore, 
        rely more on proc.exe to define Java applications. This rule can be noisier, as you can see in the exhaustive 
        existing tuning. However, given it is very behavior-driven and broad, it is universally relevant to catch 
        general Remote Code Execution (RCE). Allocate time to tune this rule for your use cases and reduce noise. 
        Tuning suggestions include looking at the duration of the parent process (proc.ppid.duration) to define your 
        long-running app processes. Checking for newer fields such as proc.vpgid.name and proc.vpgid.exe instead of the 
        direct parent process being a non-shell application could make the rule more robust.
      condition: >
        spawned_process
        and shell_procs
        and proc.pname exists
        and not (k8s.ns.name = trivy)
        and not (k8s.ns.name = kyverno)
        and not serf_script
        and not check_process_status
        and not (container.image.repository in (incom_network_images))
        and not (user.name = incom)
        and not (proc.pexe = /bin/containerd-shim-runc-v2)
      output: Shell spawned by untrusted binary (parent_exe=%proc.pexe parent_exepath=%proc.pexepath pcmdline=%proc.pcmdline gparent=%proc.aname[2] ggparent=%proc.aname[3] aname[4]=%proc.aname[4] aname[5]=%proc.aname[5] aname[6]=%proc.aname[6] aname[7]=%proc.aname[7] evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty exe_flags=%evt.arg.flags %container.info)
      priority: ERROR
      tags: [maturity_stable, host, container, process, shell, mitre_execution, T1059.004]

The error message from the failed pod:

2024/09/23 17:48:45 [INFO]  : Slack - POST OK (200)
2024/09/23 17:48:45 [INFO]  : Pagerduty - Create Incident OK
2024/09/28 09:25:13 [INFO]  : Slack - POST OK (200)
fatal error: concurrent map iteration and map write
goroutine 502012 [running]:
github.com/falcosecurity/falcosidekick/outputs.getSortedStringKeys(0xc00089e1e0?)
    /home/runner/work/falcosidekick/falcosidekick/outputs/utils.go:12 +0x6b
github.com/falcosecurity/falcosidekick/outputs.newSlackPayload({{0xc00005e8a0, 0x24}, {0xc000aaaa00, 0x266}, 0x5, {0xc000114080, 0x19}, {0xb860900, 0xede8d3286, 0x0}, ...}, ...)
    /home/runner/work/falcosidekick/falcosidekick/outputs/slack.go:75 +0x62c
github.com/falcosecurity/falcosidekick/outputs.(*Client).SlackPost(0xc0008e1d00, {{0xc00005e8a0, 0x24}, {0xc000aaaa00, 0x266}, 0x5, {0xc000114080, 0x19}, {0xb860900, 0xede8d3286, ...}, ...})
    /home/runner/work/falcosidekick/falcosidekick/outputs/slack.go:152 +0x78
created by main.forwardEvent in goroutine 502010
    /home/runner/work/falcosidekick/falcosidekick/handlers.go:235 +0x148
goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x7fce1861fed0, 0x72)
    $GOROOT/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0x3?, 0x1?, 0x0)
    $GOROOT/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
    $GOROOT/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0009dd100)
    $GOROOT/src/internal/poll/fd_unix.go:611 +0x2ac
net.(*netFD).accept(0xc0009dd100)
    $GOROOT/src/net/fd_unix.go:172 +0x29
net.(*TCPListener).accept(0xc0009c95e0)
    $GOROOT/src/net/tcpsock_posix.go:159 +0x1e
net.(*TCPListener).Accept(0xc0009c95e0)
    $GOROOT/src/net/tcpsock.go:327 +0x30
net/http.(*Server).Serve(0xc000568690, {0x3079fb0, 0xc0009c95e0})
    $GOROOT/src/net/http/server.go:3255 +0x33e
net/http.(*Server).ListenAndServe(0xc000568690)
    $GOROOT/src/net/http/server.go:3184 +0x71
main.main()
    /home/runner/work/falcosidekick/falcosidekick/main.go:934 +0x1287
goroutine 13 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc000143680)
    pkg/mod/go.opencensus.io@v0.24.0/stats/view/worker.go:292 +0x9f
created by go.opencensus.io/stats/view.init.0 in goroutine 1
    pkg/mod/go.opencensus.io@v0.24.0/stats/view/worker.go:34 +0x8d
goroutine 502011 [runnable]:
net.(*OpError).Timeout(0xc0000cf400?)
    $GOROOT/src/net/net.go:507 +0x133
net/http.(*connReader).backgroundRead(0xc00067d290)
    $GOROOT/src/net/http/server.go:708 +0xa9
created by net/http.(*connReader).startBackgroundRead in goroutine 502010
    $GOROOT/src/net/http/server.go:677 +0xba
goroutine 502013 [runnable]:
bytes.(*Buffer).WriteByte(0xc000ce8980?, 0x7b?)
    $GOROOT/src/bytes/buffer.go:285 +0x9c
encoding/json.mapEncoder.encode({0xc000b16538?}, 0xc000ce8980, {0x2426d60?, 0xc00067d3b0?, 0x2426d60?}, {0x14?, 0x0?})
    $GOROOT/src/encoding/json/encode.go:737 +0x215
encoding/json.(*encodeState).reflectValue(0xc000ce8980, {0x2426d60?, 0xc00067d3b0?, 0x7c9779?}, {0x40?, 0xde?})
    $GOROOT/src/encoding/json/encode.go:321 +0x73
encoding/json.interfaceEncoder(0xc000ce8980, {0x23dde40?, 0xc0008c66f0?, 0x6f8345?}, {0x60?, 0xa6?})
    $GOROOT/src/encoding/json/encode.go:658 +0xba
encoding/json.structEncoder.encode({{{0xc00033e488, 0x8, 0x8}, 0xc000652a80, 0xc000652ab0}}, 0xc000ce8980, {0x273f520?, 0xc0008c6680?, 0xc0000f8f20?}, {0x0, ...})
    $GOROOT/src/encoding/json/encode.go:704 +0x21e
encoding/json.ptrEncoder.encode({0xc0000f8f20?}, 0xc000ce8980, {0x2275700?, 0xc0000f8f20?, 0xc0000f8f20?}, {0xa?, 0x0?})
    $GOROOT/src/encoding/json/encode.go:876 +0x23c
encoding/json.structEncoder.encode({{{0xc00033e008, 0x8, 0x8}, 0xc000652b40, 0xc000652ba0}}, 0xc000ce8980, {0x273f640?, 0xc0000f8ea0?, 0xc000b16950?}, {0x0, ...})
    $GOROOT/src/encoding/json/encode.go:704 +0x21e
encoding/json.(*encodeState).reflectValue(0xc000ce8980, {0x273f640?, 0xc0000f8ea0?, 0x4?}, {0x60?, 0x24?})
    $GOROOT/src/encoding/json/encode.go:321 +0x73
encoding/json.(*encodeState).marshal(0x411ce5?, {0x273f640?, 0xc0000f8ea0?}, {0xc8?, 0xa5?})
    $GOROOT/src/encoding/json/encode.go:297 +0xc5
encoding/json.Marshal({0x273f640, 0xc0000f8ea0})
    $GOROOT/src/encoding/json/encode.go:163 +0xd0
github.com/PagerDuty/go-pagerduty.ManageEventWithContext({0x3089ca0, 0x46aa1a0}, {{0xc000064015, 0x20}, {0x289802d, 0x7}, {0x0, 0x0}, {0x0, 0x0, ...}, ...})
    pkg/mod/github.com/!pager!duty/go-pagerduty@v1.8.0/event_v2.go:175 +0x74
github.com/falcosecurity/falcosidekick/outputs.(*Client).PagerdutyPost(0xc0008e1e00, {{0xc00005e8a0, 0x24}, {0xc000aaaa00, 0x266}, 0x5, {0xc000114080, 0x19}, {0xb860900, 0xede8d3286, ...}, ...})
    /home/runner/work/falcosidekick/falcosidekick/outputs/pagerduty.go:34 +0x1ac
created by main.forwardEvent in goroutine 502010
    /home/runner/work/falcosidekick/falcosidekick/handlers.go:375 +0x2d28
goroutine 502010 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc000ce8690, 0x0)
    $GOROOT/src/runtime/sema.go:569 +0x159
sync.(*Cond).Wait(0xc00067d290?)
    $GOROOT/src/sync/cond.go:70 +0x85
net/http.(*connReader).abortPendingRead(0xc00067d290)
    $GOROOT/src/net/http/server.go:729 +0xa6
net/http.(*response).finishRequest(0xc000578b60)
    $GOROOT/src/net/http/server.go:1671 +0x87
net/http.(*conn).serve(0xc000897560, {0x3089e60, 0xc00066de90})
    $GOROOT/src/net/http/server.go:2045 +0x62b
created by net/http.(*Server).Serve in goroutine 1
    $GOROOT/src/net/http/server.go:3285 +0x4b4

Issif commented 1 week ago

Another issue has been opened about this "bug": https://github.com/falcosecurity/charts/issues/746. I haven't been able to reproduce it so far.

Issif commented 1 week ago

Which version of Falcosidekick are you running: 2.29.0 or the latest (== master)?