Open petterreinholdtsen opened 8 months ago
Thanks @petterreinholdtsen:
Just to clarify
Some times the events from sandbox rule "Write below root" contain a incomplete path
is truly sometimes, not all the times?
Would you mind observing again with the following additional output fields? %fd.nameraw %fs.path.name %fs.path.nameraw %proc.cwd
(https://falco.org/docs/reference/rules/supported-fields/) -> are all of these fields missing?
In addition, is it some new issues or have you observed this before in Falco? Plus could you check if you have a decent amount of dropped open*
syscalls leveraging https://falco.org/docs/metrics/falco-metrics/, see also https://falco.org/docs/troubleshooting/dropping/.
[Melissa Kilby]
is truly sometimes, not all the times?
Yes. Here is an example with // instead of
12:26:02.996402180: Error File below / or /root opened for writing (file=//cf_lock.lmdb.lock evt_type=openat user=root user_uid=0 user_loginuid=-1 process=cf-agent proc_exepath=/usr/bin/cf-agent parent=cf-execd command=cf-agent terminal=0 exe_flags=O_RDWR container_id=host container_name=host)
This is on the same machine.
Would you mind observing again with the following additional output fields?
%fd.nameraw %fs.path.name %fs.path.nameraw %proc.cwd
(https://falco.org/docs/reference/rules/supported-fields/) -> are all of these fields missing?
I will see what I can do. I do not have direct access to any of the problematic machines, only the logs from them, but will try to figure out a way to test a modified setup one one of the machines with this behavior.
In addition, is it some new issues or have you observed this before in Falco? Plus could you check if you have a decent amount of dropped
open*
syscalls leveraging https://falco.org/docs/metrics/falco-metrics/, see also https://falco.org/docs/troubleshooting/dropping/.
In total for the last event bulk I see 18 "Falco internal: syscall event
drop" events across 5 machines and 8000 events. All of them happened on
one of the machines with the
-- Happy hacking Petter Reinholdtsen
@petterreinholdtsen would it be possible to instead use the new https://falco.org/docs/metrics/falco-metrics/ framework as it supports monotonic counters? That way we know more accurately how many dropped events there are since the beginning of launching Falco and more. Thanks!
Looking forward to hearing back from you in regards to the other fields %fd.nameraw %fs.path.name %fs.path.nameraw %proc.cwd
.
[Melissa Kilby]
@petterreinholdtsen would it be possible to instead use the new https://falco.org/docs/metrics/falco-metrics/ framework as it supports monotonic counters?
I am reluctant to add this to all our hosts, as we try to reduce the noise and number of irrelevant events in the log, and the metrics seem add regular quite a lot of "non-events' to the log.
Looking forward to hearing back from you in regards to the other fields
%fd.nameraw %fs.path.name %fs.path.nameraw %proc.cwd
.
This change rolled out this weekend, and here are a few example log
events, one with
Error File below / or /root opened for writing
(file=
Error File below / or /root opened for writing (file=//cf_lock.lmdb.lock nameraw=cf_lock.lmdb.lock pathname=/var/cfengine/state/cf_lock.lmdb.lock pathnameraw=cf _lock.lmdb.lock proccwd=/var/cfengine/state/ evt_type=openat user=root user_uid=0 user_loginuid=-1 pr ocess=cf-agent proc_exepath=/usr/bin/cf-agent parent=cf-execd command=cf-agent terminal=0 exe_flags=O _RDWR container_id=host container_name=host)
-- Happy hacking Petter Reinholdtsen
[Petter Reinholdtsen]
This change rolled out this weekend, and here are a few example log events, one with
and one with a more path like file value.
Perhaps the relevant difference is the file permission of the cwd directory?
drwx------. 1 root root 36 jan. 12 01:00 /etc/sssd/ drwxr-xr-x. 1 root root 318 jan. 18 12:37 /var/cfengine/
Even with the latter, '//' seem like the wrong path.
-- Happy hacking Petter Reinholdtsen
Re https://falco.org/docs/metrics/falco-metrics/ you can adjust the timer aka how often you want to emit a log. Just FYI.
Error File below / or /root opened for writing (file=
cf_lock.lmdb.lock nameraw=cf_lock.lmdb.lock pathname=/etc/sssd/cf_lock.lmdb.lock pathnameraw=cf_lock.lmdb.lock proccwd=/etc/sssd/ evt_type=openat user=root user_uid=0 user_loginuid=-1 process=cf-agent proc_exepath=/usr/bin/cf-agent parent=cf-execd command=cf-agent terminal=0 exe_flags=O_RDWR container_id=host container_name=host)
Thank you for deploying the additional output fields 🙏 !
@LucaGuerra and @FedeDP given that pathname=/etc/sssd/cf_lock.lmdb.lock
is correct (using the same new custom path concatenation method) something must be wonky wrt to how we store the cwd to then concat the path for fd.name
since for the fs.path
filters we fetch the args directly. Or something else in the fd.
filtercheck logic could be wonky.
@petterreinholdtsen this needs a patch in Falco's libs, ETA unknown and we should consider releasing a patch release. We will keep you updated. @falcosecurity/falco-maintainers @leogr
Given your examples above https://github.com/falcosecurity/libs/issues/1892 effectively the derivation of fd.name
is more sophisticated and perhaps fs.path.name
is actually wrong here. fs.path.name
always concatenates the proc.cwd
and the raw open* syscall arg wrt to opening a file, but it may be inaccurate. In addition, assuming cf_lock.lmdb.lock
in both logs you shared are at the same location you see that the full path is different when looking at fs.path.name
.
CC @FedeDP who shared a similar assessment. In summary, I am not sure if there is something to fix easily. At the same time if fs.path.name
is wrong we should do something about that and try to make them equal?
@petterreinholdtsen did you observe a regression between release or is this just a new observation? Furthermore, could you check for example what the true location of cf_lock.lmdb.lock
is on your system? Percentage wise how rare is the observation of <UNKNOWN>
or //
at the beginning?
Code in question:
More references:
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale
.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close
.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
[Melissa Kilby]
@petterreinholdtsen did you observe a regression between release or is this just a new observation? Furthermore, could you check for example what the true location of
cf_lock.lmdb.lock
is on your system? Percentage wise how rare is the observation of<UNKNOWN>
or//
at the beginning?
I just noticed in during testing. Do not know how often it happen.
-- Happy hacking Petter Reinholdtsen
/remove-lifecycle stale
This issue should be fixed in libs. I'm transferring this issue to the libs repo.
cc @FedeDP @Andreagit97
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale
.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close
.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
I'm still waiting for a resolution.
-- Happy hacking Petter Reinholdtsen
/remove-lifecycle stale
Sorry we had no time to look into this during this release cycle; setting next milestone: /milestone 0.19.0
/milestone 0.20.0
Some times the events from sandbox rule "Write below root" contain a incomplete path, making it hard to know the location of the file in question. This is an example message:
"14:27:16.402853438: Error File below / or /root opened for writing (file=\<UNKNOWN>cf_lock.lmdb.lock evt_type=openat user=root user_uid=0 user_loginuid=-1 process=cf-agent proc_exepath=/usr/bin/cf-agent parent=cf-execd command=cf-agent terminal=0 exe_flags=O_RDWR container_id=host container_name=host)",
The process triggering the event is cfengine with its lock file, and in other similar events the fd.name value is "//cf_lock.lmdb.lock", so I suspect the real value of the \<UNKNOWN> part should be "/".
This is running on a RHEL 9.3 machine using the "modern" ebpf probe, with a setup where all rule sets from https://github.com/falcosecurity/rules/ are enabled. The machine uses cfengine to keep the setup correct and up to date.
I expected fd.name to contain a full path to the opened file, and suspect there is a bug somewhere causing the \<UNKNOWN> part.
macro: allowed_tmp_files ^
LOAD_UNUSED_MACRO (Unused macro): Macro not referred to by any other rule/macro
Tue Feb 27 14:36:59 2024: Loading rules from file /etc/falco/rules.d/99-settings.yaml Tue Feb 27 14:36:59 2024: /etc/falco/rules.d/99-settings.yaml: Ok, with warnings 1 Warnings: In rules content: (/etc/falco/rules.d/60-nidhogg.yaml:0:0) macro 'allowed_tmp_files': (/etc/falco/rules.d/60-nidhogg.yaml:41:2)
macro: allowed_tmp_files ^
LOAD_UNUSED_MACRO (Unused macro): Macro not referred to by any other rule/macro
{ "machine": "x86_64", "nodename": "soria.uio.no", "release": "5.14.0-284.11.1.el9_2.x86_64", "sysname": "Linux", "version": "#1 SMP PREEMPT_DYNAMIC Wed Apr 12 10:45:03 EDT 2023" }