falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0
7.3k stars 897 forks source link

Change the format of the 'output' field in the JSON payload #2985

Closed Issif closed 3 weeks ago

Issif commented 9 months ago

Motivation

Right now, the default output for Falco is stdout with basic text as format. The generated log lines follow this pattern <timestamp> <Priority> <output>:

14:37:27.505989596: Warning Detected ptrace PTRACE_ATTACH attempt (proc_pcmdline=%proc.pcmdline evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty exe_flags=%evt.arg.flags %container.info)

By choosing the JSON format we get the same content in the output field, with same elements timestamp and Priority, but these elements are also contained in specific fields of the JSON:

{
    "output": "14:37:27.505989596: Warning Detected ptrace PTRACE_ATTACH attempt (proc_pcmdline=%proc.pcmdline evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty exe_flags=%evt.arg.flags %container.info)",
    "priority": "WARNING",
    "rule": "PTRACE attached to process",
    "time": "2023-12-20T14:37:27.505989596Z",
    "output_fields": {
        "container.info": "container.info",
        "evt.arg.flags": "evt.arg.flags",
        "evt.type": "evt.type",
        "proc.cmdline": "proc.cmdline",
        "proc.exepath": "proc.exepath",
        "proc.name": "proc.name",
        "proc.pcmdline": "proc.pcmdline",
        "proc.pname": "proc.pname",
        "proc.tty": "proc.tty",
        "user.loginuid": "user.loginuid",
        "user.name": "user.name",
        "user.uid": "user.uid"
    },
    "hostname": "host-7.local",
    "source": "syscalls",
    "tags": [
        "maturity_stable",
        "host",
        "container",
        "process",
        "mitre_privilege_escalation",
        "T1055.008"
    ]
}

It implies we have a duplication of the information between the fields, and it creates a mess for systems trying to deduplicate alerts. See this issue for Falcosidekick.

Feature

I propose to allow an opt-in to remove the timestamp and/or the priority from the outpout when the JSON format is used.

Alternatives

I created a PR for Falcosidekick allowing to reformat the output field. See PR#729

Additional context

incertum commented 9 months ago

Someone else brought that up a while ago, had started something here https://github.com/falcosecurity/falco/pull/2670/files maybe let's discuss? I can see a bit of a better cleanup even under the hood than that wip PR.

@falcosecurity/falco-maintainers

Andreagit97 commented 9 months ago

the proposed changes make sense! There is no reason to duplicate the timestamp information in the output, or at least nothing I can think about at the moment!

h4l0gen commented 9 months ago

Hey @Issif @incertum @Andreagit97 , i would like to work on this issue. Please guide me how can I start on this. Thanks!!

h4l0gen commented 8 months ago

/assign

leogr commented 8 months ago

allow an opt-in to remove the timestamp

:+1:

Andreagit97 commented 8 months ago

@h4l0gen thank you for tackling this! As you correctly highlighted here https://kubernetes.slack.com/archives/CMWH3EH32/p1705578359716419?thread_ts=1705577834.276719&cid=CMWH3EH32 the issue is this line https://github.com/falcosecurity/falco/blob/f4aef006fedb7225d8906fa37f584223ef29cbcc/userspace/engine/formats.cpp#L47

there is no need to change the tostring_withformat function, probably switching the formatter from OF_NORMAL to OF_JSON should be enough

diff --git a/userspace/engine/formats.cpp b/userspace/engine/formats.cpp
index 106a58b7..c6bd07c5 100644
--- a/userspace/engine/formats.cpp
+++ b/userspace/engine/formats.cpp
@@ -43,9 +43,6 @@ std::string falco_formats::format_event(gen_event *evt, const std::string &rule,

        formatter = m_falco_engine->create_formatter(source, format);

-       // Format the original output string, regardless of output format
-       formatter->tostring_withformat(evt, line, gen_event_formatter::OF_NORMAL);
-
        if(formatter->get_output_format() == gen_event_formatter::OF_JSON)
        {
                std::string json_line;
@@ -89,6 +86,7 @@ std::string falco_formats::format_event(gen_event *evt, const std::string &rule,
                if(m_json_include_output_property)
                {
                        // This is the filled-in output line.
+                       formatter->tostring_withformat(evt, line, gen_event_formatter::OF_JSON);
                        event["output"] = line;
                }

@@ -127,6 +125,11 @@ std::string falco_formats::format_event(gen_event *evt, const std::string &rule,
                full_line.append("}");
                line = full_line;
        }
+       else
+       {
+               // Obtain `line` in the normal format
+               formatter->tostring_withformat(evt, line, gen_event_formatter::OF_NORMAL);
+       }

        return line.c_str();
 }

BTW just to answer your question on slack:

if i can found the definition of gen_event or any other function which is generating output then i can modify its definition to solve issue but I am not able to locate it. Please assist me with this, and also confirm if this approach would be effective

The function tostring_withformat is defined in libsinsp and more precisely here https://github.com/falcosecurity/libs/blob/6d57bdec7d3f0ed898c24682c7ac417a5e3d6294/userspace/libsinsp/eventformatter.cpp#L250 Just for context: libsinsp is a library that Falco uses for many things (all the host capture + filter checks + other stuff)

Andreagit97 commented 8 months ago

BTW I propose to switch from

{
    "output": "14:37:27.505989596: Warning Detected ptrace PTRACE_ATTACH attempt (proc_pcmdline=%proc.pcmdline evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty exe_flags=%evt.arg.flags %container.info)",
    "priority": "WARNING",
    "rule": "PTRACE attached to process",
    "time": "2023-12-20T14:37:27.505989596Z",
    "output_fields": {
        "container.info": "container.info",
        "evt.arg.flags": "evt.arg.flags",
        "evt.type": "evt.type",
        "proc.cmdline": "proc.cmdline",
        "proc.exepath": "proc.exepath",
        "proc.name": "proc.name",
        "proc.pcmdline": "proc.pcmdline",
        "proc.pname": "proc.pname",
        "proc.tty": "proc.tty",
        "user.loginuid": "user.loginuid",
        "user.name": "user.name",
        "user.uid": "user.uid"
    },
    "hostname": "host-7.local",
    "source": "syscalls",
    "tags": [
        "maturity_stable",
        "host",
        "container",
        "process",
        "mitre_privilege_escalation",
        "T1055.008"
    ]
}

to

{
    "output": "(proc_pcmdline=%proc.pcmdline evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty exe_flags=%evt.arg.flags %container.info)",
    "priority": "WARNING",
    "rule": "PTRACE attached to process",
    "time": "2023-12-20T14:37:27.505989596Z",
    "output_fields": {
        "container.info": "container.info",
        "evt.arg.flags": "evt.arg.flags",
        "evt.type": "evt.type",
        "proc.cmdline": "proc.cmdline",
        "proc.exepath": "proc.exepath",
        "proc.name": "proc.name",
        "proc.pcmdline": "proc.pcmdline",
        "proc.pname": "proc.pname",
        "proc.tty": "proc.tty",
        "user.loginuid": "user.loginuid",
        "user.name": "user.name",
        "user.uid": "user.uid"
    },
    "hostname": "host-7.local",
    "source": "syscalls",
    "tags": [
        "maturity_stable",
        "host",
        "container",
        "process",
        "mitre_privilege_escalation",
        "T1055.008"
    ]
}

@Issif @falcosecurity/falco-maintainers So removing from output the 3 fields: time, priority, rule -> remove (14:37:27.505989596: Warning Detected ptrace PTRACE_ATTACH attempt)

If we agree on this the proposed patch https://github.com/falcosecurity/falco/issues/2985#issuecomment-1898506966 should be enough

Issif commented 8 months ago

It's good to me.

FYI, our example is not 100% accurate, the rule is not really part of the output for that specific rule, it's just a part of the configured output:

Detected ptrace PTRACE_ATTACH attempt (proc_pcmdline=%proc.pcmdline evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty %container.info)

See: https://github.com/falcosecurity/rules/blob/main/rules/falco_rules.yaml#L1105,L1117

Andreagit97 commented 8 months ago

Oh you are right rule is PTRACE attached to process while the output is Detected ptrace PTRACE_ATTACH attempt! BTW yes i think that having the rule tag in the json object should be enough even if it could be different from the rule description in the output

poiana commented 5 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

h4l0gen commented 5 months ago

/remove-lifecycle stale

leogr commented 5 months ago

Any updates on this? Is there any blocker? :thinking:

Andreagit97 commented 5 months ago

The blocker is this https://github.com/falcosecurity/falco/pull/3037#issuecomment-2033727092, we need to decide what to do:

Issif commented 5 months ago

I vote for option 2. This was my initial proposal after all. Correct me if I'm wrong, by the rule is not contained in the output, just the timestamp and the priority.

Andreagit97 commented 5 months ago

Yep we don't have the exact content of the rule tag but we have something very similar:

    "output": "14:37:27.505989596: Warning Detected ptrace PTRACE_ATTACH attempt ...",
    "rule": "PTRACE attached to process",

So Detected ptrace PTRACE_ATTACH attempt vs PTRACE attached to process. We can also keep Detected ptrace PTRACE_ATTACH attempt in the output even if it's more or less a duplicate of the rule tag

Issif commented 5 months ago

This is a particular case, the output is set to display that message that's all. See https://github.com/falcosecurity/rules/blob/e65f2518b06b6e439b49451a0738115f74d224e0/rules/falco_rules.yaml#L1099,L1111

h4l0gen commented 5 months ago

Hi @Andreagit97, @Issif, and @leogr. Since we've encountered such a significant change in Falco, I'll wait for your decision on this PR before making any further changes. While I don't have a strong opinion on this at the moment, though as @Issif mentioned, the second option looks good to me for now

leogr commented 5 months ago

I still have to double-check, but I believe the 2nd option is ok.

cc @falcosecurity/falco-maintainers wdyt?

Issif commented 4 months ago

Is this feature will be integrated after the 0.38?

FedeDP commented 4 months ago

Nope, it is in TBD milestone since we could not reach an agreement.

LucaGuerra commented 2 months ago

Hey folks, I would like to see this in 0.39.0 since it's being discussed these days. I'd like to share my opinion on the matter about a potential solution:

From what I can see the main issue is that modifying the output value would introduce breaking changes (in Falco and potentially in libsinsp as well), and I think it's a very valid point. So, let's not do that. Since we have priority, rule and time instead of removing data let's add a new field (can call it message if we don't have another idea) that contains the pure rule output without timestamps/priority/rule name. The main advantages are that this won't introduce breaking changes and it is intuitive to use when parsing the log with an aggregator; disadvantage is that the json payload will be a bit bigger because it duplicates the output but I think advantages outweigh disadvantages.

WDYT?

Issif commented 2 months ago

Hey folks, I would like to see this in 0.39.0 since it's being discussed these days. I'd like to share my opinion on the matter about a potential solution:

From what I can see the main issue is that modifying the output value would introduce breaking changes (in Falco and potentially in libsinsp as well), and I think it's a very valid point. So, let's not do that. Since we have priority, rule and time instead of removing data let's add a new field (can call it message if we don't have another idea) that contains the pure rule output without timestamps/priority/rule name. The main advantages are that this won't introduce breaking changes and it is intuitive to use when parsing the log with an aggregator; disadvantage is that the json payload will be a bit bigger because it duplicates the output but I think advantages outweigh disadvantages.

WDYT?

Another issue with this solution is all tools ingesting the Falco event will have to adapt (falcosidekick, talon, etc, ...) to get the benefit

LucaGuerra commented 2 months ago

Another issue with this solution is all tools ingesting the Falco event will have to adapt (falcosidekick, talon, etc, ...) to get the benefit

Makes sense. On the flip side, if we need to display a message with timestamp and severity on Falcosidekick or Talon I guess we'd still need to adapt them because depending on the underlying Falco version (e.g. < 0.39.0 vs >= 0.39.0) they will need to pick different json fields so I believe it wouldn't be seamless either.

leogr commented 2 months ago

I like the idea of adding another field (e.g., message). But at this point, we should also introduce the ability to remove output from the JSON output. I think it makes totally sense to target this for 0.39.

Ultimately, all these should be opt-in settings for the JSON output. We might want to allow even more customization (i.e., choose which fields should be included and which should not).

LucaGuerra commented 3 weeks ago

I like the idea of adding another field (e.g., message). But at this point, we should also introduce the ability to remove output from the JSON output. I think it makes totally sense to target this for 0.39.

That's exactly how it will work for Falco 0.39.0