Enrich events with cloud metadata when running in a cloud-native environment (GKE, EKS, AKS, etc.)

abroglesc commented 3 years ago

Motivation

If you have Falco deployed to many clusters across different AWS accounts or Google Cloud projects, it can be hard to tell which account/project, region, and cluster a specific alert was triggered on. This data is readily available from the instance metadata services in both EKS and GKE, so it likely wouldn't be too difficult to dynamically enrich Falco events with it.

Feature

A new Falco configuration flag that lets you set the type of cluster (e.g., EKS, GKE, or AKS). On startup, the Falco daemon would then call the instance metadata service to retrieve the following info:

- Account ID / Project ID
- Cluster name
- Region
- Availability zone
- Node/instance name

These new pieces of metadata could then be attached to events and used in rules and outputs (https://falco.org/docs/rules/supported-fields/).

Alternatives

This could partly be done within falcosidekick, but you would lose the ability to add node/instance ID information, since falcosidekick doesn't need to run on every node the way the Falco DaemonSet does. With that approach, for host-level events (%container.id='host') we wouldn't know which node an event came from, and therefore which node we should potentially perform forensics on.

Additional context

GKE Endpoints: requests must include the header `Metadata-Flavor: Google`.

| Metadata | URL |
| --- | --- |
| project_id | http://metadata.google.internal/computeMetadata/v1/project/project-id |
| zone | http://metadata.google.internal/computeMetadata/v1/instance/zone |
| cluster_name | http://metadata.google.internal/computeMetadata/v1/instance/attributes/cluster-name |
| instance_name | http://metadata.google.internal/computeMetadata/v1/instance/name |

EKS Endpoints: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html

AKS Endpoints: I haven't used AKS or Azure, but their documentation for the metadata service appears to be here: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/instance-metadata-service?tabs=linux

Kaizhe commented 3 years ago

Just a note: we may have to break it up into multiple tickets.

poiana commented 3 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana commented 2 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

poiana commented 2 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.
/close

poiana commented 2 years ago

@poiana: Closing this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/1704#issuecomment-1007922732): >Rotten issues close after 30d of inactivity. > >Reopen the issue with `/reopen`. > >Mark the issue as fresh with `/remove-lifecycle rotten`. > >Provide feedback via https://github.com/falcosecurity/community. >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

jasondellaluce commented 2 years ago

/reopen

poiana commented 2 years ago

@jasondellaluce: Reopened this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/1704#issuecomment-1008700802): >/reopen

mmoyerfigma commented 2 years ago

I'd be thrilled if this also included ECS task metadata (task ID, task definition name/version, etc.) from the ECS Introspection API, which maps a container ID to this data. These fields are analogous to the k8s.pod.id/k8s.deployment.name fields included when running with -pk.

jasondellaluce commented 2 years ago

Hey @abroglesc, @mmoyerfigma!

Yesterday we released Falco 0.31.0, which brings support for the new plugin system. What you both described here is a perfect fit for an extractor plugin and could even be written in Go with a few lines of code. Example 👉🏼 https://github.com/falcosecurity/plugin-sdk-go/blob/main/examples/extractor/extractor.go

What do you think about working together to implement this? I can help you get started with plugin development!

mmoyerfigma commented 2 years ago

> What do you think about working together to implement this? I can help you get started with plugin development!

I may take a look at this. I ended up writing a post-processor that runs via program_output and interacts with that ECS API. It should be easy to refactor that code into an extractor plugin, I think.

mmoyerfigma commented 2 years ago

I started looking into this plugin interface, but I'm worried it's not suitable for my use case unless I'm misunderstanding something. I can write an extractor plugin that makes fields like ecs.task_id or ecs.task_definition available in rules, but since none of the default rules use those keys, my ECS metadata won't show up in most alerts.

I could fork all the rules and add %ecs.task_id to each of them, but what I really want is an extension point that replaces the special %container.info handling, so it gets appended by default to every rule.

jasondellaluce commented 2 years ago

If I understand correctly, I think what you're looking for is the -p Falco option:

```
 -p <output_format>, --print <output_format>
                               Add additional information to each falco notification's output.
                               With -pc or -pcontainer will use a container-friendly format.
                               With -pk or -pkubernetes will use a kubernetes-friendly format.
                               With -pm or -pmesos will use a mesos-friendly format.
                               Additionally, specifying -pc/-pk/-pm will change the interpretation
                               of %container.info in rule output fields.
```

It isn't limited to -pk. With it, you can append an arbitrary format to every rule output and include fields like ecs.task_id. Even if you can't customize the %container.info replacement, that's still handy. More here 👉🏼 https://falco.org/docs/alerts/formatting/

Besides, I think working on a plugin like this would be a valuable addition to the project and the ecosystem.

mmoyerfigma commented 2 years ago

Yeah, -pc is what I'm using today, and the functionality I'd want to build in a plugin is like a new -pe (ECS), but I don't see how that's possible to do in the current plugin API, since it doesn't fit the pattern of a source plugin or an extractor plugin.

jasondellaluce commented 2 years ago

We can investigate better customization capabilities in the future, but for now, instead of a falco -pe you would run falco -p"taskid=%ecs.task_id, taskdef=%ecs.task_definition, ...", which appends that formatted output to every rule alert and uses your plugin to extract each field.

So basically having an extractor plugin implementing the new fields, and then running falco -p"..." should cover this use case.
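A simplified Go model of what that combination does, for intuition only: the `-p` format is appended to a rule's output, and each `%field` placeholder is then resolved through an extractor, with `<NA>` shown for fields that have no value. This is not Falco's implementation; `renderAlert` is a hypothetical helper, and the regex ignores field arguments (such as `%proc.aname[2]`) and the special `%container.info` handling.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldRef matches %name.with.dots placeholders (simplified grammar).
var fieldRef = regexp.MustCompile(`%([A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)+)`)

// renderAlert appends the -p format to a rule's output and resolves every
// placeholder through lookup, printing <NA> for fields without a value.
func renderAlert(ruleOutput, extra string, lookup func(string) (string, bool)) string {
	format := ruleOutput
	if extra != "" {
		format += " " + extra
	}
	return fieldRef.ReplaceAllStringFunc(format, func(m string) string {
		if v, ok := lookup(m[1:]); ok { // m[1:] drops the leading '%'
			return v
		}
		return "<NA>"
	})
}

func main() {
	values := map[string]string{"proc.name": "bash", "ecs.task_id": "0123abcd"}
	lookup := func(f string) (string, bool) { v, ok := values[f]; return v, ok }
	fmt.Println(renderAlert("Shell spawned (proc=%proc.name)",
		"taskid=%ecs.task_id, taskdef=%ecs.task_definition", lookup))
}
```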

poiana commented 2 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.
/close

poiana commented 2 years ago

@poiana: Closing this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/1704#issuecomment-1059576284): >Rotten issues close after 30d of inactivity. > >Reopen the issue with `/reopen`. > >Mark the issue as fresh with `/remove-lifecycle rotten`. > >Provide feedback via https://github.com/falcosecurity/community. >/close

jasondellaluce commented 2 years ago

/reopen
/remove-lifecycle rotten

poiana commented 2 years ago

@jasondellaluce: Reopened this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/1704#issuecomment-1059714891): >/reopen >/remove-lifecycle rotten

poiana commented 2 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana commented 2 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

jasondellaluce commented 2 years ago

/remove-lifecycle rotten

poiana commented 2 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana commented 2 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

poiana commented 1 year ago

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.
/close

poiana commented 1 year ago

@poiana: Closing this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/1704#issuecomment-1334476305): >Rotten issues close after 30d of inactivity. > >Reopen the issue with `/reopen`. > >Mark the issue as fresh with `/remove-lifecycle rotten`. > >Provide feedback via https://github.com/falcosecurity/community. >/close

jasondellaluce commented 1 year ago

/remove-lifecycle rotten
/reopen

poiana commented 1 year ago

@jasondellaluce: Reopened this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/1704#issuecomment-1467928285): >/remove-lifecycle rotten >/reopen

poiana commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

jasondellaluce commented 1 year ago

/remove-lifecycle stale

This should now be possible due to the newest features of the plugin framework.

poiana commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana commented 11 months ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

Andreagit97 commented 11 months ago

/remove-lifecycle rotten

poiana commented 8 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Andreagit97 commented 8 months ago

/remove-lifecycle stale

poiana commented 5 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana commented 4 months ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

Andreagit97 commented 3 months ago

/remove-lifecycle rotten

poiana commented 4 weeks ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

falcosecurity / falco

Enrich events with cloud metadata when running in a cloud native environment (GKE, EKS, AKS, etc) #1704