falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0

Incorrect event values for container.image.* when using image digests #1187

Closed — plasticine closed this 1 year ago

plasticine commented 4 years ago

Hey there — thanks very much for Falco, it’s an amazing bit of software! 👋

We almost exclusively deploy our images by sha256 digest rather than by tag. When attempting to update Falco from 0.17.0 to 0.22.1 on several of our k8s clusters, we've observed that events all seem to have the following container.image.repository and container.image.tag values:

"container.image.repository":"sha256","container.image.tag":"[DIGEST]"

How to reproduce it

I've validated this behavior on GKE nodes running 1.15.x on top of COS with containerd, and have re-deployed our Falco install from scratch using https://github.com/falcosecurity/falco/tree/master/integrations/k8s-using-daemonset/k8s-with-rbac

Expected behaviour

I would expect the repository field to contain the actual image repository, and the tag field to contain either the tag or the digest:

"container.image.repository":"some-image","container.image.tag":"sha256:[DIGEST]"

...or maybe even better, a new digest field for the case where tag is null and the image is referenced by digest:

"container.image.repository":"some-image","container.image.tag":null,"container.image.digest":"sha256:[DIGEST]"
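For illustration, the expected split could be sketched like this (a hypothetical helper, not Falco's actual parsing code; the key detail is that the tag separator is the colon after the last slash, so registry ports aren't mistaken for tags):

```python
# Hypothetical sketch of the expected split -- not Falco's actual parser.
def parse_image_ref(ref: str) -> dict:
    digest = None
    tag = None
    if "@" in ref:
        # "some-image@sha256:..." -> repository "some-image", digest "sha256:..."
        ref, digest = ref.split("@", 1)
    # A colon after the last "/" separates the tag; earlier colons belong
    # to a registry port (e.g. "registry:5000/some-image").
    if ref.rfind(":") > ref.rfind("/"):
        ref, tag = ref.rsplit(":", 1)
    return {"repository": ref, "tag": tag, "digest": digest}
```

With that split, `some-image@sha256:[DIGEST]` would yield `repository="some-image"`, `tag=None`, `digest="sha256:[DIGEST]"` instead of the `repository="sha256"` seen above.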

Environment

fntlnz commented 4 years ago

Thanks for reporting @plasticine ! I'll try to investigate this.

fntlnz commented 4 years ago

Looking here :eyes: https://github.com/draios/sysdig/blob/a2567ad80eeae8afb347b16b6580398f669c9ea2/userspace/libsinsp/filterchecks.cpp#L6154-L6198

plasticine commented 4 years ago

Hey there @fntlnz — just wanted to check in and see if you managed to uncover anything here? We’re still seeing this issue, though I’ll try with the recently released 0.23.X.

fntlnz commented 4 years ago

Hi @plasticine, thanks for the detailed issue! @leodido and I debugged this for quite some time on GKE 1.16.8-gke.15 with the cos_containerd node image. We're deploying Falco as described here https://github.com/falcosecurity/contrib/tree/master/deploy/kubernetes/kernel-and-k8s-audit with the FALCO_BPF_PROBE variable enabled.

After Falco was deployed and running, we deployed the event generator as follows:

kubectl run unsecure-falco-example --image falcosecurity/event-generator@sha256:fd2c6c80854e1ee894f8905f8e05fbd4059c6ce401434503801110f549b7d595 -- run

Here is a sample of some events we had:

10:23:02.135632174: Error Package management process launched in container (user=root command=apk container_id=b1ca96994aee container_name=unsecure-falco-example image=docker.io/falcosecurity/event-generator:latest) k8s.ns=default k8s.pod=unsecure-falco-example container=b1ca96994aee k8s.ns=default k8s.pod=unsecure-falco-example container=b1ca96994aee k8s.ns=default k8s.pod=unsecure-falco-example container=b1ca96994aee
10:16:03.381277704: Notice Unexpected setuid call by non-sudo, non-root program (user=bin cur_uid=2 parent=child command=child --loglevel info run ^syscall.NonSudoSetuid$ uid=root container_id=b5991414d3c5 image=docker.io/falcosecurity/event-generator) k8s.ns=default k8s.pod=unsecure-falco-example container=b5991414d3c5 k8s.ns=default k8s.pod=unsecure-falco-example container=b5991414d3c5 k8s.ns=default k8s.pod=unsecure-falco-example container=b5991414d3c5

Can you give us more details on how the pods are deployed? If you can try the kubectl run command posted above and check whether this happens with it too, that would be useful (be aware that the event-generator container triggers Falco a lot; it's not safe to run in production environments).

leodido commented 4 years ago

Also, in our debugging session, we have containerd version:

github.com/containerd/containerd 1.2.8 a4bc1d432a2c33aa2eed37f338dceabb93641310

Could you report the version you have in your setup?

dpittner commented 4 years ago

Maybe this helps to pinpoint things further: I'm seeing the same as @plasticine without BPF enabled in my environment (Falco 0.23.0):

{ "machine": "x86_64", "nodename": "falco-2jd7j", "release": "4.4.0-177-generic", "sysname": "Linux", "version": "#207-Ubuntu SMP Mon Mar 16 01:16:10 UTC 2020" }

It's an IBM Cloud IKS 1.15 cluster.

plasticine commented 4 years ago

Hrm! 🤔 Hoping to get some time to circle back on this next week and see if I can put together a more detailed repro.


stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Issues labeled "cncf", "roadmap" and "help wanted" will not be automatically closed. Please refer to a maintainer to get such label added if you think this should be kept open.

poiana commented 4 years ago

@fntlnz: Closing this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/1187#issuecomment-665547625):

> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
fntlnz commented 4 years ago

Oh no i didn't want to close this!

/reopen

poiana commented 4 years ago

@fntlnz: Reopened this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/1187#issuecomment-665548131):

> Oh no i didn't want to close this!
>
> /reopen
plasticine commented 4 years ago

@fntlnz Sorry 🙈 I've been meaning to get back to this for ages; I'll try to make time this week! I've been wondering if there is something about our GKE setup that is tripping things up. It's a stretch, but binary auth could possibly be in the mix, as that's something we leverage heavily.

fntlnz commented 4 years ago

Thanks @plasticine - no need to be sorry, time flies! Thanks for your help!

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity.

leogr commented 3 years ago

Is this problem still present in Falco 0.25.0 ?

lucasteligioridis commented 3 years ago

Looks like this issue still exists on 0.26.2, just tested it now with the exact configuration that @plasticine has.

lucasteligioridis commented 3 years ago

Are there any specific things I can provide from our configuration to make debugging this easier?

lucasteligioridis commented 3 years ago

@plasticine and I have a suspicion that this is related to having binary authorization enabled on our GKE clusters. https://cloud.google.com/binary-authorization

@fntlnz was binary auth enabled when you tested this with your GKE configuration?

leogr commented 3 years ago

Hey @lucasteligioridis

Could you share the detailed steps to reproduce the problem?

Thanks in advance!

lucasteligioridis commented 3 years ago

@leogr We literally just created a new GKE cluster running 1.17 with Binary Authorization enabled and then deployed the Falco workload.

You should then be able to replicate the issue as per the original post.

poiana commented 3 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana commented 3 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

poiana commented 3 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community. /close

poiana commented 3 years ago

@poiana: Closing this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/1187#issuecomment-826160784): >Rotten issues close after 30d of inactivity. > >Reopen the issue with `/reopen`. > >Mark the issue as fresh with `/remove-lifecycle rotten`. > >Provide feedback via https://github.com/falcosecurity/community. >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
sebeliasson commented 3 years ago

I'm also seeing this behavior on Azure AKS v1.19.7; however, so far I've only seen it from the NGINX Ingress Controller. Other containers seem to work fine and output %container.image.repository and %container.image.tag correctly.

I tried to create a reproducible scenario on minikube but was unable to; it worked fine there.

Falco info

* Setting up /usr/src links from host
* Running falco-driver-loader for: falco version=0.28.1, driver version=5c0b863ddade7a45568c0ac97d037422c9efb750
* Running falco-driver-loader with: driver=module, compile=yes, download=yes
* Unloading falco module, if present
* Trying to load a system falco module, if present
* Success: falco module found and loaded with modprobe

Node info

{
  "architecture": "amd64",
  "bootID": "",
  "containerRuntimeVersion": "containerd://1.5.0-beta.git31a0f92df+azure",
  "kernelVersion": "5.4.0-1043-azure",
  "kubeProxyVersion": "v1.19.7",
  "kubeletVersion": "v1.19.7",
  "machineID": "",
  "operatingSystem": "linux",
  "osImage": "Ubuntu 18.04.5 LTS",
  "systemUUID": ""
}

Example alert

14:50:35.363121296: Notice Unexpected connection to K8s API Server from container (command=nginx-ingress-c --publish-service=test/lb-ingress-nginx-controller --election-id=ingress-controller-leader --ingress-class=nginx --configmap=test/lb-ingress-nginx-controller --validating-webhook=:8443 --validating-webhook-certificate=/usr/local/certificates/cert --validating-webhook-key=/usr/local/certificates/key --default-ssl-certificate=[masked] k8s.ns=test k8s.pod=lb-ingress-nginx-controller-597d69f489-gkt42 container=93b73c8ba8be image=sha256:0975b5aefeaca5f8398cf4c591b2e0024184839e3bf780e843b0c17ecd7a85e6 connection=10.0.40.30:54132->10.250.0.1:443) k8s.ns=test k8s.pod=lb-ingress-nginx-controller-597d69f489-gkt42 container=93b73c8ba8be k8s.ns=test k8s.pod=lb-ingress-nginx-controller-597d69f489-gkt42 container=93b73c8ba8be

Nginx Ingress Controller deployed like so

resource "helm_release" "ingress-controller-blue" {
    count             = 1
    name              = "lb"
    repository        = "https://kubernetes.github.io/ingress-nginx"
    chart             = "ingress-nginx"
    version           = "3.27.0"
    namespace         = "test"
    create_namespace  = true
}
sebeliasson commented 3 years ago

/reopen

poiana commented 3 years ago

@ejderdal: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/falcosecurity/falco/issues/1187#issuecomment-842917247):

> /reopen
Andreagit97 commented 1 year ago

The issue still seems to be present: https://kubernetes.slack.com/archives/CMWH3EH32/p1681661535622169

incertum commented 1 year ago

Hi @plasticine :wave:

We patched the container engine in this regard a bit here https://github.com/falcosecurity/libs/pull/771/files.

I also sometimes see sha256 as container.image.repository. I can help look :eyes: into it; it was already on my list.

sigurdfalk commented 1 year ago

Some of our images also use the format <registry>/<repository>/<image>:<tag>@sha256:<digest>, just for more context for testing 😊

incertum commented 1 year ago

Amazing, thank you! I'll start looking into it next week after KubeCon and will ping you on Slack as well :pray:!

incertum commented 1 year ago

@sigurdfalk I tagged you in the PR. I added backup lookups ... beyond that, I wouldn't know where else to extract the image from; I searched the entire container status response. It certainly isn't a Falco bug: sometimes the image simply is just sha256. I queried Kubernetes audit logs to confirm this. What I don't know, however, is whether in such corner cases the image from the annotations would also be just sha256. In that case it would be game over.

poiana commented 1 year ago

Rotten issues close after 30d of inactivity. /close

poiana commented 1 year ago

@poiana: Closing this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/1187#issuecomment-1566360701):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue with `/reopen`.
>
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Provide feedback via https://github.com/falcosecurity/community.
> /close
leogr commented 1 year ago

@incertum @FedeDP

My understanding is that this issue has been fixed in 0.35.0. Is that correct?

FedeDP commented 1 year ago

Yep, Melissa fixed that if I remember correctly!

incertum commented 1 year ago

Correct. @gnosek's PR had the biggest impact (https://github.com/falcosecurity/libs/pull/771), but to increase robustness even further I added backup lookups (https://github.com/falcosecurity/libs/pull/1067) for the Kubernetes cases (CRI, containerd and CRI-O).

In summary, we now try to look up the container image from all possible places in the container status response, especially for the Kubernetes use case.

We can mark this as completed for 0.35.0, and should there still be issues, we can continue working on it.
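The backup-lookup idea can be illustrated with a small sketch (the field and annotation names are assumptions loosely modeled on a CRI-style container status response; this is not the libs code itself):

```python
# Hypothetical illustration of fallback lookups over a CRI-style container
# status: prefer the first candidate that carries more than a bare digest.
# Field and annotation names here are assumptions for illustration.
def resolve_image(status):
    candidates = [
        status.get("image", {}).get("image"),  # runtime-reported image
        status.get("image_ref"),               # often just a digest
        status.get("annotations", {}).get("io.kubernetes.cri.image-name"),
    ]
    for c in candidates:
        if c and not c.startswith("sha256:"):
            return c
    # As noted above: if every source is just a digest,
    # there is nothing better to fall back to.
    return next((c for c in candidates if c), None)
```

This mirrors the "game over" case described above: when every candidate field only carries a digest, the digest is all that can be reported.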

FedeDP commented 1 year ago

/milestone 0.35.0