Thanks for reporting @plasticine! I'll try to investigate this.
Hey there @fntlnz — just wanted to check in and see if you managed to uncover anything here? We’re still seeing this issue, though I’ll try with the recently released 0.23.X.
Hi @plasticine, thanks for the detailed issue! @leodido and I debugged this for quite some time on GKE 1.16.8-gke.15 and the cos_containerd node image. We're deploying Falco as described here https://github.com/falcosecurity/contrib/tree/master/deploy/kubernetes/kernel-and-k8s-audit with the FALCO_BPF_PROBE variable enabled.
Once Falco was deployed and running, we deployed the event generator as follows:
kubectl run unsecure-falco-example --image falcosecurity/event-generator@sha256:fd2c6c80854e1ee894f8905f8e05fbd4059c6ce401434503801110f549b7d595 -- run
Here is a sample of some events we had:
10:23:02.135632174: Error Package management process launched in container (user=root command=apk container_id=b1ca96994aee container_name=unsecure-falco-example image=docker.io/falcosecurity/event-generator:latest) k8s.ns=default k8s.pod=unsecure-falco-example container=b1ca96994aee k8s.ns=default k8s.pod=unsecure-falco-example container=b1ca96994aee k8s.ns=default k8s.pod=unsecure-falco-example container=b1ca96994aee
10:16:03.381277704: Notice Unexpected setuid call by non-sudo, non-root program (user=bin cur_uid=2 parent=child command=child --loglevel info run ^syscall.NonSudoSetuid$ uid=root container_id=b5991414d3c5 image=docker.io/falcosecurity/event-generator) k8s.ns=default k8s.pod=unsecure-falco-example container=b5991414d3c5 k8s.ns=default k8s.pod=unsecure-falco-example container=b5991414d3c5 k8s.ns=default k8s.pod=unsecure-falco-example container=b5991414d3c5
Can you give us more details on how the pods are deployed? If you can try the kubectl run posted above and see if this happens with it, that would be useful (be aware that the event-generator container triggers Falco a lot and it's not safe to run in production environments).
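For reference, a minimal repro sketch (the app=falco label selector is an assumption; adjust it to match your DaemonSet):

# Deploy the event generator by digest (noisy by design; don't run in production)
kubectl run unsecure-falco-example \
  --image falcosecurity/event-generator@sha256:fd2c6c80854e1ee894f8905f8e05fbd4059c6ce401434503801110f549b7d595 \
  -- run

# Then check what the image fields resolve to in the Falco output
kubectl logs -l app=falco --all-containers | grep 'image='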
Also, in our debugging session, we have containerd version:
github.com/containerd/containerd 1.2.8 a4bc1d432a2c33aa2eed37f338dceabb93641310
Could you report the version you have in your setup?
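If it helps, two quick ways to grab that (a sketch; the second needs only cluster access):

# On the node itself
containerd --version

# Or via the runtime version the kubelet reports for each node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'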
Maybe this helps to pin-point things further, I'm seeing the same as @plasticine without BPF enabled on my env (falco 0.23.0): { "machine": "x86_64", "nodename": "falco-2jd7j", "release": "4.4.0-177-generic", "sysname": "Linux", "version": "#207-Ubuntu SMP Mon Mar 16 01:16:10 UTC 2020" } it's an IBM Cloud IKS 1.15
Hrm! 🤔 Hoping to get some time to circle back on this next week and see if I can get a more detailed repro.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Issues labeled "cncf", "roadmap" and "help wanted" will not be automatically closed. Please refer to a maintainer to get such label added if you think this should be kept open.
@fntlnz: Closing this issue.
Oh no, I didn't want to close this!
/reopen
@fntlnz: Reopened this issue.
@fntlnz Sorry 🙈 I’ve been meaning to try and get back to this for ages, I’ll try and make time this week! I’ve been wondering if there is something about our gke k8s that is tripping things up; it’s a stretch, but I’m wondering if binary auth is possibly in the mix, as that’s something we leverage heavily.
Thanks @plasticine - no need to be sorry, time flies! Thanks for your help!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Issues labeled "cncf", "roadmap" and "help wanted" will not be automatically closed. Please refer to a maintainer to get such label added if you think this should be kept open.
Is this problem still present in Falco 0.25.0?
Looks like this issue still exists on 0.26.2, just tested it now with the exact configuration that @plasticine has.
Are there any specific things I can provide from our configuration to make debugging this easier?
@plasticine and I have a suspicion that this is related to having binary authorization enabled on our GKE clusters. https://cloud.google.com/binary-authorization
@fntlnz was binary auth enabled when you tested this with your GKE configuration?
Hey @lucasteligioridis
Could you share the detailed steps to reproduce the problem?
Thanks in advance!
@leogr We literally just create a new GKE cluster running 1.17 with Binary Authorization enabled and then deploy the Falco workload.
You should then be able to replicate the issue as per the original post.
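Roughly like this (a sketch; cluster name, zone and image type are placeholders, and the Binary Authorization flag has been renamed in newer gcloud releases):

# Create a GKE cluster with Binary Authorization enabled
# (--enable-binauthz at the time; newer gcloud uses --binauthz-evaluation-mode)
gcloud container clusters create falco-repro \
  --zone us-central1-a \
  --cluster-version 1.17 \
  --image-type COS_CONTAINERD \
  --enable-binauthz

# Then deploy the Falco workload as per the original post and watch the image fields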
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Provide feedback via https://github.com/falcosecurity/community.
/close
@poiana: Closing this issue.
I'm also seeing this behavior on Azure AKS v1.19.7, however I've seen this only from the Nginx Ingress Controller so far. Other containers seem to work fine and output the %container.image.repository and %container.image.tag correctly.
I tried to create a reproducible scenario on minikube but was unable to; it worked fine there.
Falco info
* Setting up /usr/src links from host
* Running falco-driver-loader for: falco version=0.28.1, driver version=5c0b863ddade7a45568c0ac97d037422c9efb750
* Running falco-driver-loader with: driver=module, compile=yes, download=yes
* Unloading falco module, if present
* Trying to load a system falco module, if present
* Success: falco module found and loaded with modprobe
Node info
{
"architecture": "amd64",
"bootID": "",
"containerRuntimeVersion": "containerd://1.5.0-beta.git31a0f92df+azure",
"kernelVersion": "5.4.0-1043-azure",
"kubeProxyVersion": "v1.19.7",
"kubeletVersion": "v1.19.7",
"machineID": "",
"operatingSystem": "linux",
"osImage": "Ubuntu 18.04.5 LTS",
"systemUUID": ""
}
Example alert
14:50:35.363121296: Notice Unexpected connection to K8s API Server from container (command=nginx-ingress-c --publish-service=test/lb-ingress-nginx-controller --election-id=ingress-controller-leader --ingress-class=nginx --configmap=test/lb-ingress-nginx-controller --validating-webhook=:8443 --validating-webhook-certificate=/usr/local/certificates/cert --validating-webhook-key=/usr/local/certificates/key --default-ssl-certificate=[masked] k8s.ns=test k8s.pod=lb-ingress-nginx-controller-597d69f489-gkt42 container=93b73c8ba8be image=sha256:0975b5aefeaca5f8398cf4c591b2e0024184839e3bf780e843b0c17ecd7a85e6 connection=10.0.40.30:54132->10.250.0.1:443) k8s.ns=test k8s.pod=lb-ingress-nginx-controller-597d69f489-gkt42 container=93b73c8ba8be k8s.ns=test k8s.pod=lb-ingress-nginx-controller-597d69f489-gkt42 container=93b73c8ba8be
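For comparison, what the kubelet reports for that same pod (a sketch; pod name and namespace copied from the alert above):

# image is the reference from the pod spec; imageID is the runtime-resolved digest
kubectl -n test get pod lb-ingress-nginx-controller-597d69f489-gkt42 \
  -o jsonpath='{.status.containerStatuses[*].image}{"\n"}{.status.containerStatuses[*].imageID}{"\n"}'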
Nginx Ingress Controller deployed like so:
resource "helm_release" "ingress-controller-blue" {
count = 1
name = "lb"
repository = "https://kubernetes.github.io/ingress-nginx"
chart = "ingress-nginx"
version = "3.27.0"
namespace = "test"
create_namespace = true
}
/reopen
@ejderdal: You can't reopen an issue/PR unless you authored it or you are a collaborator.
The issue seems to be still present: https://kubernetes.slack.com/archives/CMWH3EH32/p1681661535622169
Hi @plasticine :wave:
We patched the container engine in this regard a bit here https://github.com/falcosecurity/libs/pull/771/files.
I also sometimes see sha256 as container.image.repository, and can help :eyes: into it; I had it on my list already.
Some of our images also use the format <registry>/<repository>/<image>:<tag>@sha256:<digest>, just for more context for testing 😊
Amazing, ty! Will start looking into it next week after KubeCon and will ping you on Slack as well :pray:!
@sigurdfalk tagged you in the PR. Added backup lookups ... after that I wouldn't know where else to extract the image from; I searched the entire container status response. It certainly isn't a Falco bug, sometimes it simply just is sha256. I queried Kubernetes audit logs to confirm this. What I don't know, however, is whether in such corner cases the image from the annotations would also just be sha256. In that case it would be game over.
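For anyone who wants to see what their runtime returns, a sketch of dumping the relevant parts of the container status from a node (the container ID is a placeholder; assumes crictl and jq are installed):

# Pull out the two places the CRI container status carries an image reference
crictl inspect <container-id> | jq '{image: .status.image.image, imageRef: .status.imageRef}'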
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Provide feedback via https://github.com/falcosecurity/community.
/close
@poiana: Closing this issue.
@incertum @FedeDP
My understanding is that this issue has been fixed in 0.35.0. Is that correct?
Yep, @melissa fixed that if I remember correctly!
Correct, @gnosek's PR had the biggest impact (https://github.com/falcosecurity/libs/pull/771), but to increase robustness even more I added backup lookups (https://github.com/falcosecurity/libs/pull/1067) for the Kubernetes cases (cri, containerd and cri-o).
In summary, we now try to look up the container image from all possible places in the container status response, especially for the Kubernetes use case.
We can mark this as completed for 0.35.0 and should there still be issues, we can continue working on it.
/milestone 0.35.0
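If anyone wants to double-check on 0.35.0, a quick sketch reusing the repro from earlier in the thread (same label-selector assumption as before):

# Re-run the digest-pinned event generator...
kubectl run unsecure-falco-example \
  --image falcosecurity/event-generator@sha256:fd2c6c80854e1ee894f8905f8e05fbd4059c6ce401434503801110f549b7d595 \
  -- run

# ...and confirm the repository now resolves instead of showing sha256
kubectl logs -l app=falco --all-containers | grep 'image=docker.io/falcosecurity/event-generator'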
Hey there — thanks very much for Falco, it’s an amazing bit of software! 👋
So, we almost exclusively deploy all our images by sha256 digest, as opposed to by tag, and when attempting to update Falco from 0.17.0 to 0.22.1 on a bunch of our k8s clusters we’ve observed that events all seem to have the following container.image.repository and container.image.tag values;

How to reproduce it
I’ve validated this behavior on GKE nodes running 1.15.x on top of COS with Containerd, and have re-deployed our Falco install from scratch using https://github.com/falcosecurity/falco/tree/master/integrations/k8s-using-daemonset/k8s-with-rbac

Expected behaviour
I would expect that the repository field would contain the actual image repository, and the tag field either the tag, or digest.

...or maybe even better, a new field digest in the event that tag is null and the image is being referenced by digest;
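One way to check which image-related fields a given Falco build already exposes (a sketch; the grep pattern is illustrative):

# Lists all supported fields; check whether a digest field is available in your build
falco --list | grep 'container.image'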
Environment

Falco version:
System info:
Cloud provider or hardware configuration: GKE, running 1.15.x on COS with Containerd
Installation method: https://github.com/falcosecurity/falco/tree/master/integrations/k8s-using-daemonset/k8s-with-rbac