Closed leogr closed 3 years ago
As said, I believe this approach is really beneficial and needed.
The really important point here is the last one about node name detection.
What if the user forgets to pass it down? What behavior will the libs have in such a situation?
I believe node autodetection is not feasible since there are too many ways to use the libs (and too many ways to deploy Falco), so it would be hard to cover all use cases reliably. On the other hand, explicitly passing that value via command-line arguments (for example, using the downward API as in the code snippet above) seems to be a solid solution (at least, I can't see any issue with it).
Finally, to answer your question: when the node name is not specified, the behavior should simply be no filtering (as it is now). IMHO, documenting the performance penalty of not passing that value should be enough to make users aware of the implications.
@leodido wdyt?
cc @shane-lawrence @cashwilliams
It should be straightforward for us to pass the node name in an argument to Falco instead of autodetecting in libs, just like we currently pass the k8s API address and auth info. Defaulting to no filtering makes sense to me: doing nothing preserves the status quo, which is fine in smaller clusters; if you have a larger cluster, you could either pass a larger max. size or specify a node filter (or both).
Defaulting to the current behavior and documenting the performance improvements of using this future flag resonates a lot with me!
Motivation
This proposed feature aims to address the issue https://github.com/falcosecurity/falco/issues/778.
It is intended to be a further improvement alongside two other valid fixes/mitigations already proposed (and does not supersede them):
I also want to thank @leodido, @ldegio, and @zemek, since my analysis is in some way a derivative of their early findings and proposed solutions.
Rationale
`libsinsp` implements a mechanism to gather the K8s state, which is then used to enrich syscall events with metadata. During the initial phase, the whole K8s state is fetched from the K8s api-server. Later updates are received using the `watch` API.

In a huge cluster, the problem arises when the K8s api-server response is too big to be handled by `libsinsp`. Usually, this occurs at startup time, when the response of the following call exceeds the hard-coded limit of `30MB`:

Although https://github.com/falcosecurity/libs/pull/40 allows making the limit configurable (thus, users can overcome the `30MB` limitation), the performance issue remains unsolved.

That being said, there are some considerations to take into account to tackle the problem:
- `libscap` only captures syscall events from the system on which it's running (for this reason, Falco needs to be deployed on each node of the cluster)
- `libsinsp` requests the state of all pods in the cluster, regardless of what is needed

Luckily, the `/api/v1/pods` API allows filtering by node (e.g., using `fieldSelector=spec.nodeName=<node>`), which gives us room for a substantial improvement.

Testing hypothesis
To test my hypothesis, I created a testing cluster to hit the 30MB limit, with the following parameters:
The results were:
```
/api/v1/pods?fieldSelector=status.phase!=Failed,status.phase!=Unknown,status.phase!=Succeeded&pretty=false
/api/v1/pods?fieldSelector=status.phase!=Failed,status.phase!=Unknown,status.phase!=Succeeded,spec.nodeName=kind-control-plane&pretty=false
```
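For reference, the two queries differ only in the trailing `spec.nodeName` selector. A minimal sketch of how such URLs could be assembled (the helper name and its defaults are illustrative, not part of libsinsp's actual API):

```cpp
#include <string>
#include <vector>

// Sketch only: builds the /api/v1/pods query, optionally restricted to one node.
// An empty node name means "no filtering", preserving the current behavior.
std::string pods_url(const std::string& api_server, const std::string& node_name = "")
{
    std::vector<std::string> selectors = {
        "status.phase!=Failed",
        "status.phase!=Unknown",
        "status.phase!=Succeeded",
    };
    if(!node_name.empty())
    {
        // Restrict the response to pods scheduled on this node only.
        selectors.push_back("spec.nodeName=" + node_name);
    }

    std::string field_selector;
    for(size_t i = 0; i < selectors.size(); i++)
    {
        field_selector += (i ? "," : "") + selectors[i];
    }
    return api_server + "/api/v1/pods?fieldSelector=" + field_selector + "&pretty=false";
}
```

With `node_name` set, the api-server only serializes the pods of that node, which is what produces the size reduction measured above.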
It is now clear that the response size shrinks significantly. This would not only avoid hitting the `30MB` limit but would also substantially improve the overall performance. Moreover, the benefit would also apply during the watching phase.
Finally, consider that these are the default control-plane limits specified by Kubernetes for large clusters:
(the `kubelet` default is 110 and can be changed; see `maxPods` in the KubeletConfiguration)

We could also assume that:
Feature
Introduce the ability to filter by the current node when fetching pods metadata from the K8s api-server.
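To make the feature concrete, here is a rough sketch of what the consumer-facing knob could look like; the class and method names are hypothetical and not libsinsp's actual API:

```cpp
#include <string>

// Hypothetical settings object for the K8s client.
// The real libsinsp method names and placement would be
// decided during implementation.
class k8s_settings
{
public:
    // An empty node name (the default) means "no filtering",
    // which preserves the current behavior.
    void set_node_name(const std::string& node_name) { m_node_name = node_name; }

    const std::string& node_name() const { return m_node_name; }

    // Whether the per-node fieldSelector should be appended to pod queries.
    bool filtering_enabled() const { return !m_node_name.empty(); }

private:
    std::string m_node_name;
};
```

The important design point is the default: a consumer that never calls the setter gets today's unfiltered behavior, so the change is strictly opt-in.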
Also, `libsinsp` should expose an API to set the node name to be used in the filter, rather than trying to autodetect it, since `libs` implementers could use different deployment strategies that we cannot know in advance (I am open to feedback; see also the Additional context section below).

Alternatives
As suggested privately by @zuc, an alternative could be to directly query the local `kubelet` to gather that metadata. Although it would surely be a further performance improvement (since we would not pass through the api-server), we concluded that it is not feasible because:

- the `kubelet` API is NOT officially stable (so our implementation could break at any time without notice)

Additional context
Detecting the node name in an automated fashion can break in some scenarios; therefore, we should implement this change in a way that allows consumers (e.g., Falco) to set it explicitly and pass it down to the `libsinsp` API mentioned above.

Setting the node name should also be optional, since filtering by a single node might not apply to all use cases (e.g., with `kind`, multiple nodes can live on the same host).

For example, in a Falco K8s deployment, we could use the downward API to pass the current node name to Falco: