elastisys / compliantkubernetes-apps

Elastisys Compliant Kubernetes is an open source, Certified Kubernetes distribution designed according to the ISO27001 controls: providing you with security tooling and observability from day one.
https://elastisys.io/compliantkubernetes/
Apache License 2.0

[3] Investigate whether we can create a NetworkPolicy violation dashboard #165

Closed cristiklein closed 3 years ago

cristiklein commented 3 years ago

One requirement of a hardened environment is to monitor suspicious activity. Network traffic that is blocked is extremely suspicious and should generally be investigated. This task consists of investigating whether it is possible to create a dashboard showing network traffic that was blocked by a NetworkPolicy. Ideally this should include:

This could serve as inspiration: https://aws.amazon.com/blogs/security/how-to-optimize-and-visualize-your-security-groups/

Acceptance criteria:

cristiklein commented 3 years ago

As an overarching requirement for dashboards, they should not only highlight issues, but also make it quick to figure out how to solve the issues.

The "capacity management" dashboard is a very telling example. It shows "79 containers are missing resource requests", then, just one click away, lists the Pods which are missing requests.

viktor-f commented 3 years ago

Did some digging and this is what I found. Please let me know if I have missed something.

Network policies are implemented in Calico by adding rules to iptables on each node. To get the metrics we want, I think we would need to monitor all network traffic on the node and somehow map the blocked traffic to specific policies.
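As a rough illustration of what mapping blocked traffic to iptables rules could involve: `iptables-save -c` prefixes every rule with `[packets:bytes]` counters, so the counters of DROP rules can be summed per chain. The sketch below is in Python and runs on a hard-coded sample. The chain names and comments in `SAMPLE` are hypothetical, not output from a real cluster, and the helper `dropped_packets_per_chain` is made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical excerpt of `iptables-save -c` output. Chain names and comments
# are illustrative assumptions, not taken from a real cluster.
SAMPLE = """\
[120:7200] -A cali-pi-example-policy -m comment --comment "allow from frontend" -j MARK --set-xmark 0x10000/0x10000
[42:2520] -A cali-tw-cali0123456789a -m comment --comment "Drop if no policies passed packet" -j DROP
[0:0] -A cali-fw-cali0123456789a -m comment --comment "Drop if no policies passed packet" -j DROP
[7:420] -A cali-fw-califedcba98765 -m comment --comment "Drop if no policies passed packet" -j DROP
"""

# Matches "[packets:bytes] -A <chain> ... -j <target>".
RULE = re.compile(
    r"^\[(?P<packets>\d+):(?P<bytes>\d+)\] -A (?P<chain>\S+) .*-j (?P<target>\S+)"
)


def dropped_packets_per_chain(dump: str) -> dict:
    """Sum the packet counters of all DROP rules, grouped by chain name."""
    totals = defaultdict(int)
    for line in dump.splitlines():
        match = RULE.match(line)
        if match and match.group("target") == "DROP":
            totals[match.group("chain")] += int(match.group("packets"))
    return dict(totals)


if __name__ == "__main__":
    for chain, packets in dropped_packets_per_chain(SAMPLE).items():
        print(f"{chain}: {packets} dropped packets")
```

This only gives per-chain totals. Relating a chain back to a specific Pod or NetworkPolicy would still need extra information that the sketch does not attempt to resolve.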

From what I have seen, none of the applications we run monitors this by default (or even has the ability to monitor it?). Calico produces some metrics, but nothing that relates to this: https://docs.projectcalico.org/reference/felix/prometheus

It might be possible to get the metrics we want by using eBPF in some way. For example, the tool Inspektor Gadget uses eBPF to (among other things) monitor traffic and suggest network policies based on the traffic it sees. I have not looked very thoroughly at that tool, but it does not look like it's quite built to do the monitoring we want. Calico has also launched a second type of policy enforcement that uses eBPF instead of iptables to configure networks. It was released in 2020 and still seems to be in development, but in the future it might give us some or all of the metrics that we would like: https://docs.projectcalico.org/maintenance/ebpf/enabling-bpf

Another possible way to get the metrics we want would be to start using a service mesh. Looking at Istio's documentation, it seems to provide at least some of the metrics that we would want, and it can provide quite extensive logging that we could turn into metrics/graphs: https://istio.io/latest/docs/concepts/observability/ To get some of these metrics we might have to use policies in the service mesh instead of regular NetworkPolicies to regulate the traffic, though I'm not sure that is needed.

Conclusion:

Xartos commented 3 years ago

> Network policies are implemented in calico by adding rules to iptables in each node

What if we changed to IPVS? Are network policies still implemented with iptables then?

viktor-f commented 3 years ago

> > Network policies are implemented in calico by adding rules to iptables in each node
>
> What if we changed to IPVS? Are network policies still implemented with iptables then?

Honestly not sure. I tried to look at the docs but I'm not seeing a clear answer to that: https://docs.projectcalico.org/networking/use-ipvs Regardless, I don't think that Calico exposes any other metrics when using IPVS. I don't know much about IPVS, but after some searching it looks like you could get some metrics from it. I'm not sure they would be metrics that help with this issue, though.

Might look more into this tomorrow.

cristiklein commented 3 years ago

Thanks @viktor-f ! Really good information so far.

Could you check what container_network_transmit_packets_dropped_total [1] reports? Does that correlate in any way with NetworkPolicies?

[1] https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md

viktor-f commented 3 years ago

> Thanks @viktor-f ! Really good information so far.
>
> Could you check what container_network_transmit_packets_dropped_total [1] reports? Does that correlate in any way with NetworkPolicies?
>
> [1] https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md

It looks like it just reports the number of dropped packets for each pod that is sending traffic. It does not record where the traffic is going, so you can't really map it to network policies in any way.

cristiklein commented 3 years ago

Well, no, but does it indicate which Pod is sending traffic that it's not supposed to?

viktor-f commented 3 years ago

> Well, no, but does it indicate which Pod is sending traffic that it's not supposed to?

It does indicate which pod is sending traffic that is dropped. So I guess we could say that if a pod is constantly/often sending packets that get dropped, then it is probably trying to reach something that it's not supposed to (e.g. because a network policy is blocking it).
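A minimal sketch of that heuristic, in Python, assuming the counter really does capture such drops (which is not established at this point in the discussion). The function `pods_with_persistent_drops`, the `min_intervals` threshold, and the pod names are all hypothetical. The input is a series of successive readings of a monotonically increasing per-pod counter.

```python
def pods_with_persistent_drops(samples, min_intervals=3):
    """Return pods whose drop counter increased in at least `min_intervals`
    consecutive-scrape intervals.

    `samples` maps a pod name to its counter readings, ordered in time.
    """
    flagged = []
    for pod, readings in samples.items():
        # Count the scrape intervals in which the counter went up.
        increasing = sum(
            1 for prev, cur in zip(readings, readings[1:]) if cur > prev
        )
        if increasing >= min_intervals:
            flagged.append(pod)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    readings = {
        "frontend": [0, 0, 0, 0, 0],
        "batch-job": [0, 5, 9, 14, 20],
        "cache": [3, 3, 4, 4, 4],
    }
    print(pods_with_persistent_drops(readings))
```

Requiring several increasing intervals rather than a single one is a design choice to avoid flagging a pod for a one-off burst of drops.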

cristiklein commented 3 years ago

@viktor-f More of this! :smile:

viktor-f commented 3 years ago

Did some testing with that metric. I set up some pods that tried to contact other pods and then added a NetworkPolicy that did not allow the traffic. Unfortunately the metric you mentioned still stayed at 0 dropped packets. Not sure why they weren't counted.
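For anyone wanting to reproduce a test like this, a policy that blocks all incoming traffic to the pods in a namespace could look roughly as follows. This is a sketch, not the manifest actually used in the test above: the policy name and the `netpol-test` namespace are assumptions. It is written in Python and emits JSON, which `kubectl apply -f -` accepts.

```python
import json


def deny_all_ingress(namespace: str) -> dict:
    """Build a NetworkPolicy that selects every pod in `namespace` and
    allows no ingress traffic (no ingress rules are listed)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # Hypothetical name, chosen for this example.
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            "policyTypes": ["Ingress"],
        },
    }


if __name__ == "__main__":
    # "netpol-test" is a hypothetical namespace name.
    print(json.dumps(deny_all_ingress("netpol-test"), indent=2))
```

With such a policy in place, any pod-to-pod request into that namespace should be rejected, which gives a known source of blocked traffic to compare against the metric.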

cristiklein commented 3 years ago

Hmm ... This source code suggests that we are looking at the network interface level of the container, which is below the firewall.

Can you also check container_network_transmit_packets_errors_total?

Finally, can you check whether the iptables rules have relevant comments? If yes, it might be possible to use this.

Otherwise, I'm tempted to move this to reevaluate.

viktor-f commented 3 years ago

Yes, checked that one and it also didn't count these packets.

I found https://github.com/box/kube-iptables-tailer, which also looks promising. But it seems to require that I configure iptables a bit. Trying to test that now. I can look at the tool you linked as well.

viktor-f commented 3 years ago

Finally, I got this tool to work: https://github.com/monzo/calico-accountant, based on this blog post: https://monzo.com/blog/we-built-network-isolation-for-1-500-services In that blog post they also wrote about using https://github.com/box/kube-iptables-tailer for more detailed metrics and logs. That could be worth looking into in the future (a possible Helm chart we could use: https://github.com/honestica/lifen-charts/tree/master/kube-iptables-tailer, not tested).

I never got time to test https://github.com/madron/iptables-exporter, so I don't know if that's a good alternative or not.

The follow-up issue to actually add this to ck8s can be found here: https://github.com/elastisys/compliantkubernetes-apps/issues/234 It includes some manifests that I used and a prototype dashboard.