cilium / hubble

Hubble - Network, Service & Security Observability for Kubernetes using eBPF
Apache License 2.0

`hubble observe` currently limited to a single `AND` filter of all params #430

Open errordeveloper opened 3 years ago

errordeveloper commented 3 years ago

edit by @glibsm: Issue re-purposed for a more generic topic.

Original title:

negative `reserved:host` filter not working along with namespace filter

I am using Cilium version 1.9.0 and running `hubble observe` in a Cilium pod on a node with IP address 10.0.191.208.

When I run `hubble observe --follow --ip 10.0.191.208 --not --label reserved:remote-node --not --label reserved:host` I get no results.

However, if I use a namespace filter, e.g.:

`hubble observe --follow --namespace openshift-ingress --namespace openshift-console --namespace openshift-authentication --not --namespace openshift-monitoring --not --label reserved:remote-node --not --label reserved:host`

I get flows that have the `reserved:host` identity, like this one:

{
  "time": "2020-11-18T11:51:25.477361804Z",
  "verdict": "FORWARDED",
  "ethernet": {
    "source": "36:b5:11:99:cc:64",
    "destination": "ba:06:7d:33:d3:56"
  },
  "IP": {
    "source": "10.0.191.208",
    "destination": "10.128.9.141",
    "ipVersion": "IPv4"
  },
  "l4": {
    "TCP": {
      "source_port": 41634,
      "destination_port": 1936,
      "flags": {
        "SYN": true
      }
    }
  },
  "source": {
    "identity": 1,
    "labels": [
      "reserved:host"
    ]
  },
  "destination": {
    "ID": 749,
    "identity": 19704,
    "namespace": "openshift-ingress",
    "labels": [
      "k8s:ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default",
      "k8s:ingresscontroller.operator.openshift.io/hash=6d48bfb46f",
      "k8s:io.cilium.k8s.namespace.labels.name=openshift-ingress",
      "k8s:io.cilium.k8s.namespace.labels.network.openshift.io/policy-group=ingress",
      "k8s:io.cilium.k8s.namespace.labels.openshift.io/cluster-monitoring=true",
      "k8s:io.cilium.k8s.policy.cluster=default",
      "k8s:io.cilium.k8s.policy.serviceaccount=router",
      "k8s:io.kubernetes.pod.namespace=openshift-ingress"
    ],
    "pod_name": "router-default-566b597bc9-bh6p9"
  },
  "Type": "L3_L4",
  "node_name": "ip-10-0-191-208.eu-west-1.compute.internal",
  "event_type": {
    "type": 4
  },
  "traffic_direction": "INGRESS",
  "trace_observation_point": "TO_ENDPOINT",
  "Summary": "TCP Flags: SYN"
}
{
  "time": "2020-11-18T11:51:25.477377198Z",
  "verdict": "FORWARDED",
  "ethernet": {
    "source": "ba:06:7d:33:d3:56",
    "destination": "36:b5:11:99:cc:64"
  },
  "IP": {
    "source": "10.128.9.141",
    "destination": "10.0.191.208",
    "ipVersion": "IPv4"
  },
  "l4": {
    "TCP": {
      "source_port": 1936,
      "destination_port": 41634,
      "flags": {
        "SYN": true,
        "ACK": true
      }
    }
  },
  "source": {
    "ID": 749,
    "identity": 19704,
    "namespace": "openshift-ingress",
    "labels": [
      "k8s:ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default",
      "k8s:ingresscontroller.operator.openshift.io/hash=6d48bfb46f",
      "k8s:io.cilium.k8s.namespace.labels.name=openshift-ingress",
      "k8s:io.cilium.k8s.namespace.labels.network.openshift.io/policy-group=ingress",
      "k8s:io.cilium.k8s.namespace.labels.openshift.io/cluster-monitoring=true",
      "k8s:io.cilium.k8s.policy.cluster=default",
      "k8s:io.cilium.k8s.policy.serviceaccount=router",
      "k8s:io.kubernetes.pod.namespace=openshift-ingress"
    ],
    "pod_name": "router-default-566b597bc9-bh6p9"
  },
  "destination": {
    "identity": 1,
    "labels": [
      "reserved:host"
    ]
  },
  "Type": "L3_L4",
  "node_name": "ip-10-0-191-208.eu-west-1.compute.internal",
  "reply": true,
  "event_type": {
    "type": 4,
    "sub_type": 3
  },
  "traffic_direction": "INGRESS",
  "trace_observation_point": "TO_STACK",
  "Summary": "TCP Flags: SYN, ACK"
}
glibsm commented 3 years ago

What version of the Hubble CLI are you using, and is this going through Relay? It's possible that the CLI is not translating it into a correct filter.

glibsm commented 3 years ago

There is a debug log for GetFlows on the server side (https://github.com/cilium/cilium/blob/70510d6ee22fda75cb623f900bd2445d4d11ba5a/pkg/hubble/observer/local_observer.go#L248) that prints the request, including all the filters. If debug is enabled on your deployment, could you fish out that line? If not, please see if you can enable debug and post the line.

gandro commented 3 years ago

No need to check the server side. A quick `println` in the CLI itself (on master) generates the following `GetFlowsRequest` for the CLI invocation `hubble observe --follow --namespace openshift-ingress --namespace openshift-console --namespace openshift-authentication --not --namespace openshift-monitoring --not --label reserved:remote-node --not --label reserved:host`:

{
  "number": "20",
  "follow": true,
  "blacklist": [
    {
      "sourcePod": [
        "openshift-monitoring/"
      ],
      "sourceLabel": [
        "reserved:remote-node",
        "reserved:host"
      ]
    },
    {
      "destinationPod": [
        "openshift-monitoring/"
      ],
      "destinationLabel": [
        "reserved:remote-node",
        "reserved:host"
      ]
    }
  ],
  "whitelist": [
    {
      "sourcePod": [
        "openshift-ingress/",
        "openshift-console/",
        "openshift-authentication/"
      ]
    },
    {
      "destinationPod": [
        "openshift-ingress/",
        "openshift-console/",
        "openshift-authentication/"
      ]
    }
  ]
}
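
Before dissecting this request, it helps to pin down the matching rule both sides of this thread rely on: all fields set on a single filter must match (AND), the filters inside the whitelist and blacklist lists are alternatives (OR), and a flow is returned only if it matches some whitelist filter and no blacklist filter. Here is a minimal, self-contained sketch of that rule, using simplified stand-in types rather than Hubble's actual implementation (which lives under pkg/hubble/filters in cilium/cilium):

```go
// Simplified stand-ins for FlowFilter and Flow; only source-side fields are
// modeled, which is enough to show why the flow above slips through.
package main

import (
	"fmt"
	"strings"
)

type flowFilter struct {
	sourcePod   []string // "namespace/pod" prefixes; empty means "field unset"
	sourceLabel []string // labels; empty means "field unset"
}

type flow struct {
	sourcePod string
	labels    map[string]bool
}

// match implements the per-filter rule: every field that is set must match.
func (f flowFilter) match(fl flow) bool {
	if len(f.sourcePod) > 0 && !anyPrefix(f.sourcePod, fl.sourcePod) {
		return false
	}
	if len(f.sourceLabel) > 0 && !anyLabel(f.sourceLabel, fl.labels) {
		return false
	}
	return true
}

func anyPrefix(prefixes []string, s string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(s, p) {
			return true
		}
	}
	return false
}

func anyLabel(wanted []string, have map[string]bool) bool {
	for _, l := range wanted {
		if have[l] {
			return true
		}
	}
	return false
}

// allowed applies the list-level rule: no blacklist filter may match, and at
// least one whitelist filter must match (an empty whitelist allowing anything
// is an assumption of this sketch).
func allowed(whitelist, blacklist []flowFilter, fl flow) bool {
	for _, f := range blacklist {
		if f.match(fl) {
			return false
		}
	}
	for _, f := range whitelist {
		if f.match(fl) {
			return true
		}
	}
	return len(whitelist) == 0
}

func main() {
	// The reserved:host flow from the report: no source pod, one reserved label.
	fl := flow{sourcePod: "", labels: map[string]bool{"reserved:host": true}}
	deny := flowFilter{
		sourcePod:   []string{"openshift-monitoring/"},
		sourceLabel: []string{"reserved:remote-node", "reserved:host"},
	}
	// The label field matches but the pod field does not, and both must match,
	// so the blacklist filter as a whole does not match: the flow is kept.
	fmt.Println(deny.match(fl)) // false
}
```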

It seems pretty obvious that this is a problem with the way we build the `--not` filter in the CLI.

The exclusion clause that we generate requires that both the pod name and the labels match. That explains why the above flow is included: while it has the label set, its namespace is not openshift-monitoring.

The correct blacklist filter should look like:

  "blacklist": [
    {
      "sourcePod": [
        "openshift-monitoring/"
      ],
    },{ // NOTE, new flowFilter here
      "sourceLabel": [
        "reserved:remote-node",
        "reserved:host"
      ]
    },
    {
      "destinationPod": [
        "openshift-monitoring/"
      ],
    },{ // NOTE, new filter here
      "destinationLabel": [
        "reserved:remote-node",
        "reserved:host"
      ]
    }
  ],
glibsm commented 3 years ago

> It seems pretty obvious that this is a problem with the way we build the `--not` filter in the CLI.

I don't think it's obvious, and I also don't think it's a problem; it's working as intended (at least as implemented).

We deliberately have only a single filter per CLI execution (the two-filter approach exists only because to/from matching has to be applied to both the source and the destination side for things like `--namespace`).

Your example shows four separate filters being constructed from that command, but that is just your interpretation of what `--not foo --not bar` means. We treat it as `!(foo & bar)`, but you suggest `!(foo || bar)`.
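
To see how the two readings disagree on the reported flow, plug in its source side (plain booleans, not Hubble code): it carries reserved:host but is not an openshift-monitoring pod.

```go
package main

import "fmt"

func main() {
	// The reported flow, on the source side:
	inMonitoringNS := false // source is reserved:host, not an openshift-monitoring pod
	hasReservedLabel := true

	// Current CLI reading !(foo & bar): excluded only if both match.
	fmt.Println(inMonitoringNS && hasReservedLabel) // false -> flow shows up
	// Proposed reading !(foo || bar): excluded if either matches.
	fmt.Println(inMonitoringNS || hasReservedLabel) // true -> flow filtered out
}
```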

If we change that behavior, other use cases would break. For example, with your suggested change, `--not --label reserved:remote-node --not --namespace foo` would include only flows that are both `!namespace:foo` and `!remote-node`.

I can't quite recall if this is written down somewhere in the history of us writing the CLI, but we very deliberately implemented only one filter per execution so we didn't have to resolve this operator precedence issue.

This is not easy to fix.

glibsm commented 3 years ago

I went digging through git a bit, and it looks like the commit that implemented the `--not` support (and a lot of other filtering) was squashed into the initial commit. Here is part of the original commit message that I wrote.

❯ ./hubble observe --help | grep "\-\-not"
      --not filter[=true] Reverses the next filter to be blacklist i.e. --not --from-ip 2.2.2.2

This still only configured one filter per CLI call.
Supporting multiple filters from the command line is very
complicated. It basically means throwing everything away and
writing some sort of a mini-parser that can take ALL the
filters in string format and convert them into a collection
of filters, i.e.
    --filter from-ip:1.2.3.4 and (to-ip:2.2.2.2 or to-ip 3.3.3)

Maybe a problem for another time considering that one can
execute CLI multiple times and union the results if they
really need `OR` support.

I suppose another time has perhaps come :)
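
For illustration only, such a mini-parser would presumably target a small expression tree; these hypothetical types (nothing like them exists in Hubble today) show the shape the commit message's example would parse into:

```go
package main

type expr interface{ isExpr() }

// match is a single key/value predicate such as from-ip:1.2.3.4.
type match struct{ key, value string }

// and, or and not combine predicates with explicit precedence.
type and struct{ left, right expr }
type or struct{ left, right expr }
type not struct{ inner expr }

func (match) isExpr() {}
func (and) isExpr()   {}
func (or) isExpr()    {}
func (not) isExpr()   {}

func main() {
	// --filter 'from-ip:1.2.3.4 and (to-ip:2.2.2.2 or to-ip:3.3.3.3)'
	// (IPs adapted from the commit message's example)
	_ = and{
		left: match{key: "from-ip", value: "1.2.3.4"},
		right: or{
			left:  match{key: "to-ip", value: "2.2.2.2"},
			right: match{key: "to-ip", value: "3.3.3.3"},
		},
	}
}
```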

gandro commented 3 years ago

> I don't think it's obvious, and I also don't think it's a problem; it's working as intended (at least as implemented).

Apologies if I sounded dismissive, that was not my intent. Thanks for digging up the rationale.

> This is not easy to fix.

I agree that fixing this is probably quite involved. What you dug up makes it clearer why it behaves the way it does.

I still wonder, however, whether there is a way to fix this (maybe even by introducing a new mini-language), because I do believe the current implementation, while certainly internally consistent, is not super intuitive.