Open errordeveloper opened 3 years ago
What version of a hubble CLI are you using, and is this going through relay? It's possible that the CLI is not translating it into a correct filter.
There is a debug log for the GetFlows
on the server side (https://github.com/cilium/cilium/blob/70510d6ee22fda75cb623f900bd2445d4d11ba5a/pkg/hubble/observer/local_observer.go#L248) that prints the request, including all the filters. If your debug is enabled, could you fish out that line? Or if not see if you can enable the debug and post the line, please.
No need to check the server-side. A quick println
in the CLI itself (on master) generates the following GetFlowsRequest
for the following CLI invocation hubble observe --follow --namespace openshift-ingress --namespace openshift-console --namespace openshift-authentication --not --namespace openshift-monitoring --not --label reserved:remote-node --not --label reserved:host
:
{
"number": "20",
"follow": true,
"blacklist": [
{
"sourcePod": [
"openshift-monitoring/"
],
"sourceLabel": [
"reserved:remote-node",
"reserved:host"
]
},
{
"destinationPod": [
"openshift-monitoring/"
],
"destinationLabel": [
"reserved:remote-node",
"reserved:host"
]
}
],
"whitelist": [
{
"sourcePod": [
"openshift-ingress/",
"openshift-console/",
"openshift-authentication/"
]
},
{
"destinationPod": [
"openshift-ingress/",
"openshift-console/",
"openshift-authentication/"
]
}
]
}
Seems pretty obvious that this is a problem with the way build the --not
filter in the CLI.
The exclusion clause that we generate requires that both the pod name and the labels are set. It's clear that the above flow is include, while it has the label set, it's namespace is not openshift-monitoring
.
The correct blacklist
filter should look like:
"blacklist": [
{
"sourcePod": [
"openshift-monitoring/"
],
},{ // NOTE, new flowFilter here
"sourceLabel": [
"reserved:remote-node",
"reserved:host"
]
},
{
"destinationPod": [
"openshift-monitoring/"
],
},{ // NOTE, new filter here
"destinationLabel": [
"reserved:remote-node",
"reserved:host"
]
}
],
Seems pretty obvious that this is a problem with the way build the --not filter in the CLI.
I don't think it's obvious, and I also don't think it's a problem, it's working as intended (at least as implemented).
We deliberately only have a single filter for a CLI execution (the two filter approach is only applicable due to to/from being applied to left and right filter for things like --namespace
).
Your example shows four separate filters being constructed from that command, but that is just your interpretation of what --not foo --not bar
means. We treat it as !(foo & bar)
, but you suggest !(foo || bar)
.
If we change that behavior, other use cases would break. For example, with your suggested change --not --label reserved:remote-node --not namespace foo
would include all !namespace:foo
and !remote-node
flows.
I can't quite recall if this is written down somewhere in the history of us writing the CLI, but we very deliberately implemented only one filter per execution so we didn't have to resolve this operator precedence issue.
This is not easy to fix.
I went digging through git a bit, and it looks like the initial commit that implemented the --not
support (and a lot of other filtering) was squashed into initial commit
. Here is part of the original commit message that I wrote.
❯ ./hubble observe --help | grep "\-\-not"
--not filter[=true] Reverses the next filter to be blacklist i.e. --not --from-ip 2.2.2.2
This still only configured one filter per CLI call.
Supporting multiple filters from the command line is very
complicated. It basically means throwing everything away and
writing some sort of a mini-parser that can take ALL the
filters in string format and convert them into a collection
of tilers, i.e.
--filter from-ip:1.2.3.4 and (to-ip:2.2.2.2 or to-ip 3.3.3)
Maybe a problem for another time considering that one can
execute CLI multiple times and union the results if they
really need `OR` support.
I suppose another time
has perhaps come :)
I don't think it's obvious, and I also don't think it's a problem, it's working as intended (at least as implemented).
Apologies if I sounded dismissive, that was not my intent. Thanks for digging up the rationale.
This is not easy to fix.
I agree that fixing this is probably quite involved. What you dug up makes it more clear why it behaves the way it does.
I still wonder however if there is a way to fix this (maybe even by introducing a new mini-language), because I do believe the current implementation, while certainly internally consistent, is not super intuitive.
edit by @glibsm: Issue re-purposed for a more generic topic.
Original title:
I am using Cilium version 1.9.0 and running
hubble observe
in a Cilium pod on a node with IP address10.0.191.208
.When I run
hubble observe --follow --ip 10.0.191.208 --not --label reserved:remote-node --not --label reserved:host
I get no results.However, if I use a namespace filter, e.g.:
I get flows that have
reserver:host
identity, like this one: