VictoriaMetrics / operator

Kubernetes operator for Victoria Metrics
Apache License 2.0
VMAgent Stops Scraping Metrics in WatchNamespace Mode #886

Closed by knechtionscoding 5 months ago

knechtionscoding commented 5 months ago

I currently have the VM Operator installed into the victoria-metrics-operator namespace. I have a VMCluster installed into the victoria-metrics namespace. In the victoria-metrics namespace I have a number of VMServiceScrapes, VMPodScrapes, etc.

Here is an example of the VMPodScrape:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMPodScrape
metadata:
  name: kubernetes-pods
spec:
  podMetricsEndpoints:
    - scheme: http
      scrape_interval: 1m
      scrapeTimeout: 10s
      relabelConfigs:
...
  selector: {}
  namespaceSelector:
    any: true

When the VM Operator is running in cluster mode, metrics are properly scraped from all pods in all namespaces.

However, when I swap the VM Operator to only watch the victoria-metrics namespace:

watchNamespace: victoria-metrics

then VMAgent no longer scrapes metrics from pods outside the victoria-metrics namespace.
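For context, the chart value above presumably ends up as the WATCH_NAMESPACE environment variable on the operator container (the operator log message quoted later in this thread refers to that parameter). A minimal sketch of the relevant Deployment fragment, assuming a container named `operator` (a hypothetical name; the real one depends on how the operator was installed):

```yaml
# Fragment of the operator Deployment's pod template.
# The container name is a hypothetical placeholder.
spec:
  template:
    spec:
      containers:
        - name: operator
          env:
            # Restricts the operator to objects in this namespace only.
            - name: WATCH_NAMESPACE
              value: victoria-metrics
```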

I've granted the correct ClusterRole and Role to VMAgent through a dedicated custom ServiceAccount:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vmagent
rules:
- apiGroups:
  - ""
  - networking.k8s.io
  - extensions
  - discovery.k8s.io
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - endpointslices
  - pods
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  - configmaps
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  - /metrics/resources
  verbs:
  - get
- apiGroups:
  - route.openshift.io
  - image.openshift.io
  resources:
  - routers/metrics
  - registry/metrics
  verbs:
  - get
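A ClusterRole only takes effect once it is bound to a subject. The binding is not shown in the issue, so the following is a minimal sketch under assumptions: the ServiceAccount name `vmagent-cluster-scrape` and the binding name are hypothetical, and the ServiceAccount is assumed to live in the victoria-metrics namespace alongside VMAgent.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vmagent-cluster-scrape        # hypothetical binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vmagent                       # the ClusterRole defined above
subjects:
  - kind: ServiceAccount
    name: vmagent-cluster-scrape      # hypothetical ServiceAccount name
    namespace: victoria-metrics
```

Because the binding is cluster-scoped, the ServiceAccount can list and watch pods in every namespace, which is what `namespaceSelector.any: true` relies on.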

I've also confirmed that the VM Operator has RBAC access to the victoria-metrics namespace per this

I can also see that the VM Operator finds the VMPodScrape in namespaced mode:

victoria-metrics-operator-8494975597-7qs59 victoria-metrics-operator {"level":"info","ts":1709297411.4290285,"logger":"factory","msg":"selected PodScrapes","podscrapes":"victoria-metrics/all-scrape,victoria-metrics/kubernetes-pods,victoria-metrics/kubernetes-pods-secondary,victoria-metrics/kubernetes-pods-slow","namespace":"victoria-metrics","vmagent":"..."}

I'm very confused about why this is occurring and what I've missed about watchNamespace mode.

My end goal is to limit the permissions of the victoria-metrics operator while still having the VM Cluster/VM Agent get metrics from an entire cluster.

f41gh7 commented 5 months ago

Hello, thanks for reporting an issue.

Currently, it's expected that the operator cannot properly handle objects from other namespaces if it's configured to watch a single namespace.

The proper solution is to add support for watching multiple namespaces.

It looks a bit tricky to me from a configuration perspective.

The operator would need cluster-wide list and watch permissions for objects, since the API requests would be cluster-scoped; the operator could then filter the objects using namespace predicates defined in its configuration.

I think we can implement that in one of the future releases. Maybe the next release, but I'm not sure about timing.

knechtionscoding commented 5 months ago

@f41gh7 does that mean the operator and the VM cluster have to be in the same namespace?

Because I'm only trying to watch a single namespace: victoria-metrics.

I've got everything the operator needs to watch in the same namespace (victoria-metrics) except the operator itself.

knechtionscoding commented 5 months ago

Again, I can see that the podscrape is listed in the config the operator picks up, both in cluster and namespaced mode. But in namespaced mode, VMAgent isn't scraping the metrics.

f41gh7 commented 5 months ago

> does that mean the operator and the VM cluster have to be in the same namespace?

Nope, they can be in different namespaces.

> Again, I can see that the podscrape is listed in the config the operator picks up, both in cluster and namespaced mode. But in namespaced mode, VMAgent isn't scraping the metrics.

Got it, sorry.

Well, it should be possible in this case. I'd suggest checking the VMAgent logs; maybe it is missing some permissions or is making incorrect calls.

knechtionscoding commented 5 months ago

@f41gh7 I'm not seeing any error logs in VMAgent.

I've compared the logs between clustered mode and namespaced mode. I'm not seeing any error logs in the VM Operator either.

However, in namespaced mode, I'm not seeing any reference to the kubernetes-pods podscrape job in VMAgent, despite seeing the reference in the VM Operator. Is there somewhere I can check to see whether, and why, VMAgent isn't receiving the config?

f41gh7 commented 5 months ago

> However, in namespaced mode, I'm not seeing any reference to the kubernetes-pods podscrape job in VMAgent, despite seeing the reference in the VM Operator.

Could you check the generated config in the vmagent web UI? It's exposed on the vmagent_addr:8429/service-discovery page. You may need to update vmagent to a recent version, since the page was added fairly recently.

There is no special logic behind config generation after the scrapes have been selected.

But there is one special case: namespaceSelector.any can be ignored if the serviceAccount name defined for vmagent is equal to the account name generated by the operator.

In this case the operator writes the following log record: Setting discovery for the single namespace only, since operator launched with set WATCH_NAMESPACE param

knechtionscoding commented 5 months ago

It turns out that was the problem! The service account name was the same. I was just assigning permissions to the "default" SA. Assigning a new SA solved the problem.
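The fix described above can be sketched as follows. This is an illustration under assumptions, not the reporter's actual manifests: the ServiceAccount name `vmagent-cluster-scrape`, the VMAgent name `example`, and the remoteWrite URL are hypothetical placeholders. The key point is that `spec.serviceAccountName` refers to a ServiceAccount whose name differs from the one the operator would generate for this VMAgent, so `namespaceSelector.any` is not dropped in watchNamespace mode.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vmagent-cluster-scrape      # hypothetical; must differ from the operator-generated name
  namespace: victoria-metrics
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: example                     # hypothetical VMAgent name
  namespace: victoria-metrics
spec:
  # Use the dedicated ServiceAccount instead of the operator-generated one.
  serviceAccountName: vmagent-cluster-scrape
  selectAllByDefault: true
  remoteWrite:
    # Placeholder URL; point this at the vminsert service of your VMCluster.
    - url: http://vminsert-example.victoria-metrics.svc:8480/insert/0/prometheus/api/v1/write
```

The ServiceAccount still needs the cluster-wide permissions from the `vmagent` ClusterRole shown earlier, granted through a ClusterRoleBinding that names it as a subject.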