VictoriaMetrics / VictoriaMetrics

VictoriaMetrics: fast, cost-effective monitoring solution and time series database
https://victoriametrics.com/
Apache License 2.0

Promscrape of docker-swarm services spams errors about invalid CIDR address encountering a host-networked service #1028

Closed tomalaci closed 3 years ago

tomalaci commented 3 years ago

Describe the bug When a service uses host networking and a promscrape config is set up to discover tasks inside Docker Swarm, VictoriaMetrics calls the /services API to list the swarm state and then spams many errors about a failed CIDR conversion ("invalid CIDR address").

Those errors happen due to code defined in https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/lib/promscrape/discovery/dockerswarm/services.go#L104

That code will trigger the aforementioned error if it encounters a service with endpoint info such as this ("Addr" field is missing):

{
    ...
    "Endpoint": {
        "Spec": {
            "Mode": "vip"
        },
        "VirtualIPs": [
            {
                "NetworkID": "tecmd0cifajje0icv3knqdih1"
            }
        ]
    },
    ...
}

To Reproduce VictoriaMetrics has to scrape docker-swarm services/tasks. Example prom scrape config I am using (I don't think relabel_configs are needed for a quick repro):

scrape_configs:
  - job_name: 'dockerswarm'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: tasks
    relabel_configs:
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep
      - source_labels: [__meta_dockerswarm_service_label_prometheus_job]
        regex: .+
        action: keep
      - source_labels: [__meta_dockerswarm_network_name]
        regex: instat_monitoring
        action: keep
      - regex: __meta_dockerswarm_service_label_prometheus_(.+)
        action: labelmap
        replacement: $1
      - source_labels: [__meta_dockerswarm_task_slot]
        target_label: task_slot

Afterwards, create a docker swarm service which uses host networking mode. It should create a service definition which has Endpoint structured like this:

docker service inspect <service id>
{
        "ID": "no74e4ldvn5dtmw10nwkgfrl7",
        ...
        "Endpoint": {
            "Spec": {
                "Mode": "vip"
            },
            "VirtualIPs": [
                {
                    "NetworkID": "tecmd0cifajje0icv3knqdih1"
                }
            ]
        }
}

Expected behavior It shouldn't log that error when the virtual IP Addr field is empty, since services running in host networking mode are to be expected. In my specific case I have a few services doing heavy UDP-based SNMP polling, and using the swarm overlay network kills my packet throughput.
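One possible way to get this behavior, sketched in Go under stated assumptions: skip entries whose address is empty before calling net.ParseCIDR, and report only non-empty addresses that fail to parse. The virtualIPAddrs function is a hypothetical helper written for illustration; it is not the code at services.go#L104 and not necessarily how the eventual fix works.

```go
package main

import (
	"fmt"
	"net"
)

// virtualIPAddrs returns the IP part of every non-empty CIDR in addrs.
// Empty entries (as seen for host-networked services) are skipped silently;
// malformed non-empty entries are collected as errors.
// Hypothetical helper, not taken from VictoriaMetrics.
func virtualIPAddrs(addrs []string) ([]string, []error) {
	var ips []string
	var errs []error
	for _, addr := range addrs {
		if addr == "" {
			// Nothing to parse: host networking has no virtual IP.
			continue
		}
		ip, _, err := net.ParseCIDR(addr)
		if err != nil {
			errs = append(errs, fmt.Errorf("cannot parse %q as CIDR: %w", addr, err))
			continue
		}
		ips = append(ips, ip.String())
	}
	return ips, errs
}

func main() {
	ips, errs := virtualIPAddrs([]string{"", "10.0.1.5/24", "bogus"})
	fmt.Println(ips, len(errs))
}
```

With this guard, an empty Addr produces neither a label nor a log line, while a genuinely malformed address is still surfaced.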

Might be useful: as per the Prometheus configuration docs (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config): "The __meta_dockerswarm_network_* meta labels are not populated for ports which are published with mode=host."

Version victoriametrics/victoria-metrics:v1.52.0 docker image

Docker version:

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:46:54 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:45:28 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Used command-line flags

-httpListenAddr ":8428"
-retentionPeriod 24
-loggerLevel INFO
-promscrape.config /prometheus.yml

Additional context Sample error logs being spammed:

2021-01-21T19:24:06.262Z    error   VictoriaMetrics/lib/promscrape/discovery/dockerswarm/services.go:104    cannot parse: "" as cidr for service label add, err: invalid CIDR address: ,
2021-01-21T19:24:06.262Z    error   VictoriaMetrics/lib/promscrape/discovery/dockerswarm/services.go:104    cannot parse: "" as cidr for service label add, err: invalid CIDR address: ,
2021-01-21T19:24:36.256Z    error   VictoriaMetrics/lib/promscrape/discovery/dockerswarm/services.go:104    cannot parse: "" as cidr for service label add, err: invalid CIDR address: ,
2021-01-21T19:24:36.257Z    error   VictoriaMetrics/lib/promscrape/discovery/dockerswarm/services.go:104    cannot parse: "" as cidr for service label add, err: invalid CIDR address: ,
2021-01-21T19:24:36.257Z    error   VictoriaMetrics/lib/promscrape/discovery/dockerswarm/services.go:104    cannot parse: "" as cidr for service label add, err: invalid CIDR address: ,
2021-01-21T19:24:36.257Z    error   VictoriaMetrics/lib/promscrape/discovery/dockerswarm/services.go:104    cannot parse: "" as cidr for service label add, err: invalid CIDR address: ,
2021-01-21T19:24:36.257Z    error   VictoriaMetrics/lib/promscrape/discovery/dockerswarm/services.go:104    cannot parse: "" as cidr for service label add, err: invalid CIDR address: ,

valyala commented 3 years ago

@f41gh7, is there any news on this issue?

f41gh7 commented 3 years ago

It should be fixed in the related PR.

valyala commented 3 years ago

The issue should be fixed in commit 48c8c5093b0eefbc563135d7d041aa049ea82d35. @azeroc, could you build vmagent from this commit and verify whether the issue is fixed in your setup? See build instructions for vmagent.

tomalaci commented 3 years ago

I tested vmagent built from the v1.52.0 branch, which, as expected, spammed the described errors. Afterwards I built vmagent from commit 48c8c50 and the CIDR error is gone, so I think this is fixed. Thanks!

Which release will this commit be included in?

valyala commented 3 years ago

The commit will be included in v1.54.0. Let's re-open the issue until the release with the bugfix is out.

valyala commented 3 years ago

The bugfix has been included in v1.54.0. Closing the issue as fixed.