caas-team / sparrow

A monitoring tool to gather infrastructure network information
Apache License 2.0

feat: dns check #91

Closed lvlcn-t closed 8 months ago

lvlcn-t commented 8 months ago

Motivation

Adds a DNS check that verifies whether an address or host can be looked up.

Closes #81

Changes

Mostly copy-paste, but I've moved the metrics code into a dedicated file and implemented a Resolver interface so that the lookup process can be mocked properly.
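
To illustrate why an interface helps with mocking, here is a minimal sketch; it is not the code from this PR. It assumes the interface only needs a `LookupHost` method with the same signature as `(*net.Resolver).LookupHost`. The names `mockResolver`, `checkHosts`, and `lookupTimeout`, as well as the address `192.0.2.10`, are hypothetical placeholders.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// Resolver is the part of *net.Resolver that a DNS check needs.
// Assumption: the interface in the PR may expose different methods.
type Resolver interface {
	LookupHost(ctx context.Context, host string) ([]string, error)
}

// Compile-time check that the standard resolver satisfies the interface.
var _ Resolver = (*net.Resolver)(nil)

// mockResolver is a hypothetical test double backed by a fixed table.
type mockResolver struct {
	hosts map[string][]string
}

// LookupHost returns the configured addresses or a "no such host" error.
func (m mockResolver) LookupHost(_ context.Context, host string) ([]string, error) {
	if addrs, ok := m.hosts[host]; ok {
		return addrs, nil
	}
	return nil, &net.DNSError{Err: "no such host", Name: host, IsNotFound: true}
}

// lookupTimeout bounds all lookups of a single run (illustrative value).
const lookupTimeout = 5 * time.Second

// checkHosts resolves every host and reports which lookups succeeded.
func checkHosts(r Resolver, hosts ...string) map[string]bool {
	ctx, cancel := context.WithTimeout(context.Background(), lookupTimeout)
	defer cancel()

	status := make(map[string]bool, len(hosts))
	for _, h := range hosts {
		addrs, err := r.LookupHost(ctx, h)
		status[h] = err == nil && len(addrs) > 0
	}
	return status
}

func main() {
	r := mockResolver{hosts: map[string][]string{"www.telekom.de": {"192.0.2.10"}}}
	for host, ok := range checkHosts(r, "www.telekom.de", "www.google.com") {
		fmt.Printf("%s resolvable=%t\n", host, ok)
	}
}
```

Because `checkHosts` depends only on the interface, production code can pass `net.DefaultResolver` while unit tests pass the mock, so no test needs network access.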

For additional information, see the commits.

Tests done

I've added unit tests covering many success and error cases.

Manual e2e test

Logs

$ go run main.go run --config .tmp/start-config.yaml 
Using config file: .tmp/start-config.yaml
{"time":"2024-01-25T16:07:57.730409738+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/cmd.NewCmdRun.run.func1","file":"/home/installadm/dev/github/sparrow/cmd/run.go","line":82},"msg":"Running sparrow"}
{"time":"2024-01-25T16:07:57.730592738+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/sparrow.(*Sparrow).api.func1","file":"/home/installadm/dev/github/sparrow/pkg/sparrow/api.go","line":81},"msg":"Serving Api","addr":":8080"}
{"time":"2024-01-25T16:07:57.730575228+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/sparrow/targets.(*gitlabTargetManager).Reconcile","file":"/home/installadm/dev/github/sparrow/pkg/sparrow/targets/gitlab.go","line":80},"msg":"Starting global gitlabTargetManager reconciler"}
{"time":"2024-01-25T16:07:57.730584526+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/config.(*FileLoader).Run","file":"/home/installadm/dev/github/sparrow/pkg/config/file.go","line":48},"msg":"Reading config from file","file":"./.tmp/run-config.yaml"}
{"time":"2024-01-25T16:07:57.730894566+01:00","level":"WARN","source":{"function":"github.com/caas-team/sparrow/pkg/sparrow.(*Sparrow).registerCheck","file":"/home/installadm/dev/github/sparrow/pkg/sparrow/run.go","line":232},"msg":"Check is not registered","name":"health"}
{"time":"2024-01-25T16:07:57.730911446+01:00","level":"WARN","source":{"function":"github.com/caas-team/sparrow/pkg/sparrow.(*Sparrow).registerCheck","file":"/home/installadm/dev/github/sparrow/pkg/sparrow/run.go","line":232},"msg":"Check is not registered","name":"latency"}
{"time":"2024-01-25T16:07:57.73101715+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/checks/dns.(*DNS).Run","file":"/home/installadm/dev/github/sparrow/pkg/checks/dns/dns.go","line":86},"msg":"Starting dns check","interval":"20s"}
{"time":"2024-01-25T16:08:17.81579971+01:00","level":"ERROR","source":{"function":"github.com/caas-team/sparrow/pkg/checks/dns.getDNS","file":"/home/installadm/dev/github/sparrow/pkg/checks/dns/dns.go","line":242},"msg":"Error while looking up address","address":"www.google.com","error":"lookup www.google.com on 127.0.0.53:53: no such host"}
{"time":"2024-01-25T16:08:17.815863855+01:00","level":"WARN","source":{"function":"github.com/caas-team/sparrow/pkg/checks/dns.(*DNS).check.Retry.func3","file":"/home/installadm/dev/github/sparrow/internal/helper/retry.go","line":49},"msg":"Effector call failed, retrying in 1s"}
{"time":"2024-01-25T16:08:18.884199559+01:00","level":"ERROR","source":{"function":"github.com/caas-team/sparrow/pkg/checks/dns.getDNS","file":"/home/installadm/dev/github/sparrow/pkg/checks/dns/dns.go","line":242},"msg":"Error while looking up address","address":"www.google.com","error":"lookup www.google.com on 127.0.0.53:53: no such host"}
{"time":"2024-01-25T16:08:18.884392831+01:00","level":"WARN","source":{"function":"github.com/caas-team/sparrow/pkg/checks/dns.(*DNS).check.Retry.func3","file":"/home/installadm/dev/github/sparrow/internal/helper/retry.go","line":49},"msg":"Effector call failed, retrying in 2s"}
{"time":"2024-01-25T16:08:20.954973345+01:00","level":"ERROR","source":{"function":"github.com/caas-team/sparrow/pkg/checks/dns.getDNS","file":"/home/installadm/dev/github/sparrow/pkg/checks/dns/dns.go","line":242},"msg":"Error while looking up address","address":"www.google.com","error":"lookup www.google.com on 127.0.0.53:53: no such host"}
{"time":"2024-01-25T16:08:20.955029847+01:00","level":"WARN","source":{"function":"github.com/caas-team/sparrow/pkg/checks/dns.(*DNS).check.Retry.func3","file":"/home/installadm/dev/github/sparrow/internal/helper/retry.go","line":49},"msg":"Effector call failed, retrying in 4s"}
{"time":"2024-01-25T16:08:25.026287794+01:00","level":"ERROR","source":{"function":"github.com/caas-team/sparrow/pkg/checks/dns.getDNS","file":"/home/installadm/dev/github/sparrow/pkg/checks/dns/dns.go","line":242},"msg":"Error while looking up address","address":"www.google.com","error":"lookup www.google.com on 127.0.0.53:53: no such host"}
{"time":"2024-01-25T16:08:25.026327665+01:00","level":"WARN","source":{"function":"github.com/caas-team/sparrow/pkg/checks/dns.(*DNS).check.func2","file":"/home/installadm/dev/github/sparrow/pkg/checks/dns/dns.go","line":208},"msg":"Error while looking up address","target":"www.google.com","error":"lookup www.google.com on 127.0.0.53:53: no such host"}
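
The logs above show one failed lookup followed by retries after 1s, 2s, and 4s, then a final failure. That pattern is consistent with exponential backoff. The following is a rough sketch of that technique, not the contents of `internal/helper/retry.go`. The `Effector` name is borrowed from the log message; its signature, the `retry` and `simulate` helpers, and the millisecond delay are assumptions chosen to keep the example fast.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Effector is the operation being retried. The name comes from the log
// message above; the signature is an assumption.
type Effector func(ctx context.Context) error

// retry wraps fn so that it is called again up to maxRetries times after
// the first failure, doubling the wait after every failed attempt.
func retry(fn Effector, maxRetries int, initial time.Duration) Effector {
	return func(ctx context.Context) error {
		delay := initial
		for attempt := 0; ; attempt++ {
			err := fn(ctx)
			if err == nil || attempt >= maxRetries {
				return err
			}
			fmt.Printf("call failed, retrying in %v\n", delay)
			select {
			case <-time.After(delay):
			case <-ctx.Done():
				// Stop waiting as soon as the caller cancels.
				return ctx.Err()
			}
			delay *= 2
		}
	}
}

// simulate runs a fake lookup that fails `failures` times before it
// succeeds, and returns how many attempts were made.
func simulate(failures int) (int, error) {
	attempts := 0
	lookup := func(context.Context) error {
		attempts++
		if attempts <= failures {
			return errors.New("no such host")
		}
		return nil
	}
	err := retry(lookup, 3, time.Millisecond)(context.Background())
	return attempts, err
}

func main() {
	attempts, err := simulate(10)
	fmt.Printf("attempts=%d err=%v\n", attempts, err)
}
```

With three retries, a lookup that never succeeds is attempted four times in total, which matches the four ERROR entries in the logs.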

Exposed Metrics (redacted IPs)

# HELP sparrow_dns_duration_seconds Duration of DNS resolution attempts in seconds.
# TYPE sparrow_dns_duration_seconds gauge
sparrow_dns_duration_seconds{target="10.x.x.x"} 0.000644266
sparrow_dns_duration_seconds{target="www.google.com"} 0
sparrow_dns_duration_seconds{target="www.t-systems.com"} 0.018097008
sparrow_dns_duration_seconds{target="www.telekom.de"} 0.017367603
# HELP sparrow_dns_response_time_seconds Histogram of response times for DNS checks in seconds.
# TYPE sparrow_dns_response_time_seconds histogram
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="0.005"} 1
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="0.01"} 1
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="0.025"} 1
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="0.05"} 1
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="0.1"} 1
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="0.25"} 1
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="0.5"} 1
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="1"} 1
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="2.5"} 1
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="5"} 1
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="10"} 1
sparrow_dns_response_time_seconds_bucket{target="10.x.x.x",le="+Inf"} 1
sparrow_dns_response_time_seconds_sum{target="10.x.x.x"} 0.000644266
sparrow_dns_response_time_seconds_count{target="10.x.x.x"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="0.005"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="0.01"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="0.025"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="0.05"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="0.1"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="0.25"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="0.5"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="1"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="2.5"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="5"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="10"} 1
sparrow_dns_response_time_seconds_bucket{target="www.google.com",le="+Inf"} 1
sparrow_dns_response_time_seconds_sum{target="www.google.com"} 0
sparrow_dns_response_time_seconds_count{target="www.google.com"} 1
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="0.005"} 0
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="0.01"} 0
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="0.025"} 1
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="0.05"} 1
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="0.1"} 1
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="0.25"} 1
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="0.5"} 1
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="1"} 1
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="2.5"} 1
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="5"} 1
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="10"} 1
sparrow_dns_response_time_seconds_bucket{target="www.t-systems.com",le="+Inf"} 1
sparrow_dns_response_time_seconds_sum{target="www.t-systems.com"} 0.018097008
sparrow_dns_response_time_seconds_count{target="www.t-systems.com"} 1
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="0.005"} 0
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="0.01"} 0
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="0.025"} 1
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="0.05"} 1
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="0.1"} 1
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="0.25"} 1
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="0.5"} 1
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="1"} 1
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="2.5"} 1
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="5"} 1
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="10"} 1
sparrow_dns_response_time_seconds_bucket{target="www.telekom.de",le="+Inf"} 1
sparrow_dns_response_time_seconds_sum{target="www.telekom.de"} 0.017367603
sparrow_dns_response_time_seconds_count{target="www.telekom.de"} 1
# HELP sparrow_dns_status Specifies if the target can be resolved.
# TYPE sparrow_dns_status gauge
sparrow_dns_status{target="10.x.x.x"} 1
sparrow_dns_status{target="www.google.com"} 0
sparrow_dns_status{target="www.t-systems.com"} 1
sparrow_dns_status{target="www.telekom.de"} 1
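
The histogram buckets above are cumulative: each `le` bucket counts every observation less than or equal to its bound. A small sketch of that counting rule follows. The bounds are copied from the output above; the function name `cumulativeCounts` is hypothetical, and the code does not use the Prometheus client library.

```go
package main

import "fmt"

// bounds lists the finite `le` values visible in the metrics output.
var bounds = []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}

// cumulativeCounts returns, for each bound, how many observations are
// less than or equal to it. The +Inf bucket is simply len(observations).
func cumulativeCounts(observations []float64) []int {
	counts := make([]int, len(bounds))
	for _, v := range observations {
		for i, b := range bounds {
			if v <= b {
				counts[i]++
			}
		}
	}
	return counts
}

func main() {
	// Duration reported for www.t-systems.com in the output above.
	obs := []float64{0.018097008}
	for i, n := range cumulativeCounts(obs) {
		fmt.Printf("le=%v count=%d\n", bounds[i], n)
	}
	fmt.Printf("le=+Inf count=%d\n", len(obs))
}
```

This is why a duration of 0.018 s leaves the 0.005 and 0.01 buckets at 0 and sets every bucket from 0.025 upward to 1, while the 0 s value recorded for www.google.com lands in every bucket.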

API Schema

paths:
    /v1/metrics/dns:
        description: dns
        get:
            tags:
                - Metrics
                - dns
            description: Returns the performance data for check dns
            responses:
                "200":
                    description: Metrics for check dns
                    content:
                        application/json:
                            schema:
                                type: object
                                properties:
                                    data:
                                        type: object
                                        additionalProperties:
                                            has: null
                                            schema:
                                                type: object
                                                properties:
                                                    Error:
                                                        type: string
                                                    Resolved:
                                                        type: array
                                                        items:
                                                            type: string
                                                    Total:
                                                        type: number
                                                        format: double
                                    error:
                                        type: string
                                    timestamp:
                                        type: string
                                        format: date-time
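
For orientation, Go types that would serialize to the shape described by this schema might look like the sketch below. The type names `result` and `response`, the helper functions, and the sample values are assumptions; only the property names come from the schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// result mirrors the per-target object. The fields carry no JSON tags
// because the schema shows the property names capitalized.
type result struct {
	Resolved []string
	Error    string
	Total    float64
}

// response mirrors the top-level object: a map keyed by target plus an
// error string and a timestamp. time.Time marshals as RFC 3339, which
// corresponds to the date-time format.
type response struct {
	Data      map[string]result `json:"data"`
	Error     string            `json:"error"`
	Timestamp time.Time         `json:"timestamp"`
}

// encode serializes a response to a JSON string.
func encode(r response) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

// decode parses a JSON string back into a response.
func decode(s string) (response, error) {
	var r response
	err := json.Unmarshal([]byte(s), &r)
	return r, err
}

func main() {
	resp := response{
		Data: map[string]result{
			// 192.0.2.10 is a documentation placeholder address.
			"www.telekom.de": {Resolved: []string{"192.0.2.10"}, Total: 0.017367603},
			"www.google.com": {Error: "no such host"},
		},
		Timestamp: time.Now().UTC(),
	}
	out, err := encode(resp)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```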

Concerns

TODO