Motivation
This PR simplifies the check (config) reconciliation logic. The previous implementation used a map of check names to checks, which added unnecessary complexity to the registration, deletion, and updating of checks.
Closes #98
Changes
A new struct, ChecksController, has been introduced to handle the reconciliation logic of the checks. This component can start, act, and shut down autonomously, making the registration, update, and removal of checks simpler and more straightforward.
In addition, a runtime.Checks struct has been introduced to hold the checks in a slice and provide thread-safe methods to add, delete, and iterate over the checks. This struct complements the runtime.Config struct, which holds the configuration for the checks. The checks are now directly held in a dynamic slice, further simplifying the logic and making the code easier to understand and maintain.
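To illustrate the idea of a slice-backed, thread-safe container, here is a minimal sketch. It is not the code from this PR: the Check interface, the namedCheck type, and the method names Add, Delete, and Iter are assumptions chosen for illustration, and the sketch assumes Go 1.21 or newer for the slices package.

```go
package main

import (
	"fmt"
	"slices"
	"sync"
)

// Check is a hypothetical, minimal stand-in for a check; the real
// interface in the repository has more methods.
type Check interface {
	Name() string
}

// namedCheck is a hypothetical check that only carries a name.
type namedCheck string

func (n namedCheck) Name() string { return string(n) }

// Checks holds checks in a slice guarded by a read/write mutex.
type Checks struct {
	mu     sync.RWMutex
	checks []Check
}

// Add appends a check to the slice.
func (c *Checks) Add(check Check) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.checks = append(c.checks, check)
}

// Delete removes every check that has the same name as the given one.
func (c *Checks) Delete(check Check) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.checks = slices.DeleteFunc(c.checks, func(e Check) bool {
		return e.Name() == check.Name()
	})
}

// Iter returns a copy of the slice so callers can range over it
// without holding the lock.
func (c *Checks) Iter() []Check {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return slices.Clone(c.checks)
}

func main() {
	var cs Checks
	cs.Add(namedCheck("health"))
	cs.Add(namedCheck("latency"))
	cs.Add(namedCheck("dns"))
	cs.Delete(namedCheck("latency"))
	for _, c := range cs.Iter() {
		fmt.Println(c.Name())
	}
}
```

Returning a copy from Iter trades a small allocation for the guarantee that a caller iterating over the checks never blocks, or is disturbed by, a concurrent Add or Delete.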
The changes include:
Creation of the ChecksController and runtime.Checks structs.
Refactoring of the reconciliation logic to use the new ChecksController and runtime.Checks.
Updates to the functions and methods that previously interacted with the map of checks to now interact with the ChecksController and runtime.Checks.
For additional information, see the commits.
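The reconciliation step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the ChecksController implementation: the runningCheck and plan types, the reconcile function, and the use of an interval string as the only piece of configuration are all hypothetical, as is the "traceroute" check name.

```go
package main

import (
	"fmt"
	"sort"
)

// runningCheck is a hypothetical view of a check that is currently running.
type runningCheck struct {
	name     string
	interval string
}

// plan lists the check names to register, update, and unregister.
type plan struct {
	register, update, unregister []string
}

// reconcile compares the running checks with the desired configuration
// (check name -> interval) and returns the actions needed to converge.
func reconcile(running []runningCheck, desired map[string]string) plan {
	var p plan
	seen := make(map[string]bool, len(running))
	for _, rc := range running {
		seen[rc.name] = true
		want, ok := desired[rc.name]
		switch {
		case !ok:
			// Running but no longer configured: shut it down.
			p.unregister = append(p.unregister, rc.name)
		case want != rc.interval:
			// Still configured, but with a different config.
			p.update = append(p.update, rc.name)
		}
	}
	for name := range desired {
		if !seen[name] {
			// Configured but not running yet: start it.
			p.register = append(p.register, name)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Strings(p.register)
	return p
}

func main() {
	running := []runningCheck{
		{name: "dns", interval: "20s"},
		{name: "health", interval: "10s"},
		{name: "latency", interval: "20s"},
	}
	desired := map[string]string{"health": "30s", "traceroute": "60s"}

	p := reconcile(running, desired)
	fmt.Println("register:", p.register)
	fmt.Println("update:", p.update)
	fmt.Println("unregister:", p.unregister)
}
```

Because the running checks are walked as a plain slice, the removal and update cases fall out of a single pass, and only the registration of new checks needs a second loop over the desired configuration.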
Tests done
I've provided several new tests.
[x] Unit tests succeeded.
[x] E2E tests succeeded.
Manual e2e tests
Logs:
$ go run main.go run --config .tmp/config/start-config.yaml
Using config file: .tmp/config/start-config.yaml
{"time":"2024-02-08T12:57:08.155606092+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/cmd.NewCmdRun.run.func1","file":"/home/installadm/dev/github/sparrow/cmd/run.go","line":81},"msg":"Running sparrow"}
{"time":"2024-02-08T12:57:08.15570043+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/sparrow/targets.(*gitlabTargetManager).Reconcile","file":"/home/installadm/dev/github/sparrow/pkg/sparrow/targets/gitlab.go","line":81},"msg":"Starting global gitlabTargetManager reconciler"}
{"time":"2024-02-08T12:57:08.15594968+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/api.(*api).Run.func1","file":"/home/installadm/dev/github/sparrow/pkg/api/api.go","line":76},"msg":"Serving Api","addr":":8080"}
{"time":"2024-02-08T12:57:38.179964595+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/config.(*FileLoader).Run","file":"/home/installadm/dev/github/sparrow/pkg/config/file.go","line":79},"msg":"Successfully got local runtime configuration"}
{"time":"2024-02-08T12:57:38.180136378+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/checks/dns.(*DNS).Run","file":"/home/installadm/dev/github/sparrow/pkg/checks/dns/dns.go","line":99},"msg":"Starting dns check","interval":"20s"}
{"time":"2024-02-08T12:57:38.180178287+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/checks/health.(*Health).Run","file":"/home/installadm/dev/github/sparrow/pkg/checks/health/health.go","line":91},"msg":"Starting healthcheck","interval":"10s"}
{"time":"2024-02-08T12:57:38.18027275+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/checks/latency.(*Latency).Run","file":"/home/installadm/dev/github/sparrow/pkg/checks/latency/latency.go","line":97},"msg":"Starting latency check","interval":"20s"}
{"time":"2024-02-08T12:58:08.181125735+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/config.(*FileLoader).Run","file":"/home/installadm/dev/github/sparrow/pkg/config/file.go","line":79},"msg":"Successfully got local runtime configuration"}
{"time":"2024-02-08T12:58:08.392343533+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/sparrow/gitlab.(*Client).FetchFiles","file":"/home/installadm/dev/github/sparrow/pkg/sparrow/gitlab/gitlab.go","line":149},"msg":"Successfully fetched all target files","files":2}
{"time":"2024-02-08T12:58:38.182087954+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/config.(*FileLoader).Run","file":"/home/installadm/dev/github/sparrow/pkg/config/file.go","line":79},"msg":"Successfully got local runtime configuration"}
{"time":"2024-02-08T12:59:08.182573733+01:00","level":"INFO","source":{"function":"github.com/caas-team/sparrow/pkg/config.(*FileLoader).Run","file":"/home/installadm/dev/github/sparrow/pkg/config/file.go","line":79},"msg":"Successfully got local runtime configuration"}
First reconciliation interval:
# HELP sparrow_dns_check_count Total number of DNS checks performed on the target and if they were successful.
# TYPE sparrow_dns_check_count counter
sparrow_dns_check_count{target="10.x.x.x"} 5
sparrow_dns_check_count{target="www.t-systems.com"} 5
sparrow_dns_check_count{target="www.telekom.de"} 5
# HELP sparrow_dns_duration Histogram of response times for DNS checks in seconds.
# TYPE sparrow_dns_duration histogram
sparrow_dns_duration_bucket{target="10.x.x.x",le="0.005"} 4
sparrow_dns_duration_bucket{target="10.x.x.x",le="0.01"} 4
sparrow_dns_duration_bucket{target="10.x.x.x",le="0.025"} 5
sparrow_dns_duration_bucket{target="10.x.x.x",le="0.05"} 5
sparrow_dns_duration_bucket{target="10.x.x.x",le="0.1"} 5
sparrow_dns_duration_bucket{target="10.x.x.x",le="0.25"} 5
sparrow_dns_duration_bucket{target="10.x.x.x",le="0.5"} 5
sparrow_dns_duration_bucket{target="10.x.x.x",le="1"} 5
sparrow_dns_duration_bucket{target="10.x.x.x",le="2.5"} 5
sparrow_dns_duration_bucket{target="10.x.x.x",le="5"} 5
sparrow_dns_duration_bucket{target="10.x.x.x",le="10"} 5
sparrow_dns_duration_bucket{target="10.x.x.x",le="+Inf"} 5
sparrow_dns_duration_sum{target="10.x.x.x"} 0.018343466000000003
sparrow_dns_duration_count{target="10.x.x.x"} 5
sparrow_dns_duration_bucket{target="www.t-systems.com",le="0.005"} 4
sparrow_dns_duration_bucket{target="www.t-systems.com",le="0.01"} 5
sparrow_dns_duration_bucket{target="www.t-systems.com",le="0.025"} 5
sparrow_dns_duration_bucket{target="www.t-systems.com",le="0.05"} 5
sparrow_dns_duration_bucket{target="www.t-systems.com",le="0.1"} 5
sparrow_dns_duration_bucket{target="www.t-systems.com",le="0.25"} 5
sparrow_dns_duration_bucket{target="www.t-systems.com",le="0.5"} 5
sparrow_dns_duration_bucket{target="www.t-systems.com",le="1"} 5
sparrow_dns_duration_bucket{target="www.t-systems.com",le="2.5"} 5
sparrow_dns_duration_bucket{target="www.t-systems.com",le="5"} 5
sparrow_dns_duration_bucket{target="www.t-systems.com",le="10"} 5
sparrow_dns_duration_bucket{target="www.t-systems.com",le="+Inf"} 5
sparrow_dns_duration_sum{target="www.t-systems.com"} 0.018483263
sparrow_dns_duration_count{target="www.t-systems.com"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="0.005"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="0.01"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="0.025"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="0.05"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="0.1"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="0.25"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="0.5"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="1"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="2.5"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="5"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="10"} 5
sparrow_dns_duration_bucket{target="www.telekom.de",le="+Inf"} 5
sparrow_dns_duration_sum{target="www.telekom.de"} 0.016302045
sparrow_dns_duration_count{target="www.telekom.de"} 5
# HELP sparrow_dns_duration_seconds Duration of DNS resolution attempts in seconds.
# TYPE sparrow_dns_duration_seconds gauge
sparrow_dns_duration_seconds{target="10.x.x.x"} 0.000708375
sparrow_dns_duration_seconds{target="www.t-systems.com"} 0.003439707
sparrow_dns_duration_seconds{target="www.telekom.de"} 0.003415548
# HELP sparrow_dns_status Specifies if the target can be resolved.
# TYPE sparrow_dns_status gauge
sparrow_dns_status{target="10.x.x.x"} 1
sparrow_dns_status{target="www.t-systems.com"} 1
sparrow_dns_status{target="www.telekom.de"} 1
# HELP sparrow_health_up Health of targets
# TYPE sparrow_health_up gauge
sparrow_health_up{target="https://gitlab.devops.telekom.de"} 1
sparrow_health_up{target="https://www.example.com"} 1
sparrow_health_up{target="https://www.google.com"} 1
sparrow_health_up{target="https://www.telekom.de"} 1
# HELP sparrow_latency_count Count of latency checks done
# TYPE sparrow_latency_count counter
sparrow_latency_count{target="https://example.com"} 5
sparrow_latency_count{target="https://google.com"} 5
# HELP sparrow_latency_duration Latency of targets in seconds
# TYPE sparrow_latency_duration histogram
sparrow_latency_duration_bucket{target="https://example.com",le="0.005"} 0
sparrow_latency_duration_bucket{target="https://example.com",le="0.01"} 0
sparrow_latency_duration_bucket{target="https://example.com",le="0.025"} 0
sparrow_latency_duration_bucket{target="https://example.com",le="0.05"} 0
sparrow_latency_duration_bucket{target="https://example.com",le="0.1"} 3
sparrow_latency_duration_bucket{target="https://example.com",le="0.25"} 4
sparrow_latency_duration_bucket{target="https://example.com",le="0.5"} 5
sparrow_latency_duration_bucket{target="https://example.com",le="1"} 5
sparrow_latency_duration_bucket{target="https://example.com",le="2.5"} 5
sparrow_latency_duration_bucket{target="https://example.com",le="5"} 5
sparrow_latency_duration_bucket{target="https://example.com",le="10"} 5
sparrow_latency_duration_bucket{target="https://example.com",le="+Inf"} 5
sparrow_latency_duration_sum{target="https://example.com"} 0.800343934
sparrow_latency_duration_count{target="https://example.com"} 5
sparrow_latency_duration_bucket{target="https://google.com",le="0.005"} 0
sparrow_latency_duration_bucket{target="https://google.com",le="0.01"} 0
sparrow_latency_duration_bucket{target="https://google.com",le="0.025"} 0
sparrow_latency_duration_bucket{target="https://google.com",le="0.05"} 0
sparrow_latency_duration_bucket{target="https://google.com",le="0.1"} 3
sparrow_latency_duration_bucket{target="https://google.com",le="0.25"} 5
sparrow_latency_duration_bucket{target="https://google.com",le="0.5"} 5
sparrow_latency_duration_bucket{target="https://google.com",le="1"} 5
sparrow_latency_duration_bucket{target="https://google.com",le="2.5"} 5
sparrow_latency_duration_bucket{target="https://google.com",le="5"} 5
sparrow_latency_duration_bucket{target="https://google.com",le="10"} 5
sparrow_latency_duration_bucket{target="https://google.com",le="+Inf"} 5
sparrow_latency_duration_sum{target="https://google.com"} 0.544606929
sparrow_latency_duration_count{target="https://google.com"} 5
# HELP sparrow_latency_duration_seconds Latency with status information of targets
# TYPE sparrow_latency_duration_seconds gauge
sparrow_latency_duration_seconds{status="200",target="https://example.com"} 0.100032163
sparrow_latency_duration_seconds{status="200",target="https://google.com"} 0.100295391
Second reconciliation interval:
# HELP sparrow_health_up Health of targets
# TYPE sparrow_health_up gauge
sparrow_health_up{target="https://gitlab.devops.telekom.de"} 1
sparrow_health_up{target="https://www.example.com"} 1
sparrow_health_up{target="https://www.google.com"} 1
sparrow_health_up{target="https://www.telekom.de"} 1