cisco-open / synthetic-heart

Kubernetes synthetic testing and monitoring framework
Apache License 2.0

Synthetic Heart Controller periodically failing to schedule tests. #10

Closed kshave closed 4 months ago

kshave commented 5 months ago

Description

When adding new synthetic-tests, the controller periodically fails with the following error.

"error": "error selecting random agent, index out of range", "errorVerbose": "error selecting random agent, index out of range
github.com/cisco-open/synthetic-heart/controller/internal/controller.SelectRandomAgent
	/workspace/controller/internal/controller/synthetictest_controller.go:291
github.com/cisco-open/synthetic-heart/controller/internal/controller.(*SyntheticTestReconciler).Reconcile
	/workspace/controller/internal/controller/synthetictest_controller.go:170
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:227
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1695"

As a result, the controller needs multiple reconcile attempts to schedule a single synthetic-test to an agent. The tests do eventually get scheduled.

Expected Behavior

When synthetic-tests are applied to a cluster, they are automatically scheduled to agents without controller errors.

Actual Behavior

The controller logs the above error and retries the reconcile loop multiple times until the error disappears, at which point the test is scheduled.

Affected Version

v1.2.0-dev

Steps to Reproduce

  1. Deploy synthetic-heart agent, controller and restapi v1.2.0-dev to a cluster.
  2. Apply multiple syntheticTests at the same time.
  3. Observe controller logs.

kshave commented 5 months ago

Seems as though the following function may be the culprit!

subbaksh commented 5 months ago

Thanks @kshave! Feel free to debug and contribute if you're feeling up for it :) We can include it in the v1.2 release.

subbaksh commented 4 months ago

Fixed and merged in v1.2.0