bloomberg / goldpinger

Debugging tool for Kubernetes which tests and displays connectivity between nodes in the cluster.
Apache License 2.0

Setting the HTTP_TARGETS_TIMEOUT value makes results unstable #131

Open tuxerrante opened 1 year ago

tuxerrante commented 1 year ago

Describe the bug: After setting this value, the target appears and disappears from the UI. I have tried the value with and without double quotes, and with and without the 'ms' suffix. Logs from the daemonset don't show much, and we see no option to increase the logging level through an ENV variable.

To Reproduce Steps to reproduce the behavior:

  1. Set the value in values.yaml, for example:
    extraEnv:
      - name: DISPLAY_NODENAME
        value: "true"
      - name: HTTP_TARGETS
        value: https://my.website/en
      - name: HTTP_TARGETS_TIMEOUT
        value: "1000ms"
  2. Rollout
  3. Wait a few minutes
  4. See error on the UI

Expected behavior: The target website should appear green after increasing the timeout from the default "500ms" to "1000ms". The default is defined here: https://github.com/bloomberg/goldpinger/blob/95363554e4078ce20e4fe746ce98b332e472e469/pkg/goldpinger/config.go#LL56C1-L56C1

Environment (please complete the following information):

Found 25 pods, using pod/pau-monitor-goldpinger-gcvpf
{"level":"info","ts":"2023-06-01T07:59:43.427Z","caller":"goldpinger/main.go:114","message":"Goldpinger","version":"v3.7.0","build":"Tue Oct 25 19:39:28 UTC 2022"}
{"level":"info","ts":"2023-06-01T07:59:43.427Z","caller":"goldpinger/main.go:125","message":"Kubeconfig not specified, trying to use in cluster config"}
{"level":"info","ts":"2023-06-01T07:59:43.428Z","caller":"goldpinger/main.go:147","message":"PodIP not set: pinging all pods"}
{"level":"info","ts":"2023-06-01T07:59:43.428Z","caller":"goldpinger/main.go:150","message":"--ping-number set to 0: pinging all pods"}
{"level":"info","ts":"2023-06-01T07:59:43.428Z","caller":"goldpinger/main.go:153","message":"IPVersions not set: settings to 4 (IPv4)"}
{"level":"info","ts":"2023-06-01T07:59:43.624Z","caller":"goldpinger/main.go:183","message":"All good, starting serving the API"}

"error": "Get \"http://10.0.3.145:80/check\": context deadline exceeded"


After removing the HTTP timeout, the nodes come back green and the target website turns red.

sbingham-MET commented 7 months ago

Using goldpinger 3.9, I see the same behavior. We had some HTTP targets that took longer than 500ms, and when we tried to increase the timeout, the UI basically disappeared. Just the presence of http_targets_timeout seems to cause this issue. Any update on a possible solution or workaround?