Describe the bug
According to the documentation, failure_trigger_sample_size defines the number of failures to wait for before triggering a HostDown event.
However, the HostDown event is not triggered even if the number of failures is exceeded.
Reproduction steps
Define an API whose upstream returns a failure or times out.
In the gateway logs, you should see messages of the form: [HOST CHECKER] [HOST DOWN BUT NOT REACHED LIMIT]: <url>
Actual behavior
The HostDown event is never triggered.
Expected behavior
The HostDown event should be triggered when the number of failures set in failure_trigger_sample_size is exceeded.
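To make the expected behaviour concrete, here is a minimal sketch of the counting the documentation describes. It is a hypothetical model, not Tyk's implementation: failureCounter, newFailureCounter and recordFailure are illustrative names. It assumes failures are counted per check URL and that the event fires once the count reaches failure_trigger_sample_size. The exact off-by-one semantics ("reaches" vs. "exceeds") and the reset after firing are also assumptions.

```go
package main

import "sync"

// failureCounter is a HYPOTHETICAL model of the documented behaviour of
// failure_trigger_sample_size; it is not Tyk's code.
type failureCounter struct {
	mu         sync.Mutex
	sampleSize int            // plays the role of failure_trigger_sample_size
	counts     map[string]int // failures seen so far, keyed by check URL
}

func newFailureCounter(sampleSize int) *failureCounter {
	return &failureCounter{sampleSize: sampleSize, counts: make(map[string]int)}
}

// recordFailure registers one failed check for url and reports whether a
// HostDown event should be emitted. Assumption: the event fires when the count
// reaches sampleSize, and the count is then reset so the next event needs a
// fresh run of failures.
func (f *failureCounter) recordFailure(url string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.counts[url]++
	if f.counts[url] >= f.sampleSize {
		delete(f.counts, url)
		return true
	}
	return false
}
```

With a sample size of 3, the first two failures for a URL return false and the third returns true. That third failure is the point at which the gateway should stop logging "HOST DOWN BUT NOT REACHED LIMIT" and emit HostDown.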
Logs (debug mode or log file):
time="Nov 26 17:45:14" level=warning msg="[HOST CHECKER] [HOST DOWN BUT NOT REACHED LIMIT]: http://localhost:8181/status/400"
time="Nov 26 17:46:20" level=warning msg="[HOST CHECKER] [HOST DOWN BUT NOT REACHED LIMIT]: http://localhost:8181/status/400"
time="Nov 26 17:47:22" level=warning msg="[HOST CHECKER] [HOST DOWN BUT NOT REACHED LIMIT]: http://localhost:8181/status/400"
time="Nov 26 17:48:36" level=warning msg="[HOST CHECKER] [HOST DOWN BUT NOT REACHED LIMIT]: http://localhost:8181/status/400"
time="Nov 26 17:49:49" level=warning msg="[HOST CHECKER] [HOST DOWN BUT NOT REACHED LIMIT]: http://localhost:8181/status/400"
time="Nov 26 17:50:50" level=warning msg="[HOST CHECKER] [HOST DOWN BUT NOT REACHED LIMIT]: http://localhost:8181/status/400"
time="Nov 26 17:52:02" level=warning msg="[HOST CHECKER] [HOST DOWN BUT NOT REACHED LIMIT]: http://localhost:8181/status/400"
Additional context
I suspect the problem lies in this check in HostReporter:
if count, found := h.sampleCache.Get(failedHost.CheckURL); found {
h.sampleCache appears to be local to the goroutine. As a result, each call reads the failure count from a different goroutine's cache, so the count never reaches the limit, which would explain the behaviour seen.
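The suspected mechanism can be illustrated with a small model. This is a sketch of the hypothesis above, not Tyk's code. The names sampleCache, newSampleCache, incr and maxFailureCount are hypothetical, and the sample cache is modelled as a plain mutex-protected map. Each simulated failure is handled in its own goroutine. If every goroutine gets a fresh cache, the per-URL count never exceeds 1. If all goroutines share one cache, the count accumulates as expected.

```go
package main

import "sync"

// sampleCache is a HYPOTHETICAL stand-in for h.sampleCache: check URL -> failure count.
type sampleCache struct {
	mu     sync.Mutex
	counts map[string]int
}

func newSampleCache() *sampleCache { return &sampleCache{counts: make(map[string]int)} }

// incr bumps the failure count for url and returns the new value.
func (c *sampleCache) incr(url string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[url]++
	return c.counts[url]
}

// maxFailureCount handles `failures` failed checks for url, one goroutine per
// failure, and returns the highest count any goroutine observed. cacheFor is
// called inside each goroutine to obtain the cache that goroutine uses.
func maxFailureCount(url string, failures int, cacheFor func() *sampleCache) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	maxSeen := 0
	for i := 0; i < failures; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n := cacheFor().incr(url)
			mu.Lock()
			if n > maxSeen {
				maxSeen = n
			}
			mu.Unlock()
		}()
	}
	wg.Wait()
	return maxSeen
}
```

Passing newSampleCache as cacheFor models the suspected bug: each goroutine builds its own cache, so seven failures (as in the logs above) still yield a maximum count of 1. Passing a closure that returns one shared cache yields 7. Under this model, the fix would be to keep the sample cache on a structure that outlives individual checks.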
Branch/Environment/Version
Gateway: v2.9.1
Dashboard: v1.9.1