TykTechnologies / tyk

Tyk Open Source API Gateway written in Go, supporting REST, GraphQL, TCP and gRPC protocols

[TT-1736] Global Host Checker Not Working #3367

Closed nerdydread closed 3 years ago

nerdydread commented 3 years ago

Branch/Environment/Version

Describe the bug: The global Host Checker does not run the configured uptime tests. Each attempt fails with a "the pool is not running" error.

Reproduction steps:

  1. Import the attached API definition
  2. Configure the uptime_tests section of your tyk.conf to look like the following:
     "uptime_tests": {
       "disable": false,
       "config": {
         "failure_trigger_sample_size": 1,
         "time_wait": 10,
         "checker_pool_size": 50,
         "enable_uptime_analytics": true
       }
     }
  3. Restart your Gateway
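
Before restarting, it may help to confirm that the edited section is still valid JSON and that uptime tests are not disabled. The following is a minimal sketch, assuming tyk.conf is plain JSON. The helper name `check_uptime_tests` is hypothetical and not part of Tyk, and the sample text simply wraps the fragment from step 2 in braces.

```python
import json

# Hypothetical helper (not part of Tyk): sanity-check the uptime_tests section.
def check_uptime_tests(conf_text):
    """Return a list of problems found in the uptime_tests section (empty if none)."""
    conf = json.loads(conf_text)  # raises json.JSONDecodeError on malformed JSON
    section = conf.get("uptime_tests")
    if section is None:
        return ["uptime_tests section is missing"]
    problems = []
    if section.get("disable", False):
        problems.append("uptime_tests.disable is true")
    if "config" not in section:
        problems.append("uptime_tests.config is missing")
    return problems

# The fragment from step 2, wrapped in braces so it parses on its own.
SAMPLE = """
{
  "uptime_tests": {
    "disable": false,
    "config": {
      "failure_trigger_sample_size": 1,
      "time_wait": 10,
      "checker_pool_size": 50,
      "enable_uptime_analytics": true
    }
  }
}
"""

print(check_uptime_tests(SAMPLE))
```

An empty list only means the section is well formed. It says nothing about whether the Host Checker pool actually starts.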

Actual behavior: The Host Checker does not run. Every time it attempts to run, the following error is logged:

level=error msg="[HOST CHECKER] could not send work, error: the pool is not running"

Expected behavior The Host Checker would run the uptime tests as configured.

Logs (debug mode or log file): Attached.

Configuration (tyk config file): Attached. Archive.zip

candux commented 3 years ago

I had the same problem. The cause was that the node always thought there was another master, even though it was the only node running.

level=debug msg="Active Instance is: e4bd84fb-8da0-4d03-8c03-c2d4f64e8042" prefix=host-check-mgr
level=debug msg="--- I am: 721de54d-caf8-4734-bc0d-70c46ca696ce" prefix=host-check-mgr
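
To check for the same symptom in your own debug logs, you can compare the two IDs automatically. This is a rough sketch that assumes the two log lines look exactly like the ones above. The function name `is_active_poller` is made up for illustration.

```python
import re

# Patterns based on the two debug lines quoted above (an assumption about the log format).
ACTIVE_RE = re.compile(r'msg="Active Instance is: ([0-9a-f-]+)"')
SELF_RE = re.compile(r'msg="--- I am: ([0-9a-f-]+)"')

def is_active_poller(log_text):
    """Return True or False if both IDs are found in the log text, otherwise None."""
    active = ACTIVE_RE.search(log_text)
    me = SELF_RE.search(log_text)
    if active is None or me is None:
        return None
    return active.group(1) == me.group(1)

LOG = '''level=debug msg="Active Instance is: e4bd84fb-8da0-4d03-8c03-c2d4f64e8042" prefix=host-check-mgr
level=debug msg="--- I am: 721de54d-caf8-4734-bc0d-70c46ca696ce" prefix=host-check-mgr'''

print(is_active_poller(LOG))
```

With the lines from my gateway this reports that the node is not the active instance, which matches what I saw.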

I fixed this by expiring the Redis key:

172.31.37.145:6379> GET host-checker:PollerActiveInstanceID
"e4bd84fb-8da0-4d03-8c03-c2d4f64e8042"
172.31.37.145:6379> TTL  host-checker:PollerActiveInstanceID
(integer) -1
172.31.37.145:6379> expire  host-checker:PollerActiveInstanceID 1
(integer) 1
172.31.37.145:6379> GET host-checker:PollerActiveInstanceID
"fd8a4a41-2fbd-469b-acc6-b28e494c4db8"
172.31.37.145:6379> TTL  host-checker:PollerActiveInstanceID
(integer) 14
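
Why this helps, as far as I understand it: in Redis, a TTL of -1 means the key exists but never expires (-2 would mean it does not exist), so a stale instance ID can stay in place indefinitely. The sketch below is a small in-memory simulation, not Tyk's code. It assumes the active poller is chosen through a key that is claimed with an expiry. `FakeRedis`, `try_become_active`, and the 15-second lease are all hypothetical.

```python
KEY = "host-checker:PollerActiveInstanceID"

class FakeRedis:
    """Tiny in-memory stand-in for GET, SET and TTL; time is passed in explicitly."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self._data[key]  # the key has expired
            return None
        return value

    def set(self, key, value, now, ttl=None):
        self._data[key] = (value, None if ttl is None else now + ttl)

    def ttl(self, key, now):
        """Mimic Redis TTL: -2 if the key is missing, -1 if it has no expiry."""
        if self.get(key, now) is None:
            return -2
        expires_at = self._data[key][1]
        return -1 if expires_at is None else int(expires_at - now)

def try_become_active(store, my_id, now, lease=15):
    """Claim the key if it is free or already ours (assumed election rule)."""
    current = store.get(KEY, now)
    if current is None or current == my_id:
        store.set(KEY, my_id, now, ttl=lease)
        return True
    return False

store = FakeRedis()
store.set(KEY, "dead-node", now=0)               # stale key left behind without an expiry
print(store.ttl(KEY, now=100))                   # the key never expires
print(try_become_active(store, "node-b", 100))   # the live node cannot take over
store.set(KEY, "dead-node", now=100, ttl=1)      # equivalent of EXPIRE key 1
print(try_become_active(store, "node-b", 101))   # the key is gone, so the live node claims it
print(store.ttl(KEY, now=102))                   # the key now carries a finite TTL
```

Under these assumptions the live node is locked out until the stale key is expired by hand, and afterwards the key keeps a finite TTL, as in the session above.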

christtyk commented 3 years ago

Thanks for the report - I've added to our product backlog. @nerdydread did the workaround from Candux work for you?

christtyk commented 3 years ago

Closing as no reply