CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

UI: investigate frequency of state checks/ALB checks #2056

Open terrywbrady opened 1 month ago

ashleygould commented 3 weeks ago

For ALB:

agould@uc3-aws2023-ops:~> elb-tg-show uc3-mrtui-prd-tg
HealthCheckEnabled: true
HealthCheckIntervalSeconds: 30
HealthCheckPath: /
HealthCheckPort: traffic-port
HealthCheckProtocol: HTTP
HealthCheckTimeoutSeconds: 5
HealthyThresholdCount: 5

   --health-check-interval-seconds (integer)
      The approximate amount of time, in seconds, between health checks of
      an individual target. The range is 5-300. If the target group
      protocol is TCP, TLS, UDP, TCP_UDP, HTTP or HTTPS, the default is 30
      seconds. If the target group protocol is GENEVE, the default is 10
      seconds. If the target type is lambda, the default is 35 seconds.

   --health-check-timeout-seconds (integer)
      The amount of time, in seconds, during which no response from a
      target means a failed health check. The range is 2-120 seconds. For
      target groups with a protocol of HTTP, the default is 6 seconds. For
      target groups with a protocol of TCP, TLS or HTTPS, the default is
      10 seconds. For target groups with a protocol of GENEVE, the default
      is 5 seconds. If the target type is lambda, the default is 30
      seconds.

   --healthy-threshold-count (integer)
      The number of consecutive health check successes required before
      considering a target healthy. The range is 2-10. If the target group
      protocol is TCP, TCP_UDP, UDP, TLS, HTTP or HTTPS, the default is 5.
      For target groups with a protocol of GENEVE, the default is 5. If
      the target type is lambda, the default is 5.

   --unhealthy-threshold-count (integer)
      The number of consecutive health check failures required before
      considering a target unhealthy. The range is 2-10. If the target group
      protocol is TCP, TCP_UDP, UDP, TLS, HTTP or HTTPS, the default is 2.
      For target groups with a protocol of GENEVE, the default is 2. If
      the target type is lambda, the default is 5.
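
To put the settings above in terms of elapsed time, here is a rough Python sketch that multiplies the check interval by each threshold. It is only an estimate: it ignores the 5 second timeout and where in the interval a failure begins. The unhealthy threshold is not shown in the `elb-tg-show` output, so the value 2 used here is an assumption taken from the documented HTTP default.

```python
def seconds_to_state_change(interval_s: int, threshold: int) -> int:
    """Approximate time spanned by `threshold` consecutive checks, one every `interval_s` seconds."""
    return interval_s * threshold


# Values reported by elb-tg-show for uc3-mrtui-prd-tg.
INTERVAL_S = 30
HEALTHY_THRESHOLD = 5

# Assumption: UnhealthyThresholdCount is not in the output above;
# 2 is the documented default for HTTP target groups.
UNHEALTHY_THRESHOLD_ASSUMED = 2

to_healthy = seconds_to_state_change(INTERVAL_S, HEALTHY_THRESHOLD)
to_unhealthy = seconds_to_state_change(INTERVAL_S, UNHEALTHY_THRESHOLD_ASSUMED)

print(f"about {to_healthy}s of passing checks before a target is marked healthy")
print(f"about {to_unhealthy}s of failing checks before a target is marked unhealthy")
```

Under that assumption, two consecutive slow or failed responses roughly a minute apart would be enough to mark a target unhealthy.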
ashleygould commented 3 weeks ago

For Nagios:

normal check interval: 10 min
retry check interval: 1 min
max check attempts: 3
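
For comparison with the ALB numbers, the following sketch estimates how long a failure could go on before Nagios confirms it, assuming the usual Nagios behavior where the first failing check counts as attempt 1 and the remaining attempts are rechecks spaced by the retry interval. The function name is made up for illustration.

```python
def nagios_alert_delay_minutes(normal_interval: int, retry_interval: int, max_attempts: int):
    """Return (best_case, worst_case) minutes from the start of a failure to a confirmed problem."""
    # The first failing check is attempt 1; the rest are rechecks
    # spaced by the retry interval.
    confirm = (max_attempts - 1) * retry_interval
    # Best case: the failure begins just before a scheduled check.
    # Worst case: it begins just after one, so a full normal interval passes first.
    return confirm, normal_interval + confirm


best, worst = nagios_alert_delay_minutes(normal_interval=10, retry_interval=1, max_attempts=3)
print(f"confirmed between about {best} and {worst} minutes after the failure starts")
```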

terrywbrady commented 17 hours ago

@ashleygould, here is some data from a quick performance test.

[Image: results of the quick performance test]

Using /state.json rather than / takes about half the time and returns less data. Would this be a better choice for the health check?

Or, are the differences not compelling enough to justify a change?
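
A comparison like this could be reproduced with a small script along the following lines. This is a minimal sketch, not the test that produced the data above: the function names, the run count, and the base URL in the usage note are all hypothetical.

```python
import statistics
import time
import urllib.request
from typing import Callable, Dict, Iterable, Tuple


def fetch(url: str) -> int:
    """GET the URL and return the number of bytes in the response body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return len(resp.read())


def compare_paths(
    base_url: str,
    paths: Iterable[str],
    runs: int = 5,
    fetcher: Callable[[str], int] = fetch,
) -> Dict[str, Tuple[float, int]]:
    """Return {path: (median_seconds, body_bytes_of_last_run)} for each path."""
    results = {}
    for path in paths:
        timings = []
        size = 0
        for _ in range(runs):
            start = time.perf_counter()
            size = fetcher(base_url + path)
            timings.append(time.perf_counter() - start)
        results[path] = (statistics.median(timings), size)
    return results
```

For example, `compare_paths("http://localhost:8080", ["/", "/state.json"])` would time both endpoints against a local instance (the host and port are placeholders). Keeping `runs` small is probably wise when pointing this at a shared environment.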

In the past week, I think my execution of these tests has triggered an unhealthy host alert in the ALB for both stage and prod.