dhiaayachi / temporal

Temporal service
https://docs.temporal.io
MIT License

gRPC health check may say the server is unhealthy even if it's responding successfully to GetSystemInfo #254

Open dhiaayachi opened 1 month ago

dhiaayachi commented 1 month ago

For full context, see the discussion on this PR: https://github.com/temporalio/cli/pull/368#discussion_r1366898403

Expected Behavior

If GetSystemInfo returns successfully, the gRPC health check should also pass.

Actual Behavior

For a period of up to about 1 second after GetSystemInfo succeeds, the gRPC health check may fail (returning NOT_SERVING), falsely indicating that the gRPC service is down when it is in fact serving requests.

This was causing frequent intermittent failures (such as this one) in the CLI CI/CD pipeline until we worked around it in https://github.com/temporalio/cli/pull/368 .

Steps to Reproduce the Problem

  1. Launch the server
  2. Immediately try to connect to it using the Go SDK. (The Go SDK will wait for a successful GetSystemInfo response before returning a client object to the caller.)
  3. Once the Go SDK returns a client object, immediately use the client object to perform a health check of the server.
  4. Intermittently, the health check will fail.


dhiaayachi commented 2 weeks ago

Thanks for reporting this issue.

This appears to be a known issue in which the gRPC health check can fail for a brief period after GetSystemInfo succeeds. It has been resolved in Temporal 1.23.0.

You can find more information about this issue and how to upgrade to Temporal 1.23.0 in the Temporal Service Release Notes.