Open Nintorac opened 2 years ago
I have got this working using the /fedlearn.FederatedTraining/Heartbeat path and accepting response codes 0-99 as success.
Is there a more specific number or range that indicates a healthy heartbeat, or would it be better to implement a dedicated endpoint for health checks?
@yhwen, @nvidianz any comments on this one?
You can simply use TCP as the protocol and just check whether the port is open. This works in all cases, even in TLS pass-through mode. Heartbeat doesn't provide any more information about the server's health.
We plan to add a real health check endpoint in a future release.
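For reference, the TCP check described above amounts to attempting a connection to the server port and treating a successful connect as healthy. Below is a minimal Python sketch of that idea. The helper name port_is_open, the default timeout, and the host and port in the usage note are illustrative assumptions, not part of NVFlare. A load balancer configured with a TCP health check performs the equivalent probe on its own.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration only; it checks reachability of the
    port and says nothing about application-level health.
    """
    try:
        # create_connection performs the TCP handshake; the socket is closed
        # again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Usage would look like port_is_open("fl-server.example.internal", 8002), where both the host name and the port are placeholders for your own deployment. Because only the TCP handshake is attempted, this works the same way when TLS is passed through to the server.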
Not sure what this will look like. Total gRPC noob, sorry.
I am trying to stand up an FL server in AWS ECS behind a load balancer. This requires the service to have a health check that reports healthy in response to the load balancer's health probes.
Here is the CDK object I need to configure that defines how the health check is performed.
And here is gRPC docs on health checking.
a) Is there already a health check, and if so, how should I configure the CDK health check? b) If not, is there a workaround I can use in the meantime?
Thanks!