beacon-biosignals / Ray.jl

Julia API for Ray

Connecting to GCSClient without a local raylet hangs #219

Open glennmoy opened 1 year ago

glennmoy commented 1 year ago

Follow-up to a comment thread:

The issue is that the `Connect(client)` call returns `Status::OK` regardless of whether the GCS server has been started.

The failure only surfaces later: after 5 seconds it reports that it can't connect, and after a minute it kills the session with `EXIT_FAILURE`. Again, both timeouts are set by `RayConfig` params.
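A minimal sketch of the behaviour described above, in Python purely for illustration: `connect` always returns OK, and the failure only becomes visible when a separate check compares the time spent without reaching the server against two thresholds. `connect`, `gcs_status`, `CONNECT_TIMEOUT_S`, `RECONNECT_TIMEOUT_S` and the address string are hypothetical stand-ins, not Ray's API; the 5 s and 60 s values come from the timings observed in this issue, not from Ray's defaults.

```python
from enum import Enum

# Hypothetical thresholds mirroring the timings observed in this issue
# (warn after 5 s, give up after 60 s); not read from RayConfig.
CONNECT_TIMEOUT_S = 5
RECONNECT_TIMEOUT_S = 60


class Status(Enum):
    OK = "ok"


def connect(address: str) -> Status:
    """Hypothetical stand-in for Connect(client): it never contacts the
    server, so it reports OK whether or not a GCS server is listening."""
    return Status.OK


def gcs_status(seconds_without_server: float) -> str:
    """Classify how long the client has gone without reaching the server."""
    if seconds_without_server >= RECONNECT_TIMEOUT_S:
        return "exit"  # the real client kills the session with EXIT_FAILURE
    if seconds_without_server >= CONNECT_TIMEOUT_S:
        return "warn"  # the real client reports that it can't connect
    return "pending"


if __name__ == "__main__":
    print(connect("127.0.0.1:6379"))  # placeholder address; OK even with no server
    for t in (1, 5, 30, 60):
        print(t, gcs_status(t))
```

The point of the model is that the return value of `connect` carries no information about the server, so any check has to happen afterwards.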

If the client does not exist, then the thread executing the server (I think) throws the error, which only gets reported, not caught, in the Julia REPL.
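The same effect is easy to reproduce in Python, used here only as an analogy for the Julia REPL: an exception raised on a background thread is printed (via `threading.excepthook`) but never reaches the caller. One hedged way around it is to capture the exception on the worker thread and hand it back to the calling thread. `failing_server_loop` and `run_in_background` are hypothetical names for illustration, not part of Ray or Ray.jl.

```python
import threading
from typing import Callable, Optional


def failing_server_loop() -> None:
    """Hypothetical stand-in for background work that fails
    when no GCS server is reachable."""
    raise ConnectionError("could not connect to GCS server")


def run_in_background(fn: Callable[[], None]) -> Optional[BaseException]:
    """Run fn on a worker thread and return whatever exception it raised,
    so the caller can re-raise it instead of losing it."""
    captured: list = []

    def worker() -> None:
        try:
            fn()
        except BaseException as exc:  # capture instead of letting the thread die
            captured.append(exc)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return captured[0] if captured else None


if __name__ == "__main__":
    # A plain Thread only prints a traceback via threading.excepthook;
    # start() and join() themselves do not raise.
    plain = threading.Thread(target=failing_server_loop)
    plain.start()
    plain.join()
    print("caller continued without seeing the error")

    # Capturing the exception lets the caller surface it.
    err = run_in_background(failing_server_loop)
    if err is not None:
        print(f"caught on caller thread: {err!r}")
```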

https://github.com/ray-project/ray/blob/cde6e887cbb21a9cae2632e3e4b883d913d38a05/src/ray/rpc/gcs_server/gcs_rpc_client.h#L212-L216

Unfortunately, the `gcs_is_down_` field is private; however, there is a way to check whether the server is alive that uses a callback.
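For illustration only, a callback-based liveness check can be turned into a blocking call with a timeout, so that a missing server shows up as a plain `False` on the calling thread. Ray's actual callback API is not reproduced here: `fake_check_alive_async` is a hypothetical stand-in that simply never calls back when the server is down, and the timeout value is arbitrary.

```python
import threading
from typing import Callable


def fake_check_alive_async(server_up: bool, callback: Callable[[bool], None]) -> None:
    """Hypothetical asynchronous liveness probe: it answers on another
    thread only if the server is up, and never calls back otherwise."""
    if server_up:
        threading.Timer(0.01, callback, args=(True,)).start()


def is_alive(server_up: bool, timeout_s: float = 0.5) -> bool:
    """Block until the callback fires or the timeout expires."""
    done = threading.Event()
    result = {"alive": False}

    def on_reply(alive: bool) -> None:
        result["alive"] = alive
        done.set()

    fake_check_alive_async(server_up, on_reply)
    # Event.wait returns False if the timeout elapsed without set() being called.
    if not done.wait(timeout_s):
        return False
    return result["alive"]


if __name__ == "__main__":
    print(is_alive(True))
    print(is_alive(False, timeout_s=0.1))
```

The design choice here is to treat "no reply within the timeout" as "not alive", which keeps the decision on the caller's thread rather than in a background thread.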

However, I don't think it's worth implementing this directly. The timeout should take care of things; it's just that the error won't be nicely caught or reported in Julia, but we can add that as a follow-up.

_Originally posted by @glennmoy in https://github.com/beacon-biosignals/Ray.jl/pull/211#discussion_r1367222050_