Follow up to comment thread:
The issue is that the `Connect(client)` call returns `Status::OK` irrespective of whether the GCS server has been started.

It first reports after 5 seconds that it can't connect, then after a minute kills the session with an `EXIT_FAILURE`. Again, these are set by `RayConfig` params.

If the `client` does not exist, then the thread executing the server (I think) throws the error, which only gets reported but not caught in the Julia REPL.

https://github.com/ray-project/ray/blob/cde6e887cbb21a9cae2632e3e4b883d913d38a05/src/ray/rpc/gcs_server/gcs_rpc_client.h#L212-L216

Unfortunately the `gcs_is_down_` field is private; however, there is a way to check if the server is alive that uses a callback.

However, I don't think it's worth directly implementing this. The timeout should take care of things; it's just that the error won't be nicely caught/reported in Julia, but we can add that as a follow-up.
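For reference, here is a rough sketch of how a callback-based liveness check could be wrapped into a blocking call with a timeout, if we ever decide to go that way. None of the names below come from Ray: `FakeGcsClient`, `AsyncCheckAlive`, and `WaitUntilAlive` are hypothetical stand-ins that only mimic the "reply arrives via callback, or never arrives" shape, so treat this as an illustration of the pattern rather than something that plugs into `gcs_rpc_client.h`.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for a client that only exposes an async liveness check.
// This is NOT Ray's GcsRpcClient; it just imitates a callback-style reply.
class FakeGcsClient {
 public:
  explicit FakeGcsClient(bool server_up) : server_up_(server_up) {}

  // Join any reply threads so none outlive the client.
  ~FakeGcsClient() {
    for (auto &worker : workers_) worker.join();
  }

  // Calls `callback(true)` from a background thread, as an RPC reply would.
  // If the (pretend) server is down, no reply ever arrives and the callback
  // is never invoked.
  void AsyncCheckAlive(std::function<void(bool)> callback) {
    if (!server_up_) return;
    workers_.emplace_back([callback]() { callback(true); });
  }

 private:
  bool server_up_;
  std::vector<std::thread> workers_;
};

// Blocks until the callback fires or `timeout` elapses.
// Returns true only if the server reported itself alive in time.
inline bool WaitUntilAlive(FakeGcsClient &client,
                           std::chrono::milliseconds timeout) {
  // shared_ptr keeps the promise alive even if the reply arrives after
  // this function has already given up and returned.
  auto promise = std::make_shared<std::promise<bool>>();
  std::future<bool> result = promise->get_future();

  client.AsyncCheckAlive([promise](bool alive) { promise->set_value(alive); });

  if (result.wait_for(timeout) != std::future_status::ready) {
    return false;  // No reply within the timeout: treat the server as down.
  }
  return result.get();
}
```

Something along these lines would give the caller a boolean to act on instead of waiting for the session to be killed, but as noted above the existing timeout probably makes it unnecessary for now.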
_Originally posted by @glennmoy in https://github.com/beacon-biosignals/Ray.jl/pull/211#discussion_r1367222050_