james-cobb opened this issue 6 years ago
I think this may be caused when an exception is thrown in the started() method: restarting() is then called on an EndpointWriter that was never correctly set up. The exception in started() could itself be caused by sending a message to a PID with an empty address, and I think this could then cause the infinite loop observed.
I have confirmed that an empty PID was causing this issue.
The question remains what the correct behaviour should be when a message is sent to a non-existent remote PID. An EndpointWriter will be created that currently throws an exception in started(), causing an infinite retry loop.
I think the answer might be a supervision strategy that only allows for a limited number of Restarts. Is that implemented anywhere else?
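As a rough illustration (hypothetical code, not an existing protoactor-kotlin API), such a strategy could count restarts inside a sliding time window and stop the child once the limit is exceeded:

```kotlin
// Hypothetical restart limiter, not an existing protoactor-kotlin API: decide whether a
// failing child should be restarted again or stopped, based on how many restarts
// happened inside a sliding time window.
enum class SupervisorDirective { RestartChild, StopChild }

class LimitedRestartStrategy(
    private val maxRestarts: Int = 3,
    private val windowMillis: Long = 10_000
) {
    private val restartTimes = ArrayDeque<Long>()

    fun decide(now: Long = System.currentTimeMillis()): SupervisorDirective {
        // Forget restarts that fell outside the window.
        while (restartTimes.isNotEmpty() && now - restartTimes.first() > windowMillis) {
            restartTimes.removeFirst()
        }
        restartTimes.addLast(now)
        // Too many restarts in the window: give up instead of looping forever.
        return if (restartTimes.size > maxRestarts) SupervisorDirective.StopChild
               else SupervisorDirective.RestartChild
    }
}
```

In this sketch, each Restarting of the EndpointWriter would go through decide(); returning StopChild breaks the loop instead of restarting indefinitely.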
Simple demonstration of the retry loop:
Remote.start("localhost" , 1234)
send(PID.newBuilder().build(), "")
readLine()
In tests that deliberately cause gRPC endpoints to fail, we found that the EndpointWriter on the surviving node can get into an infinite loop.
The EndpointWriter is sent a Restarting message. In the restarting() method, channel.shutdownNow() is called, but the lateinit channel has not yet been initialized. This throws an exception in restarting(), which makes the supervisor force another restart, producing a loop that uses 100% CPU.
https://github.com/AsynkronIT/protoactor-kotlin/blob/2234df6fdc5cf4175624ec3f1632de72f718bcc0/proto-remote/src/main/kotlin/actor/proto/remote/EndpointWriter.kt#L67
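A minimal sketch of a possible guard (this is not the actual EndpointWriter code): use Kotlin's ::channel.isInitialized check so that restarting() cannot throw when started() failed before the channel was assigned.

```kotlin
import io.grpc.ManagedChannel

// Minimal sketch only, not the real EndpointWriter: guard the lateinit channel so that
// restarting() cannot throw an UninitializedPropertyAccessException when started()
// failed before the channel was assigned.
class GuardedWriterSketch {
    private lateinit var channel: ManagedChannel

    fun restarting() {
        // Only shut the channel down if started() actually initialized it.
        if (::channel.isInitialized) {
            channel.shutdownNow()
        }
    }
}
```

On its own this would not fix the underlying retry for a bad address, but it would stop restarting() itself from failing and feeding the loop.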