Closed avlazarov closed 4 years ago
Hey @avlazarov! Have you tried playing with the grpc_read_timeout configuration? What happens when the default timeout, 60s, passes?
I assume that AnycableGo is not aware that your RPC servers went down because nginx still keeps the connection open, due to the grpc_read_timeout and grpc_send_timeout directives.
> I assume that AnycableGo is not aware that your RPC servers went down
As I understood, other clients (new connections) work fine, i.e., gRPC connectivity is restored.
The problem is that the first one, the one that "caught" the broken connection, is getting stuck:
> Bottom line is that performing actions on a subscription after getting error 502 blocks all new actions from being performed by the anycable-go server for a particular client/subscription.
@avlazarov Right?
And that's strange: if other clients can successfully perform an action, the first one should be able to do so as well on the next attempt, since they use the same gRPC pool.
@palkan Yes, the odd part is that even when the next client makes a series of successful actions, the first one remains stuck. If, instead of returning error 502, I shut nginx down completely (causing a refused connection), anycable-go will attempt the operations and print errors, but once nginx is back up, the gRPC servers will correctly receive the actions and the client will no longer be stuck.
I'll try to reproduce it locally and come back when I find something.
I've tried to reproduce it in that simple chat application, but unfortunately (or fortunately) couldn't trigger the problem.
@avlazarov Please take a look at @bibendi's PR above. We couldn't reproduce the problem. Are we missing something?
@palkan Sorry, I can't reproduce it after upgrading from Ubuntu 16.04 to 18.04. It might have been something related to that specific version of Nginx for Ubuntu, or I might have misconfigured something else in Nginx that I have not noticed.
AnyCable-Go version: 0.6.3
AnyCable gem version: 0.6.3 (same anycable-rails version)
gRPC gem version: 1.20.0
nginx version: 1.17.3
What did you do?
bundle exec anycable --rpc_host 0.0.0.0:50052 and bundle exec anycable --rpc_host 0.0.0.0:50051
anycable-go via anycable-go --headers=origin,cookie --debug=true --rpc_host=localhost:50050
ActionCable.subscribe.
subscription.perform 'do_stuff'.
do_stuff action, the anycable-go server receives error 502 from nginx since both gRPC servers are gone.
What did you expect to happen?
The anycable-go server to raise an error similar to the one raised when no connection to the gRPC server is available (Perform error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure,) and retry communicating with the gRPC server on the next do_stuff action.
What actually happened?
After step 7, no attempts to send requests to the gRPC server are made (nothing is logged by the anycable-go server and nothing appears in the nginx access log), even if the gRPC servers are started up again. Meanwhile, the client gets successful ping messages and can receive broadcasts through the WS.
If another client subscribes to the same channel, they'll either 1) get an error forcing them to reconnect when the gRPC servers are all down or 2) successfully subscribe and perform actions when the gRPC servers are up. The first client will still remain "stuck", however.
Bottom line is that performing actions on a subscription after getting error 502 blocks all new actions from being performed by the anycable-go server for a particular client/subscription.
Could you please give some directions on how to deal with this scenario? One possibility is to 'ack' actions on the client side and reconnect altogether, but it adds some complexity.
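To make the client-side 'ack' idea a bit more concrete, here is a minimal sketch of the bookkeeping it would need. It is only an illustration under assumptions: AckTracker, PendingAction, and the idea of echoing an id back from the channel are hypothetical names and conventions, not part of AnyCable or Action Cable. The clock value is passed in explicitly so the logic stays independent of timers.

```typescript
// Hypothetical helper (not an AnyCable or Action Cable API): remembers every
// performed action until the server acknowledges it.
interface PendingAction {
  id: number;
  action: string;
  sentAt: number; // milliseconds, from whatever clock the caller uses
}

class AckTracker {
  private pending: PendingAction[] = [];
  private nextId = 1;
  private timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Record an action right before sending it; the returned id is assumed
  // to travel in the action payload so the server can echo it back.
  track(action: string, now: number): number {
    const id = this.nextId++;
    this.pending.push({ id, action, sentAt: now });
    return id;
  }

  // Call when an ack carrying the id arrives. Returns false for unknown ids.
  ack(id: number): boolean {
    const before = this.pending.length;
    this.pending = this.pending.filter((p) => p.id !== id);
    return this.pending.length < before;
  }

  // Actions that have waited longer than timeoutMs without an ack.
  expired(now: number): PendingAction[] {
    return this.pending.filter((p) => now - p.sentAt > this.timeoutMs);
  }

  // True when at least one action timed out, i.e. the client looks "stuck".
  shouldReconnect(now: number): boolean {
    return this.expired(now).length > 0;
  }
}
```

In use, the id returned by track would be sent along with subscription.perform (for example in a hypothetical ack_id field), the channel would transmit it back, and a periodic check of shouldReconnect would decide whether to drop the WebSocket connection and reconnect. This is the extra complexity mentioned above: every action needs an id, and the channel has to echo it.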