Blockstream / greenlight

Build apps using self-custodial lightning nodes in the cloud
https://blockstream.github.io/greenlight/getting-started/
MIT License

Connection is getting dropped with `transport error` #428

Open cdecker opened 5 months ago

cdecker commented 5 months ago

This is a tracking issue for the `transport error` failures we have been seeing since we activated the GL-LB for client -> node connections.

The root cause is a short loadbalancer keepalive timeout of approximately 30 seconds, combined with calls that take longer than 30 seconds, such as some `pay` calls. The loadbalancer drops a connection once it has been idle, i.e., has not transferred any data, for that long. Usually the client connection is configured to send keepalive messages (PING) whenever the connection is idle while a call is pending (a call is also referred to by the generic term stream, as everything in grpc is a stream). The configuration of the connection is here:

https://github.com/Blockstream/greenlight/blob/86c43646b0070b984f54bebfe245dd85c363cc44/libs/gl-client/src/node/mod.rs#L106-L112

It configures keepalives to be very aggressive, and also enables them when no call is pending. However, running with `RUST_LOG=trace` we can see that no pings are actually being sent over the wire. This is strange, as the signer is configured the same way and does send pings (it also has an active stream to `stream_hsm_requests`, which might be the difference).
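To make the timing argument concrete, here is a minimal sketch of the failure mode as a pure function. It is a simplified model, not code from `gl-client`: the name `lb_drop_time` and its parameters are hypothetical, and it assumes the request goes out at t = 0, the response arrives when the call finishes, and the only traffic in between is keepalive PINGs.

```rust
use std::time::Duration;

/// Hypothetical model, not part of gl-client: returns the time (measured from
/// the start of the call) at which an idle-timeout loadbalancer would drop the
/// connection, or `None` if the call completes first.
///
/// Assumptions: the request is sent at t = 0, the response arrives at
/// `call_duration`, and the only traffic in between is keepalive PINGs sent
/// every `ping_interval` (if any).
fn lb_drop_time(
    idle_timeout: Duration,
    call_duration: Duration,
    ping_interval: Option<Duration>,
) -> Option<Duration> {
    // The longest stretch without data on the wire is the ping interval when
    // pings are sent, otherwise the whole duration of the call.
    let longest_silence = match ping_interval {
        Some(interval) => interval.min(call_duration),
        None => call_duration,
    };

    // The first silent stretch starts at t = 0, so if it outlasts the idle
    // timeout the loadbalancer drops the connection at exactly `idle_timeout`.
    // A silence equal to the timeout is treated as surviving in this model.
    if longest_silence > idle_timeout {
        Some(idle_timeout)
    } else {
        None
    }
}
```

Under these assumptions, with a 30-second timeout a 45-second `pay` call without pings yields `Some(30s)`, while the same call with a ping every 10 seconds yields `None`. This is why pings that are configured but never reach the wire behave the same as having no keepalive at all.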

The documentation on the semantics of keepalives is here: https://grpc.io/docs/guides/keepalive/