RemiBou opened 14 hours ago
cc: @neild @tombergan
The gRPC algorithm is at https://github.com/grpc/proposal/blob/master/A8-client-side-keepalive.md#server-enforcement
Two hours is certainly...a choice.
The default limit seems to be five minutes if there are outstanding requests, which is still an eternity. There doesn't seem to be any reasonable and safe way to health check an unresponsive Java (possibly any) gRPC connection. (Well, other than to send junk requests, such as OPTIONS *. So far as I can tell, those are fine, despite being more expensive than cheap PING frames.)
Not sure what the correct thing to do here is.
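For what it's worth, the x/net client does expose its own PING-based health check knobs; here is a minimal sketch of configuring them (the durations are illustrative assumptions, not recommendations) and of why they have to stay above the server's enforced minimum:

```go
package main

import (
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

func main() {
	// ReadIdleTimeout/PingTimeout are x/net/http2's built-in health check:
	// if no frame is read for ReadIdleTimeout, the transport sends a PING and
	// closes the connection unless an ack arrives within PingTimeout.
	// Against a gRPC Java backend these PINGs count toward its keepalive
	// enforcement, so an interval below the server's minimum (2h by default,
	// 5m while requests are outstanding) risks GOAWAY ENHANCE_YOUR_CALM.
	t := &http2.Transport{
		ReadIdleTimeout: 5 * time.Minute,
		PingTimeout:     15 * time.Second,
	}
	client := &http.Client{Transport: t}
	_ = client // use the client for requests to the HTTP/2 backend
}
```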
I used to run into similar issues with other reverse proxies as well. In the past I've disabled enforcement on the server side; in Go that's https://pkg.go.dev/google.golang.org/grpc@v1.68.0/keepalive#EnforcementPolicy. I think that may be a reasonable thing to do if gRPC no longer has a direct mapping of the underlying streams?
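If you do control the server and it happens to be gRPC-Go, a minimal sketch of relaxing that enforcement policy (the listen address, durations, and omitted service registration are placeholders):

```go
package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatal(err)
	}
	// Illustrative values: allow pings as often as every 30s, even with no
	// active streams, instead of the strict defaults that reply with
	// GOAWAY ENHANCE_YOUR_CALM ("too_many_pings").
	srv := grpc.NewServer(grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
		MinTime:             30 * time.Second,
		PermitWithoutStream: true,
	}))
	// ... register services here ...
	log.Fatal(srv.Serve(lis))
}
```

gRPC Java appears to expose comparable permitKeepAliveTime / permitKeepAliveWithoutCalls settings on its NettyServerBuilder.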
Some options:
gRPC Java resets its "too many pings" counter every time it sends headers, data, or trailers. You can send as many pings as you want, as long as you wait for the server to send something after the ping response. Perhaps there's some way we can take advantage of this, but I haven't figured out how yet.
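To make that behavior concrete, here is a rough Go paraphrase of the enforcement rule described above (a conceptual sketch only; the real logic is in the linked KeepAliveEnforcer.java, and every name and threshold here is made up for illustration):

```go
package main

import (
	"fmt"
	"time"
)

// keepAliveEnforcer sketches the server-side rule: a ping is fine if the
// server has sent something since the last ping, or if at least
// minPingInterval has elapsed; otherwise it is a strike, and after
// maxStrikes the server sends GOAWAY ENHANCE_YOUR_CALM ("too_many_pings").
type keepAliveEnforcer struct {
	minPingInterval time.Duration // 2h by default, 5m while calls are outstanding
	maxStrikes      int
	lastPing        time.Time
	sentSincePing   bool // set when the server sends headers, data, or trailers
	strikes         int
}

// onDataSent mirrors "the counter resets every time the server sends
// headers, data, or trailers".
func (e *keepAliveEnforcer) onDataSent() {
	e.sentSincePing = true
	e.strikes = 0
}

// onPingReceived reports whether the connection should be closed.
func (e *keepAliveEnforcer) onPingReceived(now time.Time) (closeConn bool) {
	ok := e.sentSincePing || now.Sub(e.lastPing) >= e.minPingInterval
	e.lastPing = now
	e.sentSincePing = false
	if ok {
		return false
	}
	e.strikes++
	return e.strikes > e.maxStrikes
}

func main() {
	e := &keepAliveEnforcer{minPingInterval: 5 * time.Minute, maxStrikes: 2}
	now := time.Now()
	fmt.Println(e.onPingReceived(now))                      // false: first ping is fine
	fmt.Println(e.onPingReceived(now.Add(1 * time.Second))) // false: strike 1
	fmt.Println(e.onPingReceived(now.Add(2 * time.Second))) // false: strike 2
	fmt.Println(e.onPingReceived(now.Add(3 * time.Second))) // true: too many pings -> GOAWAY
}
```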
@neild from what I understand, this change is trying to work around an issue with TCP when the application isn't gracefully shut down, or when there is a network issue and requests just time out.
Maybe it's not the role of the http2 client to overcome this? gRPC Java closed a similar problem: https://github.com/grpc/grpc-java/issues/8380. To me the standard solution is to decrease the TCP retries on the environment (Elasticsearch suggests this).
On a side note, Traefik hasn't bumped x/net yet, and Istio hasn't released the bump (though it is in master). Once they release it, this issue might become more important.
Go version
1.23.3
Output of go env in your module/workspace:
What did you do?
We use traefik as a library for our service mesh.
What did you see happen?
Since upgrading x/net to 0.31.0, we have this error appearing on our instance that is deployed as a sidecar of a gRPC Java server:
2024/11/26 10:32:41 httputil: ReverseProxy read error during body copy: http2: server sent GOAWAY and closed the connection; LastStreamID=561, ErrCode=ENHANCE_YOUR_CALM, debug="too_many_pings"
Downgrading x/net to 0.30 fixes the issue.
I think it might be due to the new ping added here
https://github.com/golang/net/commit/f35fec92ec9213ee211cf45f451a5970386f7978#diff-e9bd9b4a514c2960ad85405e4a827d83c2decaaac70e9e0158a59db2d05807a7R3149 by @neild
You can find the framer logs here (captured with GODEBUG=http2debug=2): framerlog.log
This message is returned by gRPC Java here:
https://github.com/grpc/grpc-java/blob/a79982c7fded85ac6f53896f045e959b6cbd7c43/netty/src/main/java/io/grpc/netty/NettyServerHandler.java#L983
which runs this function: https://github.com/grpc/grpc-java/blob/master/core/src/main/java/io/grpc/internal/KeepAliveEnforcer.java#L57
This basically refuses a ping if another one was sent less than two hours before.
What did you expect to see?
The stream to be kept open and not reset.