envoyproxy / ratelimit

Go/gRPC service designed to enable generic rate limit scenarios from different types of applications.
Apache License 2.0

ShouldRateLimit error is never logged #597

Closed jwillker closed 3 weeks ago

jwillker commented 1 month ago

I'm facing an error that is hard to debug because the error is never logged.

I can see the metric ratelimit_service_should_rate_limit_error growing, but I cannot tell what is causing the errors.

Changing the log level to debug only surfaces this log line: https://github.com/envoyproxy/ratelimit/blob/ca55e1b31d023d8b8aedb8a21ca3b769fb632e95/src/service/ratelimit.go#L288

But the err itself is not logged; it is only returned by the function, so it can be seen only by the Envoy instance that made the call. At the end of the day, this is a gRPC handler that returns the error to Envoy: https://github.com/envoyproxy/go-control-plane/blob/v0.12.0/envoy/service/ratelimit/v3/rls.pb.go#L996

In my case, I'm using the Ratelimit service for thousands of pods with Envoy as a sidecar (Istio), so it is hard to find the pod that received the error, enable debug logging on it, and read the error there.

My proposal is to log the error in the Ratelimit service itself so that I can look up the error message there.

I can open a PR to add the error log, something like this:


func (this *service) ShouldRateLimit(
    ctx context.Context,
    request *pb.RateLimitRequest) (finalResponse *pb.RateLimitResponse, finalError error) {
    // ...
    defer func() {
        err := recover()
        if err == nil {
            return
        }

        logger.Debugf("caught error during call")
        logger.Error(err) // <--- New log entry
        // ...
    }()
    // ...
    return response, nil
}