go-kratos / kratos

Your ultimate Go microservices framework for the cloud-native era.
https://go-kratos.dev
MIT License
23.31k stars, 4.01k forks

[consul Registry]: When grpc enters idle mode, “last connection error” occurs on the client and cannot be recovered #3185

Closed duc-cnzj closed 7 months ago

duc-cnzj commented 9 months ago

What happened:

When gRPC enters idle mode, a "last connection error" occurs on the client and the client cannot recover.

What you expected to happen:

The client should continue to watch for the latest instances of the service.

How to reproduce it (as minimally and precisely as possible):

Add the WithIdleTimeout option to the BFF gRPC client.

image

After 5 seconds, restart the server gRPC service so that the pod IP changes. The client is no longer updated with the new address.

image-20240119215331303

Anything else we need to know?:

Before entering idle

image-20240119104636822

After entering idle

image-20240119131138921

When gRPC enters idle mode, watcher.Stop() is called and <-ctx.Done() returns.

image-20240119214610867
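
One way the behavior above could lead to stale addresses can be sketched with a toy model. This is an assumption for illustration only; the issue does not confirm the mechanism. The `serviceSet`, `registry`, `watch`, and `simulate` names are hypothetical and are not Kratos, consul, or gRPC APIs. The idea is that a cached per-service entry keeps refreshing only while the context of the first watcher is alive, so a watcher created after idle mode sees old data unless the refresh is rebound to a live context.

```go
package main

import (
	"context"
	"fmt"
)

// serviceSet is a hypothetical per-service cache. Storing a context in a
// struct is done here only to keep the toy model short.
type serviceSet struct {
	instances  []string
	refreshCtx context.Context // context the refresh is bound to
}

// refresh simulates one polling tick. It is a no-op once the bound
// context has been canceled.
func (s *serviceSet) refresh(latest []string) bool {
	if s.refreshCtx.Err() != nil {
		return false
	}
	s.instances = latest
	return true
}

// registry is a hypothetical registry that caches one serviceSet per name.
type registry struct {
	sets          map[string]*serviceSet
	rebindOnWatch bool // when true, a new watch revives a dead refresh
}

// watch returns the cached set, creating it on first use.
func (r *registry) watch(ctx context.Context, name string) *serviceSet {
	set, ok := r.sets[name]
	if !ok {
		set = &serviceSet{refreshCtx: ctx}
		r.sets[name] = set
	} else if r.rebindOnWatch && set.refreshCtx.Err() != nil {
		set.refreshCtx = ctx
	}
	return set
}

// simulate walks through: first watch, idle (cancel), second watch,
// then a server restart that changes the address.
func simulate(rebind bool) []string {
	r := &registry{sets: map[string]*serviceSet{}, rebindOnWatch: rebind}

	ctx1, cancel1 := context.WithCancel(context.Background())
	set := r.watch(ctx1, "svc")
	set.refresh([]string{"10.0.0.1:9000"})

	cancel1() // idle mode: the watcher is stopped and its context is done

	ctx2, cancel2 := context.WithCancel(context.Background())
	defer cancel2()
	set = r.watch(ctx2, "svc")             // leaving idle: a new watch starts
	set.refresh([]string{"10.0.0.2:9000"}) // server restarted with a new IP
	return set.instances
}

func main() {
	fmt.Println("without rebind:", simulate(false))
	fmt.Println("with rebind:", simulate(true))
}
```

In this model the run without rebinding keeps the old address, while the run with rebinding picks up the new one.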

Environment:

PR

fix: when gRPC exits idle mode, it needs to keep fetching the latest instances of the service #3162

dosubot[bot] commented 9 months ago

🤖

The issue you're experiencing might be due to the gRPC connection going into the IDLE state and not being able to recover when the service instance changes. This could be a problem with the gRPC resolver not being able to update the connection when the service instance changes.

To resolve this issue, you can try to add a custom gRPC resolver that can handle the service instance changes more effectively. Here's an example of how you can do this:

package main

import (
    "log"

    "github.com/go-kratos/kratos/v2/registry"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    "google.golang.org/grpc/resolver"
)

// yourDiscovery is a placeholder; assign your actual registry.Discovery
// implementation (for example, a consul registry) before registering.
var yourDiscovery registry.Discovery

// customBuilder creates a customResolver for each dialed target.
type customBuilder struct {
    discovery registry.Discovery
}

func (b *customBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
    r := &customResolver{
        target: target,
        cc:     cc,
        d:      b.discovery,
    }
    r.start()
    return r, nil
}

// Scheme returns the URL scheme that selects this resolver.
func (b *customBuilder) Scheme() string {
    return "custom"
}

type customResolver struct {
    target resolver.Target
    cc     resolver.ClientConn
    d      registry.Discovery
}

func (r *customResolver) start() {
    // Implement your logic to watch the service instance changes and update the gRPC connection accordingly.
}

func (r *customResolver) ResolveNow(resolver.ResolveNowOptions) {}

func (r *customResolver) Close() {}

func main() {
    // Register your custom resolver builder.
    resolver.Register(&customBuilder{discovery: yourDiscovery})

    // Use your custom resolver when dialing.
    conn, err := grpc.Dial("custom:///your-service-name", grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatalf("dial failed: %v", err)
    }
    defer conn.Close()
}

In the start method of the customResolver, you should implement your logic to watch the service instance changes from the registry.Discovery and call resolver.ClientConn.UpdateState to update the gRPC connection accordingly.
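
As a rough illustration of the watch loop described above, here is a minimal, standard-library-only sketch. The `watcher` and `stateUpdater` interfaces are hypothetical stand-ins that only imitate the general shape of a discovery watcher and of the `UpdateState` call on a gRPC client connection; they are not the real Kratos or gRPC types, and `chanWatcher` and `recorder` exist only to drive the example. A real resolver would also need to convert instances to `resolver.Address` values and handle errors more carefully.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// watcher is a hypothetical stand-in for a discovery watcher.
type watcher interface {
	Next() ([]string, error) // blocks until the instance list changes
	Stop() error
}

// stateUpdater is a hypothetical stand-in for the UpdateState part of a
// gRPC client connection.
type stateUpdater interface {
	UpdateState(addrs []string) error
}

type customResolver struct {
	ctx    context.Context
	cancel context.CancelFunc
	w      watcher
	cc     stateUpdater
	done   chan struct{}
}

// newCustomResolver starts the watch loop in the background.
func newCustomResolver(w watcher, cc stateUpdater) *customResolver {
	ctx, cancel := context.WithCancel(context.Background())
	r := &customResolver{ctx: ctx, cancel: cancel, w: w, cc: cc, done: make(chan struct{})}
	go r.watch()
	return r
}

// watch forwards every instance change to the connection until the
// resolver is closed or the watcher fails.
func (r *customResolver) watch() {
	defer close(r.done)
	for r.ctx.Err() == nil {
		addrs, err := r.w.Next()
		if err != nil {
			return
		}
		_ = r.cc.UpdateState(addrs)
	}
}

// Close stops the watcher and waits for the loop to exit.
func (r *customResolver) Close() {
	r.cancel()
	_ = r.w.Stop()
	<-r.done
}

// chanWatcher is a fake watcher fed through a channel.
type chanWatcher struct {
	updates chan []string
	stop    chan struct{}
}

func (w *chanWatcher) Next() ([]string, error) {
	select {
	case a := <-w.updates:
		return a, nil
	case <-w.stop:
		return nil, errors.New("watcher stopped")
	}
}

func (w *chanWatcher) Stop() error { close(w.stop); return nil }

// recorder captures the addresses passed to UpdateState.
type recorder struct{ got chan []string }

func (r *recorder) UpdateState(a []string) error { r.got <- a; return nil }

func main() {
	w := &chanWatcher{updates: make(chan []string), stop: make(chan struct{})}
	rec := &recorder{got: make(chan []string, 4)}
	r := newCustomResolver(w, rec)

	w.updates <- []string{"10.0.0.1:9000"}
	fmt.Println(<-rec.got)
	w.updates <- []string{"10.0.0.2:9000"} // pod restarted with a new IP
	fmt.Println(<-rec.got)
	r.Close()
}
```

Because gRPC may close a resolver when the connection goes idle and build a new one later, each newly built resolver in this sketch starts its own loop with a fresh context rather than reusing a canceled one.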

Please replace yourDiscovery and your-service-name with your actual discovery instance and service name.

This way, when the service instance changes, your custom resolver can update the gRPC connection to point to the latest service instance, which should resolve the issue you're experiencing.

You can refer to transport/grpc/client.go for more details on how gRPC connections are handled in Kratos.

As for how Kratos handles gRPC connections entering idle mode, I wasn't able to find specific information within the repository. It's possible that this behavior is managed by the underlying gRPC library rather than Kratos itself. If you have more specific questions or need further clarification, please let me know!

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
