go-kratos / kratos

Your ultimate Go microservices framework for the cloud-native era.
https://go-kratos.dev
MIT License
23.26k stars 4k forks source link

Contrib/Consul has some issue when dealing with multi datacenter consul #2813

Closed jeffcaiz closed 11 months ago

jeffcaiz commented 1 year ago

What happened:

Use contrib discovery - consul, v2.6.1, consul configured with 3 datacenter, apply WithDatacenter(MultiDatacenter) When try to dial gRPC with discovery configured with above, it will report error: discovery create watcher overtime.

What you expected to happen:

No error.

How to reproduce it (as minimally and precisely as possible):

  1. Configure a local consul cluster with 3 IDC
  2. Initialize the consul registery
    r := consul.New(cc, consul.WithDatacenter(consul.MultiDatacenter)
  3. Invoke an grpc dial
    endpoint := fmt.Sprintf("discovery:///%v", target)
    finalOpts := []grpc.ClientOption{
    grpc.WithEndpoint(endpoint),
    grpc.WithDiscovery(r),
    }

Anything else we need to know?:

It seems that the consul component is not yet finished. There is something wrong I believe:

1. Wrong consul wait index

It seems originally it only support single datacenter, it use the same index for ALL datacenter, which is not right. For multiple datacenter, different node will have different index.

2. Timeout value seems not good

Consul API wait is hardcoded to 55 seconds, with a configureable timeout with 10 seconds as default value. The default setting will result in cancelled context before Consul API longpoll returns, which I think is not desirable. I think 60 seconds for default timeout will be fine as a final keeper.

3. Polling consul with blocking API one by one results in delayed service update

If we still use the current way to handle multiple datacenter service watch, which polls one by one, service update in DC-beta will be notified only after, say, finishing polls of DC-alpha, which is 10 seconds late now.

Environment:

jeffcaiz commented 1 year ago

I've finished a draft modification, I plan to issue a PR later

shenqidebaozi commented 1 year ago

@jeffcaiz is there any progress?

jeffcaiz commented 1 year ago

After my last post, I did more change to this module, then assigned to another project. will soon resume

dosubot[bot] commented 11 months ago

Hi, @jeffcaiz. I'm Dosu, and I'm helping the kratos team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue is related to the Contrib/Consul package when dealing with multi-datacenter Consul configurations. It seems that you have made progress on this issue by finishing a draft modification and planning to issue a PR later. Another user, @shenqidebaozi, has also shown interest in the progress. It's great to hear that you have made additional changes to the module and will soon resume work on it.

Before we mark this issue as stale, we wanted to check with you if it is still relevant to the latest version of the kratos repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your contribution and we look forward to hearing from you soon.

Best regards, Dosu

jeffcaiz commented 11 months ago

Since there is already an approved change #3063 to support MDC, Just close this issue

dosubot[bot] commented 11 months ago

Thank you @jeffcaiz for closing the issue! We appreciate your contribution to the Kratos repository.

shenqidebaozi commented 11 months ago

@jeffcaiz Before officially merge, I hope you can also participate in the review and try it out. Can it meet your needs??

jeffcaiz commented 11 months ago

Okay, A quick check looks like the WaitIndex is not separated between different Consul Agent, which is not right. And GetService seems still query in serial way.

I double check the Index issue, which is most important and I am not working with Kratos/Consul for a while (Kratos/MicroService is too overwhelming for our team/biz, 😜)

包子 @.***> 于2023年11月8日周三 18:42写道:

@jeffcaiz https://github.com/jeffcaiz Before officially joining, I hope you can also participate in the review and try it out. Can it meet your needs?

— Reply to this email directly, view it on GitHub https://github.com/go-kratos/kratos/issues/2813#issuecomment-1801602057, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADKKSUCFELDEAD2SNXVWPHLYDNOZJAVCNFSM6AAAAAAXWVVELSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMBRGYYDEMBVG4 . You are receiving this because you were mentioned.Message ID: @.***>

-- yours sincerely, Jeff

dosubot[bot] commented 11 months ago

Thank you @jeffcaiz for closing the issue! We appreciate your contribution to the Kratos repository.

shenqidebaozi commented 11 months ago

@jeffcaiz The Kratos resolver uses the Watch interface, which will obtain instances of multiple registries in parallel using different indices. As for GetService, it is intentionally designed to obtain instances of each registry serially every time it is called, without using the WaitIndex. The GetService method will not be used by Kratos