alibaba / nacos

an easy-to-use dynamic service discovery, configuration and service management platform for building cloud native applications.
https://nacos.io
Apache License 2.0

Naming local cache may be ignored in a rare scenario #12644

Open nkorange opened 1 week ago

nkorange commented 1 week ago

Describe the bug

The naming local cache may be ignored in a rare scenario, so even when the local cache is not empty, the user's invocation can still get an exception.

Expected behavior

If the local cache is not empty, the Nacos client should never throw an exception.

Actual behavior

In a rare case, the user's invocation gets an exception even though the local cache is not empty.

How to Reproduce

It's hard to reproduce, but it did happen in our production environment. Here is the related logic:

In the method getServiceInfoBySubscribe:

  1. first, it tries to get the service info from the local cache;
  2. then it checks whether the local cache is null or the client is not subscribed;
  3. if so, it tries to subscribe from the remote Nacos server.
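The three steps above can be sketched roughly as follows. This is a simplified, self-contained illustration, not the actual Nacos source: `SubscribeFlowSketch`, its nested `ClientProxy` interface, the single-argument method signatures, and the use of a plain `String` in place of the real service info object are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SubscribeFlowSketch {

    /** Hypothetical stand-in for the client's remote proxy (not the real Nacos interface). */
    interface ClientProxy {
        boolean isSubscribed(String serviceName);

        String subscribe(String serviceName) throws Exception;
    }

    // Hypothetical local cache: service name -> service info (a String here for brevity).
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final ClientProxy clientProxy;

    SubscribeFlowSketch(ClientProxy clientProxy) {
        this.clientProxy = clientProxy;
    }

    void putCache(String serviceName, String serviceInfo) {
        localCache.put(serviceName, serviceInfo);
    }

    String getServiceInfoBySubscribe(String serviceName) throws Exception {
        // Step 1: look in the local cache first.
        String cached = localCache.get(serviceName);
        // Step 2: check whether the cache is empty or the client is not subscribed.
        if (cached == null || !clientProxy.isSubscribed(serviceName)) {
            // Step 3: subscribe remotely. A failure here propagates to the caller
            // even when `cached` is non-null.
            return clientProxy.subscribe(serviceName);
        }
        return cached;
    }
}
```

In this sketch, a populated cache does not help once `isSubscribed` returns false: the result depends entirely on whether the remote subscribe call succeeds.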

Usually this works, because whenever clientProxy.isSubscribed(...) returns false, it means the Nacos client has just reconnected to the Nacos server and closed the old connection.

Since the new connection is ready, the subscribe request succeeds.

But if the new connection goes down again immediately and the subscribe request fails, the method getServiceInfoBySubscribe throws an exception.

Additional context

I suggest adding protective logic to the method getServiceInfoBySubscribe, so that whenever the local cache is not empty, a remote request error or any other exception is ignored.
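One possible shape for that protection is sketched below. It is only an illustration of the idea under simplified assumptions, not a patch against the Nacos code base: `CacheFallbackSketch`, the nested `ClientProxy` interface, and the `String` service info are hypothetical names and types chosen to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class CacheFallbackSketch {

    /** Hypothetical stand-in for the client's remote proxy (not the real Nacos interface). */
    interface ClientProxy {
        boolean isSubscribed(String serviceName);

        String subscribe(String serviceName) throws Exception;
    }

    // Hypothetical local cache: service name -> service info (a String here for brevity).
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final ClientProxy clientProxy;

    CacheFallbackSketch(ClientProxy clientProxy) {
        this.clientProxy = clientProxy;
    }

    void putCache(String serviceName, String serviceInfo) {
        localCache.put(serviceName, serviceInfo);
    }

    String getServiceInfoBySubscribe(String serviceName) throws Exception {
        String cached = localCache.get(serviceName);
        if (cached != null && clientProxy.isSubscribed(serviceName)) {
            return cached;
        }
        try {
            return clientProxy.subscribe(serviceName);
        } catch (Exception e) {
            // Protection: a non-empty cache wins over a failed remote call.
            if (cached != null) {
                return cached;
            }
            // Nothing cached, so the caller still needs to see the failure.
            throw e;
        }
    }
}
```

With this shape, the exception only reaches the caller when there is no cached value to fall back on. The trade-off is that a failed re-subscribe becomes silent whenever the cache is populated, so a real implementation would probably want to at least log it.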

KomachiSion commented 3 days ago

When the client calls the subscribe API, it should throw an exception if the connection is not ready and the subscribe fails. This notifies users that there is a connection problem and that they should retry or take some other action.

If getAllInstances is called with subscribe=true, it can be discussed whether it should throw an exception when the cache exists.

In my opinion, when subscribe=true, a connection that is not ready should throw an exception to notify users, because users expect getAllInstances to get instances from the server, not only from the cache.

nkorange commented 3 days ago

In the current implementation, subscribe=true with the Nacos server disconnected does not throw an exception. The client throws an exception only when subscribe=true, the reconnect succeeds, and the re-subscribe then fails.