alibaba / nacos

an easy-to-use dynamic service discovery, configuration and service management platform for building cloud native applications.
https://nacos.io
Apache License 2.0
30.2k stars 12.83k forks

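Proposal: introduce federated Nacos and a configuration index to provide a unified management interface for massive configuration and service discovery/registration data #2925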

Closed zhangyixin1222 closed 3 years ago

zhangyixin1222 commented 4 years ago

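Integrate separate, independent Nacos clusters into a federation. Each cluster is responsible for a partition chosen by some partitioning rule (for example, project name), and each cluster also maintains index-range information about the other clusters: project names, configuration names, service names, interface names, and so on. Users can then deploy m mutually independent Nacos clusters while managing them and querying service names through a single platform UI. Partitioning, combined with federation at use time, lets the amount of configuration data and the number of registered services on Nacos scale arbitrarily. Encapsulating the target-cluster cache and routing policy in the client package lowers the maintenance cost for developers.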
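The index and client-side routing described above could look roughly like the following. This is a minimal sketch, not an existing Nacos feature: `FederationIndex`, `FederatedClient`, and the cluster addresses are hypothetical names invented for illustration, and the partition key is assumed to be the project name.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FederationIndex:
    """Hypothetical index: partition key (project name) -> owning cluster address."""
    owners: Dict[str, str] = field(default_factory=dict)

    def register(self, project: str, cluster_addr: str) -> None:
        self.owners[project] = cluster_addr

    def lookup(self, project: str) -> str:
        # Raises KeyError for a project no cluster has registered.
        return self.owners[project]


@dataclass
class FederatedClient:
    """Hypothetical client wrapper that caches the target cluster per project."""
    index: FederationIndex
    cache: Dict[str, str] = field(default_factory=dict)
    index_lookups: int = 0  # counts how often the shared index was consulted

    def route(self, project: str) -> str:
        if project not in self.cache:
            self.index_lookups += 1
            self.cache[project] = self.index.lookup(project)
        return self.cache[project]


index = FederationIndex()
index.register("billing", "nacos-a.example:8848")  # made-up addresses
index.register("search", "nacos-b.example:8848")

client = FederatedClient(index)
first = client.route("billing")
second = client.route("billing")  # served from the client cache
```

The cache keeps repeated requests for the same project off the shared index; a real design would also need cache invalidation when a partition moves, which this sketch leaves out.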

KomachiSion commented 4 years ago

This seems like quite a big design. We need more discussion about whether to implement it and how to implement it.

zhangyixin1222 commented 4 years ago

Here is a design sketch (image: untitled file).

KomachiSion commented 4 years ago

From a design perspective, it looks like a global application structure design rather than Nacos design.

  1. The scheduling system on top of the Nacos clusters is an external system; like k8s, it is a container for Nacos. We can consider it and plan to implement an operator or sidecar in the future.
  2. We currently use Raft to update and synchronize Nacos metadata.
  3. The routing table is also an external system, and Nacos should not have to care about it.
  4. The client is disconnected from the Nacos cluster, so it's hard for Nacos to handle heartbeats.

zhangyixin1222 commented 4 years ago

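From a macro perspective, this is indeed a problem that most middleware runs into once a distributed system is used at sufficient scale.

  1. k8s currently cannot solve this scheduling problem well, because what needs solving here is communication and coordination between the leaders of different groups. The leader of group A does not talk directly to the non-leader members of group B, since that would inflate the network topology and the number of connections. k8s coordination is built on etcd, which is based on the Raft protocol, but I have not found a good solution in current k8s versions for dedicated communication between group leaders. In practice you also have to account for group A's leader moving among A1, A2, and A3.
  2. Raft communication guarantees consistency among the nodes within a single Nacos cluster. The concrete strategy differs depending on whether you want session consistency or strong consistency: zk guarantees session consistency, so every member can accept client connections, while etcd chose strong consistency and guarantees it by having clients connect only to the leader.
  3. The routing table is for inter-group communication in federation mode. For example, when a namespace turns out not to be served by the local cluster, the caller can use the table to redirect automatically.
  4. If Nacos users do not have strict real-time requirements, a distribution component can be added as an extra layer to smooth out peaks and avoid congestion, for example to solve the thundering herd problem.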
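As a rough illustration of the peak-smoothing layer in point 4, the sketch below spreads notifications to subscribers over fixed-size batches instead of notifying everyone at once. The function name `schedule_batches` and its parameters are assumptions made for this example; it only computes a delivery plan and does not talk to Nacos.

```python
from typing import List, Tuple


def schedule_batches(
    subscribers: List[str], batch_size: int, interval_s: float
) -> List[Tuple[float, List[str]]]:
    """Return (delay_seconds, batch) pairs so notifications go out in waves."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    plan = []
    for start in range(0, len(subscribers), batch_size):
        # Batch k is released k * interval_s seconds after the change.
        delay = (start // batch_size) * interval_s
        plan.append((delay, subscribers[start:start + batch_size]))
    return plan


subscribers = [f"client-{n}" for n in range(5)]
plan = schedule_batches(subscribers, batch_size=2, interval_s=0.5)
# plan[0] is released immediately, plan[1] after 0.5 s, plan[2] after 1.0 s
```

The trade-off is the one the comment names: later batches see the change later, so this only suits consumers without strict real-time requirements.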

KomachiSion commented 4 years ago

Yes, I agree with most of what you said. But I want to emphasize that almost everything in your design and requirements goes beyond Nacos' responsibility. It's application architecture design, not Nacos design.

zhangyixin1222 commented 4 years ago

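From the standpoint of the single-responsibility principle, the best approach is to develop a separate application for these features, similar to mycat's SQL interception and forwarding. The prerequisite is that Nacos and similar middleware provide the corresponding extension points: a query interface for finding which node in a cluster is the leader, leaderInfo=searchLeader(clusterAddrs); a heartbeat interface for confirming that the cluster is available, state=tick(tickTime,leaderAddr); and other extension interfaces that advanced users can build custom features on. Current versions of many middleware products, Nacos included, offer plenty of entry-level functionality, but their extension interfaces for custom features are not yet complete. If abstract classes, and not only the implementation classes, were also published, users could write and package their own implementations on top of them and specify which implementation class to use. That seems like a good direction.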
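A published abstract class of the kind suggested here might be sketched as follows. Everything in it is hypothetical: Nacos does not expose `search_leader` or `tick` in this form, the names simply mirror the `searchLeader` and `tick` signatures from the comment, and `StaticClusterExtension` is a toy implementation whose return values are made up.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class LeaderInfo:
    addr: str
    term: int


class ClusterExtension(ABC):
    """Hypothetical extension point a registry could publish for custom tooling."""

    @abstractmethod
    def search_leader(self, cluster_addrs: List[str]) -> LeaderInfo:
        """Report which node in the cluster is currently the leader."""

    @abstractmethod
    def tick(self, tick_time: float, leader_addr: str) -> str:
        """Heartbeat against the leader; returns a state string."""


class StaticClusterExtension(ClusterExtension):
    """Toy implementation: treats the first address as leader, always reports UP."""

    def search_leader(self, cluster_addrs: List[str]) -> LeaderInfo:
        if not cluster_addrs:
            raise ValueError("cluster_addrs must not be empty")
        return LeaderInfo(addr=cluster_addrs[0], term=1)

    def tick(self, tick_time: float, leader_addr: str) -> str:
        return "UP"


ext: ClusterExtension = StaticClusterExtension()
leader_info = ext.search_leader(["10.0.0.1:8848", "10.0.0.2:8848"])
state = ext.tick(tick_time=1.0, leader_addr=leader_info.addr)
```

Callers depend only on `ClusterExtension`, so a user-supplied implementation can be swapped in without touching the calling code, which is the pattern the comment asks for.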

KomachiSion commented 4 years ago

Nacos is positioned not as a framework, but as a registration center and configuration center.

The primary goal is to ensure performance, accuracy, availability, and topology scalability.

Once these capabilities are more stable, Nacos will consider pluggable, extensible plug-ins. At that point, a design like this will be needed. But for now it is too early, and too far beyond the scope of Nacos' responsibility.

zhangyixin1222 commented 4 years ago

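When a user has 15 machines available, deploying all 15 as a single Nacos cluster is not the same as deploying three clusters of 5 machines each. The former can deliver higher read performance under session-consistency requirements, but every write needs 8 machines to maintain replica consistency (if learners are introduced, you can manually set 5 machines as participants and the other 10 as learners), and writes are serialized globally, so its write performance is lower than the latter's. The latter gives up a little read performance, but each consistency round needs only 3 machines, which improves write performance. Having clients select a cluster by namespace also allows parallel writes across clusters, which reduces conflicts and better satisfies strong-consistency requirements. The latter's availability is slightly lower than the former's, but it can be shown that the difference is small. Alternatively, Nacos could borrow Kafka's design: present a single system externally while partitioning data internally by namespace. With machines 1-15 serving as a whole, namespace1's data would be placed on machines 1-5 and namespace2's data on machines 2-6.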
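The numbers in this comment follow from majority-quorum arithmetic, which the short sketch below works through together with the namespace placement example. The `placement` helper is hypothetical, and its wrap-around past machine 15 is an assumption of this sketch; the comment only gives the first two placements.

```python
from typing import List


def majority(voters: int) -> int:
    """Smallest number of voters that forms a majority quorum."""
    return voters // 2 + 1


single_cluster_quorum = majority(15)  # 8 machines must acknowledge each write
small_cluster_quorum = majority(5)    # 3 machines per write in each 5-node cluster


def placement(namespace_index: int, total: int = 15, replicas: int = 5) -> List[int]:
    """1-based machine numbers holding a namespace: namespace 1 -> 1..5, 2 -> 2..6.

    Wrapping around after the last machine is an assumption for illustration.
    """
    start = namespace_index - 1
    return [(start + k) % total + 1 for k in range(replicas)]


ns1 = placement(1)  # [1, 2, 3, 4, 5]
ns2 = placement(2)  # [2, 3, 4, 5, 6]
```

With learners, only the 5 participants vote, so the single large cluster would also need just `majority(5)` acknowledgements per write while the 10 learners replicate without voting.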

KomachiSion commented 4 years ago

I understand what you said and I agree with it. But I mean that it's too early to do this enhancement.

stale[bot] commented 3 years ago

Thanks for your feedback and contribution. This issue/pull request has had no activity for more than 180 days and will be closed if no further activity occurs within 7 days. We may have solved this issue in a newer version, so could you upgrade to the newest version and retry? If the problem persists or you want to contribute again, please create a new issue or pull request.