Pixiu Control Plane Support Service Metadata #484

Open AlbumenJ opened 2 years ago

AlbumenJ commented 2 years ago

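We need to define a proto for reporting two kinds of data: the metadata used by application-level service discovery, and the service definitions of interfaces. The former backs the interface through which a consumer, having obtained a revision from the registry during application-level service discovery, fetches the full service information. The latter is service metadata reported by the client that records the interface's parameters for operational needs (such as service testing). Technically this is just storage: providing set and get interfaces is enough, and at this stage the data does not need any processing. The proto can simply be bound to the xDS gRPC channel.

The main metadata interfaces are set(app string, revision string, metadataInfo string) and get(app string, revision string).

The service definition interfaces are set(identifier string, definition string) and get(identifier string).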
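
As a rough illustration of the "storage only" idea above, here is a minimal in-memory sketch in Python. The class names `MetadataStore` and `ServiceDefinitionStore` are made up for this example, and a real control plane would presumably persist the data rather than keep it in a dict.

```python
from typing import Dict, Optional, Tuple


class MetadataStore:
    """Keeps opaque metadata strings keyed by (app, revision)."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}

    def set(self, app: str, revision: str, metadata_info: str) -> None:
        # The value is stored as-is; no parsing or processing at this stage.
        self._data[(app, revision)] = metadata_info

    def get(self, app: str, revision: str) -> Optional[str]:
        # Returns None when nothing was reported for this (app, revision).
        return self._data.get((app, revision))


class ServiceDefinitionStore:
    """Keeps opaque service definition strings keyed by identifier."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, identifier: str, definition: str) -> None:
        self._data[identifier] = definition

    def get(self, identifier: str) -> Optional[str]:
        return self._data.get(identifier)
```

Both stores treat the payload as an uninterpreted string, which matches the statement that no data processing is needed yet.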



MasterKenway commented 2 years ago

working on it

MasterKenway commented 2 years ago
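  1. Deleting the metadata that was published when a service came online is fairly complex, so it will not be implemented in the first version.
  2. The revision here is distinct from etcd's MVCC revision. It can be viewed as an MD5 key computed from the metadata content (hash collisions need to be considered: either use an algorithm that is less prone to collisions or design a check mechanism).
  3. There are two kinds of metadata. "Service metadata used by the registry" is for service discovery; "service metadata used for operations" describes interface inputs and outputs and is for operations. The former can be understood as what is needed to build the URL, while the latter lets operators know exactly which parameters the currently published interfaces take (see the Java Dubbo implementation).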
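
To make item 2 concrete, the following Python sketch derives a revision from the metadata content with MD5 and adds one possible "check mechanism" for collisions. The names `compute_revision`, `RevisionIndex`, and `RevisionCollisionError` are hypothetical, and hashing the raw UTF-8 string is an assumption; the thread does not say how the content is serialized before hashing.

```python
import hashlib
from typing import Dict


def compute_revision(metadata_info: str) -> str:
    """Return the hex MD5 digest of the metadata content."""
    return hashlib.md5(metadata_info.encode("utf-8")).hexdigest()


class RevisionCollisionError(Exception):
    """Raised when one revision would map to two different contents."""


class RevisionIndex:
    """Remembers which content produced each revision (hypothetical check)."""

    def __init__(self) -> None:
        self._content_by_revision: Dict[str, str] = {}

    def register(self, metadata_info: str) -> str:
        revision = compute_revision(metadata_info)
        existing = self._content_by_revision.get(revision)
        if existing is not None and existing != metadata_info:
            # Same digest, different content: a genuine hash collision.
            raise RevisionCollisionError(revision)
        self._content_by_revision[revision] = metadata_info
        return revision
```

Registering the same content twice returns the same revision, so the check only fires when two different contents share a digest.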
MasterKenway commented 2 years ago
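  1. Metadata is static at the Endpoint level and does not change after the service starts, so the server side only needs to handle the reporting logic.
  2. Previously the consumer watched for Provider metadata changes through a Listener; now the watch logic moves into the control plane, which actively pushes changes to consumers over gRPC.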
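
The push model in item 2 can be sketched roughly as follows. This is only an illustration in Python: plain callbacks stand in for the gRPC streams to consumers, and `MetadataHub`, `subscribe`, and `publish` are invented names, not part of Pixiu.

```python
from typing import Callable, Dict, List, Tuple

# A subscriber receives (app, revision, metadata_info) for each update.
Subscriber = Callable[[str, str, str], None]


class MetadataHub:
    """Control-plane side: stores reported metadata and pushes it to consumers."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}
        self._subscribers: Dict[str, List[Subscriber]] = {}

    def subscribe(self, app: str, callback: Subscriber) -> None:
        # In a real system this would be a consumer's open gRPC stream.
        self._subscribers.setdefault(app, []).append(callback)

    def publish(self, app: str, revision: str, metadata_info: str) -> None:
        # The provider reports once; the hub stores and fans out the update.
        self._store[(app, revision)] = metadata_info
        for callback in self._subscribers.get(app, []):
            callback(app, revision, metadata_info)
```

A consumer subscribed to one application only receives updates for that application, so it no longer needs its own Listener on the Provider.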
chickenlj commented 2 years ago

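I think these two kinds of Metadata need to be defined separately. For each of them, the main things to confirm right now are:

  1. The CRD definition describing the resource, whose main content is revision and metadata
  2. The protocol definition used for communication, based on proto (service, message)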
chickenlj commented 2 years ago

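2. It can be viewed as an MD5 key computed from the metadata content (hash collisions need to be considered: either use an algorithm that is less prone to collisions or design a check mechanism)

The MD5 computation should be done by the data plane; the control plane can just store the value and does not need to care how it is computed.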

chickenlj commented 2 years ago

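5. Previously the consumer watched for Provider metadata changes through a Listener; now the watch logic moves into the control plane, which actively pushes changes to consumers over gRPC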


MasterKenway commented 2 years ago

Service discovery metadata CRD definition

    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      # name must match the spec fields below, and be in the form: <plural>.<group>
      # NOTE: Kubernetes enforces that form; the name and group below do not satisfy it yet.
      name: metadata.networking.dubbo.io
    spec:
      # group name to use for REST API: /apis/<group>/<version>
      group: metadata.networking.dubbo.io
      # list of versions supported by this CustomResourceDefinition
      versions:
        - name: v1
          # Each version can be enabled/disabled by Served flag.
          served: true
          # One and only one version must be marked as the storage version.
          storage: true
          schema:
            openAPIV3Schema:
              type: object
              properties:
                spec:
                  type: object
                  properties:
                    # The three property names were lost from this listing;
                    # the names below are assumed from the proto message fields.
                    applicationName:
                      type: string
                    revision:
                      type: string
                    metadataInfo:
                      type: string
      # either Namespaced or Cluster
      scope: Namespaced
      names:
        # plural name to be used in the URL: /apis/<group>/<version>/<plural>
        plural: metadata
        # singular name to be used as an alias on the CLI and for display
        singular: metadata
        # kind is normally the CamelCased singular type. Your resource manifests use this.
        kind: Metadata
        # shortNames allow shorter string to match your resource on the CLI
        shortNames:
          - md


    syntax = "proto3";

    package pixiu.pkg.metadata;

    option go_package = "pixiu/pkg/metadata/protos";

    // Stores and returns application-level service discovery metadata.
    service MetadataService {
        rpc Publish(PublishMetadataRequest) returns (PublishMetadataResponse);
        rpc Get(GetMetadataRequest) returns (GetMetadataResponse);
    }

    message PublishMetadataRequest {
        string application_name = 1;
        string revision = 2;
        string metadata_info = 3;
    }

    message PublishMetadataResponse {
    }

    message GetMetadataRequest {
        string application_name = 1;
        string revision = 2;
    }

    message GetMetadataResponse {
        string metadata_info = 1;
    }


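  1. On startup, a service reports its metadata to the control plane, which registers it into a k8s CRD.
  2. Watch for services going offline; once all services under an application have gone offline, remove the corresponding metadata.
    ● ~Add a listener in service discovery to watch for the corresponding online/offline events~
    ● Give the metadata a timestamp. The control plane checks the update status of this data once a day, while clients report at a higher frequency. If the data has not been updated within a day, the service is treated as offline and the corresponding metadata is deleted. (Open to question)
  3. Because the control plane does not care how the revision parameter is generated, duplicates need to be considered.
    ● ~Switch to a different hash algorithm in the data plane~
    ● Directly update the data for the corresponding revision
  4. Watching for services going offline

Service definition CRD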

    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      # name must match the spec fields below, and be in the form: <plural>.<group>
      # NOTE: Kubernetes enforces that form and requires names.plural and
      # names.singular to be all lowercase; the values below do not satisfy this yet.
      name: service-definition.networking.dubbo.io
    spec:
      # group name to use for REST API: /apis/<group>/<version>
      group: service-definition.networking.dubbo.io
      # list of versions supported by this CustomResourceDefinition
      versions:
        - name: v1
          # Each version can be enabled/disabled by Served flag.
          served: true
          # One and only one version must be marked as the storage version.
          storage: true
          schema:
            openAPIV3Schema:
              type: object
              properties:
                spec:
                  type: object
                  properties:
                    # The two property names were lost from this listing;
                    # the names below are assumed from the proto message fields.
                    identifier:
                      type: string
                    serviceDefinition:
                      type: string
      # either Namespaced or Cluster
      scope: Namespaced
      names:
        # plural name to be used in the URL: /apis/<group>/<version>/<plural>
        plural: serviceDefinitions
        # singular name to be used as an alias on the CLI and for display
        singular: serviceDefinition
        # kind is normally the CamelCased singular type. Your resource manifests use this.
        kind: ServiceDefinition
        # shortNames allow shorter string to match your resource on the CLI
        shortNames:
          - sd


    syntax = "proto3";

    package pixiu.pkg.metadata;

    option go_package = "pixiu/pkg/metadata/protos";

    service ServiceDefinitionService {
        rpc Publish(PublishServiceDefinitionRequest) returns (PublishServiceDefinitionResponse);
        rpc Get(GetServiceDefinitionRequest) returns (GetServiceDefinitionResponse);
    }

    message PublishServiceDefinitionRequest {
        string identifier = 1;
        string service_definition = 2;
    }

    message PublishServiceDefinitionResponse {}

    message GetServiceDefinitionRequest {
        string identifier = 1;
    }

    message GetServiceDefinitionResponse {
        string service_definition = 1;
    }

1. All error handling for the proto goes through gRPC error handling.
2. Data in the metadata center and in service definitions is stored as the raw string only.
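
One way to read note 1 is that the response messages carry no error fields and failures surface as gRPC statuses instead. The Python sketch below imitates that without the real grpc library: `RpcError` and `ServiceDefinitionHandler` are stand-ins invented for this example, and the choice of INVALID_ARGUMENT and NOT_FOUND for these cases is an assumption. Only the numeric values follow the canonical gRPC status codes.

```python
from enum import IntEnum
from typing import Dict


class StatusCode(IntEnum):
    # Numeric values follow the canonical gRPC status codes.
    OK = 0
    INVALID_ARGUMENT = 3
    NOT_FOUND = 5


class RpcError(Exception):
    """Stand-in for a gRPC status error (not the grpc library's class)."""

    def __init__(self, code: StatusCode, message: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message


class ServiceDefinitionHandler:
    """Hypothetical handler behind Publish/Get for service definitions."""

    def __init__(self) -> None:
        self._definitions: Dict[str, str] = {}

    def publish(self, identifier: str, service_definition: str) -> None:
        if not identifier:
            raise RpcError(StatusCode.INVALID_ARGUMENT, "identifier is required")
        # Stored as the raw string, without parsing (note 2).
        self._definitions[identifier] = service_definition

    def get(self, identifier: str) -> str:
        if identifier not in self._definitions:
            raise RpcError(StatusCode.NOT_FOUND, f"no definition for {identifier!r}")
        return self._definitions[identifier]
```

Because errors travel as statuses, an empty `PublishServiceDefinitionResponse` is enough to signal success.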
1. The core of K8s is its API, not containers: from theory to CRD practice (2022) http://arthurchiao.art/blog/k8s-is-about-apis-zh/#33-api-%E6%98%AF-sql
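
The timestamp-based cleanup proposed in the lifecycle notes above (which the thread itself marks as open to question) could look roughly like this in Python. It also shows the "directly update the corresponding revision" choice for duplicate revisions. `ExpiringMetadataStore`, `report`, and `sweep` are hypothetical names, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
from typing import Dict, List, Optional, Tuple

ONE_DAY_SECONDS = 24 * 60 * 60


class ExpiringMetadataStore:
    """Metadata entries that expire when clients stop reporting them."""

    def __init__(self, ttl_seconds: float = ONE_DAY_SECONDS) -> None:
        self._ttl = ttl_seconds
        # (app, revision) -> (metadata_info, time of the last report)
        self._entries: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def report(self, app: str, revision: str, metadata_info: str, now: float) -> None:
        # A repeated revision overwrites the entry and refreshes its timestamp.
        self._entries[(app, revision)] = (metadata_info, now)

    def get(self, app: str, revision: str) -> Optional[str]:
        entry = self._entries.get((app, revision))
        return entry[0] if entry is not None else None

    def sweep(self, now: float) -> List[Tuple[str, str]]:
        # Run periodically (once a day in the proposal); entries not
        # refreshed within the TTL are treated as offline and removed.
        expired = [
            key for key, (_, last_seen) in self._entries.items()
            if now - last_seen > self._ttl
        ]
        for key in expired:
            del self._entries[key]
        return expired
```

Clients would call `report` more often than the sweep interval, so a live service always refreshes its entry before the next sweep.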
MasterKenway commented 1 year ago

Current status: generating the CRD-related code with istio build-tools fails because an imported proto file cannot be found.

repo: https://github.com/dubbo-go-pixiu/operator-api

The log is as follows:

kenway@DESKTOP-9LOOPCM:/mnt/d/Workspace/GolandProjects/api$ make gen
Syncing ./networking/v1beta1/destination_rule.proto from networking/v1alpha3/destination_rule.proto
Syncing ./networking/v1beta1/gateway.proto from networking/v1alpha3/gateway.proto
Syncing ./networking/v1beta1/service_entry.proto from networking/v1alpha3/service_entry.proto
Syncing ./networking/v1beta1/sidecar.proto from networking/v1alpha3/sidecar.proto
Syncing ./networking/v1beta1/virtual_service.proto from networking/v1alpha3/virtual_service.proto
Syncing ./networking/v1beta1/workload_entry.proto from networking/v1alpha3/workload_entry.proto
Syncing ./networking/v1beta1/workload_group.proto from networking/v1alpha3/workload_group.proto
authentication/v1alpha1/policy.proto:23:8:google/api/field_behavior.proto: does not exist
networking/v1alpha3/destination_rule.proto:16:8:google/api/field_behavior.proto: does not exist
networking/v1alpha3/envoy_filter.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1alpha3/gateway.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1alpha3/service_entry.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1alpha3/sidecar.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1alpha3/virtual_service.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1alpha3/workload_entry.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1alpha3/workload_group.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1beta1/destination_rule.proto:16:8:google/api/field_behavior.proto: does not exist
networking/v1beta1/gateway.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1beta1/service_entry.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1beta1/sidecar.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1beta1/virtual_service.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1beta1/workload_entry.proto:17:8:google/api/field_behavior.proto: does not exist
networking/v1beta1/workload_group.proto:17:8:google/api/field_behavior.proto: does not exist
security/v1beta1/authorization_policy.proto:16:8:google/api/field_behavior.proto: does not exist
security/v1beta1/jwt.proto:16:8:google/api/field_behavior.proto: does not exist
type/v1beta1/selector.proto:16:8:google/api/field_behavior.proto: does not exist
make[1]: *** [Makefile.core.mk:36: gen-proto] Error 100
make: *** [Makefile:44: gen] Error 2
AlexStocks commented 1 year ago

February 15 weekly meeting: the issues above have all been resolved. Cai is still reviewing.