apache / eventmesh

EventMesh is a new generation serverless event middleware for building distributed event-driven applications.
https://eventmesh.apache.org/
Apache License 2.0
1.61k stars, 638 forks

[Feature] Integrate With OpenSchema #339

Open qqeasonchen opened 3 years ago

qqeasonchen commented 3 years ago

Option 1: reference OpenMessaging's OpenSchema: https://github.com/openmessaging/openschema/issues/1 and https://github.com/openmessaging/openschema/blob/master/spec.md

yzhao244 commented 3 years ago

I am a user of EventMesh and I'm extremely interested in contributing to this project. I will go through the parts of the project related to an EventMesh Schema Registry implementation that integrates with OpenSchema, and then get back to the community for further discussion.

jinrongluo commented 3 years ago

OpenSchema spec preview is released here:

https://github.com/openmessaging/openschema/blob/master/spec.md

qqeasonchen commented 3 years ago

Welcome, and looking forward to your contributions.

yzhao244 commented 3 years ago

What is the question:

I am attempting a design which integrates with OpenSchema and is also easy to extend.

What would you like to be added: I suggest adding two more modules to the overall EventMesh project:

  1. eventmesh-store-api: an interface module that contains schema registry persistence APIs such as the following:

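    import java.util.List;

    // SchemaRegistry, SchemaRequest and SchemaResponse are types from the proposal that are not shown here.
    public interface EventSchemaService extends SchemaRegistry {
        // Persist a new schema.
        void createSchema(SchemaRequest schemaRequest);

        // Return every stored schema.
        List<SchemaResponse> readAllSchemas();

        // Replace the schema stored under schemaId.
        void updateSchema(SchemaRequest schemaRequest, String schemaId);

        // Remove the schema stored under schemaId.
        void deleteSchema(SchemaRequest schemaRequest, String schemaId);
    }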
  2. eventmesh-store-h2: contains the actual implementation of EventSchemaService, which integrates with OpenSchema. I propose using the H2 database to persist the schema registry in EventMesh. This module is pluggable, so vendors can implement persistence with other techniques, such as the file system or any other data store, as they choose.
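
A minimal sketch of the pluggable store idea, assuming a simplified, hypothetical `SchemaStore` interface that works on raw schema strings (the proposal's `SchemaRequest`/`SchemaResponse` types are not shown, so they are not reproduced here). An H2, MySQL, or file-system plugin would implement the same interface; the in-memory version below is only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified persistence contract; each vendor plugin
// (H2, MySQL, file system, ...) would provide its own implementation.
interface SchemaStore {
    String createSchema(String schemaDefinition); // returns the generated schema id
    List<String> readAllSchemas();
    Optional<String> readSchema(String schemaId);
    boolean updateSchema(String schemaId, String schemaDefinition);
    boolean deleteSchema(String schemaId);
}

// In-memory implementation, e.g. for tests or a single-node setup.
class InMemorySchemaStore implements SchemaStore {
    private final Map<String, String> schemas = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    @Override
    public String createSchema(String schemaDefinition) {
        String id = String.valueOf(nextId.getAndIncrement());
        schemas.put(id, schemaDefinition);
        return id;
    }

    @Override
    public List<String> readAllSchemas() {
        return new ArrayList<>(schemas.values());
    }

    @Override
    public Optional<String> readSchema(String schemaId) {
        return Optional.ofNullable(schemas.get(schemaId));
    }

    @Override
    public boolean updateSchema(String schemaId, String schemaDefinition) {
        // replace() only succeeds if the id already exists
        return schemas.replace(schemaId, schemaDefinition) != null;
    }

    @Override
    public boolean deleteSchema(String schemaId) {
        return schemas.remove(schemaId) != null;
    }
}
```

Callers depend only on `SchemaStore`, so swapping the in-memory class for a database-backed one does not change them.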

Why is this needed:

  1. It keeps the Schema Registry in EventMesh extensible, since vendors may need different techniques, such as an in-memory database, MySQL, or any other data store, for persisting data.
  2. Furthermore, the store layer can later be extended to persist other management information, such as subscriptions and topics. This time we only do it for the schema registry. :)

qqeasonchen commented 3 years ago

@yzhao244 This store should be distinguished from the event-store in the connector; what do you think?

qqeasonchen commented 3 years ago

@yzhao244 Would you like to share some new designs for us? thanks.

yzhao244 commented 3 years ago

Yes, you are right. :) Maybe name it something like "eventmesh-database-api". The purpose of this interface module is to introduce an abstraction layer for the registry APIs.

yzhao244 commented 3 years ago

What is the purpose of the design:

The purpose of the design is to introduce a Schema Registry as part of EventMesh. The Schema Registry is a central repository with RESTful interfaces that lets developers define and register standard schemas. It addresses the problem of producers and consumers using different data (event) formats.

What features does the Schema Registry provide:

  1. Persist and share the version history of all schemas (schema lifecycle management) and verify schema compatibility.
  2. Support serialization/deserialization of the Avro, JSON, and Protobuf formats.

What is the high-level design to achieve these features:

  1. Define the schema registry data models (subject, schema, version, compatibility) and the schema REST API standards based on the open-source OpenSchema specification.
  2. The eventmesh-database-api module abstracts the CRUD capabilities of the schema registry.
  3. eventmesh-database-h2 contains the actual implementation. I propose using the H2 database and querying it through the JDBC API in EventMesh.
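
To make the data models in item 1 concrete, here is a hedged in-memory sketch of subjects, versions, and per-subject compatibility settings. The class names, the default of `BACKWARD`, and the rule that re-registering an identical schema returns the existing version are assumptions for illustration, not taken from the OpenSchema spec or the EventMesh code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Assumed subset of compatibility levels.
enum Compatibility { NONE, BACKWARD, FORWARD, FULL }

// One registered version of a schema under a subject.
final class SchemaVersion {
    final String subject;
    final int version;   // 1-based, per subject
    final long id;       // globally unique schema id
    final String schema; // raw schema text (Avro, JSON Schema, Protobuf, ...)

    SchemaVersion(String subject, int version, long id, String schema) {
        this.subject = subject;
        this.version = version;
        this.id = id;
        this.schema = schema;
    }
}

// Keeps the version history and compatibility setting of every subject in memory.
class SubjectRegistry {
    private final Map<String, List<SchemaVersion>> history = new HashMap<>();
    private final Map<String, Compatibility> config = new HashMap<>();
    private long nextId = 1;

    synchronized SchemaVersion register(String subject, String schema) {
        List<SchemaVersion> versions = history.computeIfAbsent(subject, s -> new ArrayList<>());
        // Re-registering an identical schema returns the existing version.
        for (SchemaVersion v : versions) {
            if (v.schema.equals(schema)) {
                return v;
            }
        }
        SchemaVersion created = new SchemaVersion(subject, versions.size() + 1, nextId++, schema);
        versions.add(created);
        return created;
    }

    synchronized Optional<SchemaVersion> getVersion(String subject, int version) {
        List<SchemaVersion> versions = history.getOrDefault(subject, List.of());
        return (version >= 1 && version <= versions.size())
                ? Optional.of(versions.get(version - 1))
                : Optional.empty();
    }

    synchronized Compatibility getCompatibility(String subject) {
        return config.getOrDefault(subject, Compatibility.BACKWARD);
    }

    synchronized void setCompatibility(String subject, Compatibility level) {
        config.put(subject, level);
    }
}
```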

The following is the high-level design diagram: [image]

An example of backward compatibility from the OpenSchema specification: [image]
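
Since the diagram is not reproduced here, the following hypothetical sketch shows what a backward compatibility check means on a deliberately simplified schema model (field name mapped to a type and a has-default flag). It assumes Avro-like rules: consumers on the new schema must be able to read data written with the old one, so added fields need defaults, removed fields are fine, and (in this sketch only) any type change is rejected.

```java
import java.util.Map;

// Simplified field definition, assumed for illustration only.
final class FieldDef {
    final String type;
    final boolean hasDefault;

    FieldDef(String type, boolean hasDefault) {
        this.type = type;
        this.hasDefault = hasDefault;
    }
}

final class CompatibilityChecker {
    // BACKWARD: consumers using the new schema can read data written with the old one.
    static boolean isBackwardCompatible(Map<String, FieldDef> oldSchema, Map<String, FieldDef> newSchema) {
        for (Map.Entry<String, FieldDef> e : newSchema.entrySet()) {
            FieldDef oldField = oldSchema.get(e.getKey());
            FieldDef newField = e.getValue();
            if (oldField == null) {
                // Field added: old data lacks it, so the reader needs a default.
                if (!newField.hasDefault) {
                    return false;
                }
            } else if (!oldField.type.equals(newField.type)) {
                // Type changed: this sketch ignores type promotions and rejects it.
                return false;
            }
        }
        // Fields removed in the new schema are simply ignored when reading old data.
        return true;
    }
}
```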

xwm1992 commented 3 years ago

Hi @yzhao244, I have some doubts about this design.

yzhao244 commented 3 years ago

@xwm1992 Thanks for the questions. :) My replies are below. Sorry that my drawings are a bit rough. :)

[image]

  • POST /subjects/(string: subject)/
  • POST /subjects/(string: subject)/versions
  • POST /compatibility/subjects/(string: subject)/versions/(version: version)

  • GET /subjects
  • GET /subjects/(string: subject)
  • GET /subjects/(string: subject)/versions
  • GET /subjects/(string: subject)/versions/(version: version)/schema
  • GET /schemas/ids/{string: id}
  • GET /schemas/ids/{string: id}/subjects
  • GET /config/(string: subject)

  • PUT /config/(string: subject)

  • DELETE /subjects/(string: subject)/versions/(version: version)
  • DELETE /subjects/(string: subject)
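
One way to see how these endpoints map to handlers is a small route table. This is a sketch only: the regexes cover a subset of the paths above, and the handler names (`registerSchema`, `listSubjects`, and so on) are hypothetical labels, not EventMesh or OpenSchema identifiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical route table for a subset of the endpoints above.
final class SchemaRoutes {
    private static final Map<Pattern, String> ROUTES = new LinkedHashMap<>();

    static {
        ROUTES.put(Pattern.compile("POST /subjects/([^/]+)/versions"), "registerSchema");
        ROUTES.put(Pattern.compile("GET /subjects"), "listSubjects");
        ROUTES.put(Pattern.compile("GET /subjects/([^/]+)/versions"), "listVersions");
        ROUTES.put(Pattern.compile("GET /subjects/([^/]+)/versions/([^/]+)/schema"), "getSchemaByVersion");
        ROUTES.put(Pattern.compile("GET /schemas/ids/([^/]+)"), "getSchemaById");
        ROUTES.put(Pattern.compile("PUT /config/([^/]+)"), "updateCompatibility");
        ROUTES.put(Pattern.compile("DELETE /subjects/([^/]+)/versions/([^/]+)"), "deleteVersion");
        ROUTES.put(Pattern.compile("DELETE /subjects/([^/]+)"), "deleteSubject");
    }

    // Returns "handler:param1,param2" for the route whose pattern matches the whole request line.
    static Optional<String> resolve(String method, String path) {
        String request = method + " " + path;
        for (Map.Entry<Pattern, String> route : ROUTES.entrySet()) {
            Matcher m = route.getKey().matcher(request);
            if (m.matches()) {
                StringBuilder params = new StringBuilder();
                for (int i = 1; i <= m.groupCount(); i++) {
                    if (i > 1) {
                        params.append(',');
                    }
                    params.append(m.group(i));
                }
                return Optional.of(route.getValue() + ":" + params);
            }
        }
        return Optional.empty();
    }
}
```

Because `matches()` requires the whole request line to match and `[^/]+` never crosses a slash, `DELETE /subjects/orders` and `DELETE /subjects/orders/versions/1` resolve to different handlers.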

yzhao244 commented 3 years ago

Furthermore, the project currently does not have a layer that exposes APIs following REST best practices. I would also like to propose another module, called something like "eventmesh-rest", which can expose the EventMesh Schema Registry APIs following the OpenSchema RESTful API standards shown above.

qqeasonchen commented 3 years ago

sure ok.

jzhou59 commented 3 years ago

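Hi, I'm also interested in the schema registry in EventMesh. Thanks for your design and explanations. Now I understand that:

  • a schema represents the format of the messages being transferred;
  • the benefit of integrating OpenSchema into EventMesh is that a consumer can dynamically parse any message as long as it can find the schema ID in the H2 database.

I also have some questions:

  • In the design above, does the EventMesh Schema Registry need a separate server to run, or can it run inside the EventMesh Runtime?
  • Does the scope of a schema cover the message content or the whole message?

Am I understanding this correctly? Next I will go through the code of both the develop branch and PR #434; I hope I can contribute to it.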

ruanwenjun commented 3 years ago

@yzhao244 Hi, I have a question: H2 is an in-memory database and does not seem to support distributed deployment. How can different eventmesh-runtime instances sync schema changes?

qqeasonchen commented 3 years ago

Good question. Maybe we need to make the schema workflow clear.

yzhao244 commented 3 years ago

@ruanwenjun @qqeasonchen Hi guys, I think it is better to deliver the OpenSchema integration incrementally to ensure a safe build. :) Overall, the OpenSchema APIs fall into three groups: /subject/ related APIs, /schema/ related APIs, and /config/compatibility related APIs. I suggest delivering each group in a separate PR. PR #434 currently delivers the /subject/ related APIs.
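
The three-way split above can be expressed as a simple path classifier, which could, for example, tell which PR an endpoint belongs to. The `ApiGroup` enum and the prefix rules are assumptions for illustration.

```java
// Hypothetical grouping of OpenSchema endpoints into the three delivery groups.
enum ApiGroup { SUBJECT, SCHEMA, CONFIG_COMPATIBILITY, UNKNOWN }

final class ApiGroups {
    static ApiGroup of(String path) {
        if (path.startsWith("/subjects")) {
            return ApiGroup.SUBJECT;
        }
        if (path.startsWith("/schemas")) {
            return ApiGroup.SCHEMA;
        }
        if (path.startsWith("/config") || path.startsWith("/compatibility")) {
            return ApiGroup.CONFIG_COMPATIBILITY;
        }
        return ApiGroup.UNKNOWN;
    }
}
```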

yzhao244 commented 3 years ago

Hi, I'm also interested in the schema registry in EventMesh. Thanks for your design and explanations. Now I understand that:

  • a schema represents the format of the messages being transferred;
  • the benefit of integrating OpenSchema into EventMesh is that a consumer can dynamically parse any message as long as it can find the schema ID in the H2 database.

I also have some questions:

  • In the design above, does the EventMesh Schema Registry need a separate server to run, or can it run inside the EventMesh Runtime?
  • Does the scope of a schema cover the message content or the whole message?

Am I understanding this correctly? Next I will go through the code of both the develop branch and PR #434; I hope I can contribute to it.

Thanks for your participation. :) Yes, your understanding is correct. The Schema Registry APIs are part of the admin APIs, so yes, the registry can run as part of the EventMesh runtime. The scope of a schema is to ensure the consistency and compatibility of the events exchanged between event producers and event consumers.

qqeasonchen commented 3 years ago

@yzhao244 Sorry, after discussing with the community: the Schema Registry needs to run as a separate server. The EventMesh runtime queries and caches schemas and then uses them for schema checks, so producers and consumers do not need to interact with the Schema Registry. What do you think of this? @JunjieChou is also working on the design now.
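
A minimal sketch of the "runtime queries and caches schemas" flow, assuming the registry call is passed in as a plain `Function<Long, String>` (a stand-in for a hypothetical registry client). Only the first successful lookup of each schema id reaches the registry; producers and consumers never call it directly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Runtime-side schema cache in front of a separate Schema Registry.
final class CachingSchemaLookup {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    // Hypothetical registry call: returns the schema text, or null if the id is unknown.
    private final Function<Long, String> registryFetch;

    CachingSchemaLookup(Function<Long, String> registryFetch) {
        this.registryFetch = registryFetch;
    }

    // Cached after the first successful fetch; ids the registry does not know
    // are not cached, so they are retried on the next lookup.
    String schemaFor(long schemaId) {
        return cache.computeIfAbsent(schemaId, registryFetch);
    }

    // Simple check the runtime could apply before forwarding an event.
    boolean isKnownSchema(long schemaId) {
        return schemaFor(schemaId) != null;
    }
}
```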

jzhou59 commented 3 years ago

@qqeasonchen @yzhao244 Hi, guys. Below is a high-level design of the Schema Registry and the EventMesh-Schema-SPI. It decouples the Schema Registry into a separate runtime, which currently is a single host running the schema registry. What do you think of this design?

[image: EventMeshSchemaRegistryArchitecture_2nd_edition]

jinrongluo commented 3 years ago

@JunjieChou @qqeasonchen @yzhao244

Hi Junjie, thanks for your proposal on the Schema Registry design. I agree with the overall design and the example steps showing how EventMesh uses the Schema Registry to process events. I have two comments below:

  1. The Schema Registry API is part of the EventMesh administrative API. In the future we can have other admin APIs, such as a Topic API and a Subscription API; see issues #346 and #349. All these admin APIs can be grouped into a new EventMesh module, eventmesh-rest, which will run as part of the EventMesh runtime. This module includes the Schema Registry Runtime in @JunjieChou's design. See issue #435.

Also, it is much more lightweight to run the Schema Registry Runtime as part of the EventMesh runtime process: deployments and service upgrades only deal with a single runtime process.

When the EventMesh runtime instances scale up, the Schema Registry Runtime scales up along with them, which provides high availability. Since the Schema APIs are stateless, the Schema Registry Runtime can be scaled up this way.

Thus, from the perspectives of extensibility, deployment maintenance, and high availability, I would suggest running the Schema Registry as part of the EventMesh Runtime process.

  2. As for the database, I would say it is not dedicated to the Schema Registry. In the future it can be used to store other EventMesh assets, such as topics and subscriptions; see issue #349.

qqeasonchen commented 3 years ago

@jinrongluo @yzhao244 @JunjieChou Hi, here is the difference: does the schema registry run independently or along with the EventMesh runtime? The discussion is open here. I agree with setting up eventmesh-rest and eventmesh-store.

jzhou59 commented 3 years ago

@qqeasonchen @jinrongluo @yzhao244

agreement

Hi Jinrong, I get what you mean, and I believe you are right regarding scaling. The model you propose integrates the Schema Registry API into EventMesh (eventmesh-rest). In this model there is no separate client and server, because eventmesh-rest handles the interaction with the database, and the database is independent of EventMesh so that other assets can also be stored in it.

question

So here comes another question: which database is suitable for this situation? H2 is a fast in-memory database, while a traditional relational database stores persistent data.

a new concern

Besides, I reconsidered the steps and found an unreasonable one (preparation). In the preparation step, I assumed that the schema and serialization type are set first. However, the serialization type may differ among events (with the same subject/topic) created by different producers, which is exactly why a Schema Registry needs to exist. So I think the schema and serialization type should be decided by the producers rather than by EventMesh. What do you think of this one?

jzhou59 commented 3 years ago

Anyway, the question is not an urgent one, but the concern may need to be solved before coding. What do you guys think?

jinrongluo commented 3 years ago

@JunjieChou @qqeasonchen @yzhao244

Thank you JunJie for your review and analysis.

For the database question, I would say the choice of database depends on the deployment environment. For a dev/test environment, where only a single EventMesh instance is provisioned, the H2 database is sufficient. For a staging/production environment, a distributed database (such as MySQL) is required. So we can have an eventmesh-store-plugin module that allows cloud vendors to provide their own database plugin as the persistence layer for EventMesh. We can provide a MySQL implementation as the reference in this open-source project.
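
The environment-based choice described above could be expressed as a small selector. The plugin names, the environment labels, and the JDBC URLs (`jdbc:h2:mem:eventmesh`, `jdbc:mysql://mysql-host:3306/eventmesh`) are placeholders assumed for this sketch, not EventMesh configuration keys.

```java
import java.util.Map;

// Hypothetical selection of a store plugin per deployment environment.
final class StorePluginSelector {
    private static final Map<String, String> JDBC_URL_BY_PLUGIN = Map.of(
            "h2", "jdbc:h2:mem:eventmesh",                    // single-instance dev/test
            "mysql", "jdbc:mysql://mysql-host:3306/eventmesh"); // shared by all runtime instances

    static String pluginFor(String environment) {
        switch (environment.toLowerCase()) {
            case "dev":
            case "test":
                return "h2";
            case "staging":
            case "production":
                return "mysql";
            default:
                throw new IllegalArgumentException("Unknown environment: " + environment);
        }
    }

    static String jdbcUrlFor(String environment) {
        return JDBC_URL_BY_PLUGIN.get(pluginFor(environment));
    }
}
```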

For schemas and (de-)serialization, I would suggest doing this on the EventMesh SDK side. Producers and consumers can use the EventMesh SDK to (de-)serialize their events using their own schema type.
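
A hedged sketch of SDK-side (de-)serialization: the producer's SDK serializes the payload and records the schema id and serialization type in headers, so the producer (not EventMesh) decides them, and the consumer's SDK selects a deserializer by that type. The header names `schemaid` and `contenttype`, and the `SdkCodec`/`EventEnvelope` classes, are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Event as it travels through EventMesh: opaque payload plus headers.
final class EventEnvelope {
    final Map<String, String> headers;
    final byte[] payload;

    EventEnvelope(Map<String, String> headers, byte[] payload) {
        this.headers = headers;
        this.payload = payload;
    }
}

final class SdkCodec {
    private final Map<String, Function<byte[], Object>> deserializers = new HashMap<>();

    void registerDeserializer(String contentType, Function<byte[], Object> deserializer) {
        deserializers.put(contentType, deserializer);
    }

    // Producer side: the producer chooses the schema id and serialization type.
    static EventEnvelope produce(String schemaId, String contentType, byte[] serializedPayload) {
        Map<String, String> headers = new HashMap<>();
        headers.put("schemaid", schemaId);
        headers.put("contenttype", contentType);
        return new EventEnvelope(headers, serializedPayload);
    }

    // Consumer side: pick the deserializer matching the producer's choice.
    Object consume(EventEnvelope event) {
        String contentType = event.headers.get("contenttype");
        Function<byte[], Object> deserializer = deserializers.get(contentType);
        if (deserializer == null) {
            throw new IllegalStateException("No deserializer for " + contentType);
        }
        return deserializer.apply(event.payload);
    }
}
```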

I would also love to hear other suggestions. :)

jzhou59 commented 3 years ago

@jinrongluo @qqeasonchen @yzhao244 Hey guys, after discussing with Weiming, I found that my understanding of some terms was not correct, which made my design a bit confusing. I get your points, which are exactly what I thought, and I will come back with a new design picture. Sorry for the confusion.

jzhou59 commented 3 years ago

Hi, @qqeasonchen @jinrongluo @yzhao244. After comparing the schema integration in other projects (Kafka, EMQ, Pulsar), I propose separating OpenSchema into two parts. One is the server side (OpenSchema Registry), which provides schema storage and maintenance services; the other is the client side, which provides (de-)serialization and validation services. I have defined the client-side architecture and interfaces accordingly and created PR #498.
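
As an illustration of the client-side validation service, here is a sketch of a validator that checks a decoded event against a tiny "required field to Java type" schema. The `EventValidator` interface and `RequiredFieldsValidator` class are hypothetical and are not the interfaces defined in PR #498.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical client-side validation contract: returns an empty list when valid.
interface EventValidator {
    List<String> validate(Map<String, Object> event);
}

final class RequiredFieldsValidator implements EventValidator {
    private final Map<String, Class<?>> requiredFields;

    RequiredFieldsValidator(Map<String, Class<?>> requiredFields) {
        this.requiredFields = requiredFields;
    }

    @Override
    public List<String> validate(Map<String, Object> event) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Class<?>> field : requiredFields.entrySet()) {
            Object value = event.get(field.getKey());
            if (value == null) {
                errors.add("missing field: " + field.getKey());
            } else if (!field.getValue().isInstance(value)) {
                errors.add("wrong type for field: " + field.getKey());
            }
        }
        return errors;
    }
}
```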

qqeasonchen commented 3 years ago

@JunjieChou Nice

jzhou59 commented 2 years ago

Hi, I would like to continue on this issue and have created PR #821. However, @ruanwenjun and I seem to have different understandings of the responsibilities of the openschema plugin. I was trying to use the plugin to interact with OpenSchema implementations such as SchemaRegistry, including registering schemas, retrieving schemas, and so on. @ruanwenjun suggests providing serialization, deserialization, and validation for the different message types. What do you think is needed when integrating with OpenSchema? @qqeasonchen

ruanwenjun commented 2 years ago

I read the SchemaRegistry docs. In short, it's a server that provides interfaces to CRUD schemas. This seems too heavyweight to me; our goal is simply to use the OpenSchema specification. As for schema storage, it is better not to rely on an additional web service; storing schemas with our metadata is good enough.
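
A minimal sketch of storing schemas alongside existing metadata instead of in a separate registry service. The `Map` stands in for whatever metadata backend is used, and the key layout `schema/<subject>/<version>` is an assumption.

```java
import java.util.Map;
import java.util.Optional;

// Stores schemas as ordinary metadata entries; no extra web service involved.
final class MetadataSchemaStore {
    private final Map<String, String> metadata; // stand-in for the metadata backend

    MetadataSchemaStore(Map<String, String> metadata) {
        this.metadata = metadata;
    }

    // Assumed key layout: schema/<subject>/<version>
    private static String key(String subject, int version) {
        return "schema/" + subject + "/" + version;
    }

    void put(String subject, int version, String schema) {
        metadata.put(key(subject, version), schema);
    }

    Optional<String> get(String subject, int version) {
        return Optional.ofNullable(metadata.get(key(subject, version)));
    }
}
```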

BTW, I don't recommend designing OpenSchema as a plugin; this is over-design. At least, I don't think we will integrate with other schema specifications.

jzhou59 commented 2 years ago

@ruanwenjun OK, I see. I am going to close PR #821 and add a new module under eventmesh-admin.