apache / rocketmq-streams

Apache rocketmq
https://rocketmq.apache.org/
Apache License 2.0
171 stars 82 forks source link

Support data deserialize with schema for RocketMQSource #205

Closed MatrixHB closed 2 years ago

MatrixHB commented 2 years ago

The present situation of RocketMQSource:

The message is parsed into a string and then converted into a JsonObject, which is used as the basic data structure for subsequent operators processing.

Disadvantages of present design:

  1. Converting a body into a String will restrict the use scenarios and will also be incompatible with the RocketMQ Schema. After client using RocketMQ Schema, not all bodies can be directly converted into a String or JSON, for example, some messages are serialized in Avro format.

  2. At present, when using rocketmq-streams, operations in the form of Java objects are not supported. Operators such as map, flatMap and filter need to explicitly convert into JsonObject, which is very unfriendly.

  3. In RsqlDB, the body format of each message is required to correspond to the Table structure strictly according to the field order and separated by commas, as follows. This invisible requirement is very unfriendly.

    1,2,3,4 #message1
    2,2,3,4 #message2
    3,2,3,4 #message3
    4,2,3,4 #message4

Steps to complete this issue

  1. All types of bodies can be converted into JsonObject or UserDefinedMessage according to the RocketMQ Schema, and subsequent window operators still use JsonObject as the basic data structure. Scenarios are supported where schema registry components are not deployed.

  2. The message content for rsqldb can be based on the serialization of Java objects, rather than the comma separated format of strictly ordered field values.

  3. Subsequent window operators can be processed based on Java objects, rather than JsonObject. But this will take a long time and hard work, since the overall link relies heavily on JsonObject at present.