apache / pulsar-client-node

Apache Pulsar NodeJS Client
https://pulsar.apache.org/
Apache License 2.0

Add method to get the schema (version) used for serialization of a message #362

Open acromarco opened 10 months ago

acromarco commented 10 months ago

There are client libraries for other languages (e.g. Java) that support automatic (de)serialization of message data based on a schema (https://pulsar.apache.org/docs/3.1.x/schema-overview/). The Node client does not have this feature yet (https://github.com/apache/pulsar-client-node/issues/242).

In theory it should be possible to deserialize the message data manually, BUT... Some serialization formats like Avro require knowing the exact schema that was used for serialization (https://github.com/mtth/avsc/issues/447). To deal with schema evolution, the client needs to know its own compatible schema AND the schema used for serialization. As far as I understand, the automatic (de)serialization feature of Pulsar solves this problem by keeping a schema registry and tagging each message with the schema version that was used. If I understand correctly, the Node client provides neither a method to get the schema used for serialization nor a method to get the schema version of a message. Assuming the need for schema evolution, this makes it impossible to reliably deserialize messages written by a Java client library using an Avro schema.
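To illustrate why the writer's schema matters, here is a minimal, dependency-free sketch of Avro-style binary decoding. It is not the avsc or Pulsar API: the names `Field`, `encode`, and `decode` are hypothetical, only the `long` and `string` types are handled, and strings are assumed to be ASCII. It relies on two properties of Avro's binary encoding: record fields are written back to back in schema order with no names or tags, and longs are zigzag varints.

```typescript
// Hypothetical mini-schema: an ordered list of fields (a stand-in for an Avro record schema).
type FieldType = "long" | "string";
interface Field { name: string; type: FieldType; default?: number | string }
type Value = number | string;
type Rec = Record<string, Value>;

// Avro writes int/long as zigzag-encoded varints (small values only in this sketch).
function writeLong(n: number, out: number[]): void {
  let z = n >= 0 ? 2 * n : -2 * n - 1;
  while (z > 0x7f) {
    out.push((z % 128) | 0x80);
    z = Math.floor(z / 128);
  }
  out.push(z);
}

function readLong(buf: number[], pos: { i: number }): number {
  let z = 0, scale = 1, b = 0;
  do {
    b = buf[pos.i++] ?? 0;
    z += (b & 0x7f) * scale;
    scale *= 128;
  } while (b & 0x80);
  return z % 2 === 0 ? z / 2 : -(z + 1) / 2;
}

// Fields are concatenated in schema order; the bytes carry no field names.
function encode(rec: Rec, schema: Field[]): number[] {
  const out: number[] = [];
  for (const f of schema) {
    if (f.type === "long") {
      writeLong(rec[f.name] as number, out);
    } else {
      const s = rec[f.name] as string; // ASCII assumed for simplicity
      writeLong(s.length, out);
      for (let i = 0; i < s.length; i++) out.push(s.charCodeAt(i));
    }
  }
  return out;
}

// Parse the bytes with the WRITER schema, then project onto the READER schema.
function decode(buf: number[], writer: Field[], reader: Field[]): Rec {
  const pos = { i: 0 };
  const written: Rec = {};
  for (const f of writer) {
    if (f.type === "long") {
      written[f.name] = readLong(buf, pos);
    } else {
      const len = readLong(buf, pos);
      written[f.name] = String.fromCharCode(...buf.slice(pos.i, pos.i + len));
      pos.i += len;
    }
  }
  const result: Rec = {};
  for (const f of reader) {
    const v = written[f.name] ?? f.default; // field missing in writer: use reader default
    if (v === undefined) throw new Error(`no value or default for ${f.name}`);
    result[f.name] = v;
  }
  return result;
}

const v1: Field[] = [{ name: "name", type: "string" }, { name: "age", type: "long" }];
const v2: Field[] = [
  { name: "name", type: "string" },
  { name: "email", type: "string" },
  { name: "age", type: "long" },
];

// A producer on schema v2 writes the message; the consumer only knows v1.
const bytes = encode({ name: "Ann", email: "a@b.c", age: 42 }, v2);
const wrong = decode(bytes, v1, v1);    // guesses the writer schema: age comes out as 5
const resolved = decode(bytes, v2, v1); // knows the writer schema: age is 42
```

Decoding with only the reader's schema does not fail here; it silently reads the length prefix of `email` as `age`. That is why a consumer needs some way to find out which schema (version) a message was written with before it can apply its own schema.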

I'm new to Pulsar and Avro, so please forgive (and correct) me if my understanding is wrong.

If my understanding is right, I wonder how difficult it would be to add a method to look up the serialization schema of a message.