**Describe the bug**
Avro schemas support a default value of null, written `"default": null`. `pulsar-admin` accepts this syntax, but the rest of Pulsar does not fully support it, which results in `IncompatibleSchema` exceptions between schemas that appear identical.
This ticket asks for improved logging for schema info objects that contain the `"default": null` specification.
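**To Reproduce**
Steps to reproduce the behavior:
1. Using Pulsar 2.7.1, run `bin/pulsar standalone`.
2. Configure schema compatibility policies on the `climate/field-service` namespace.
3. Upload two schema definitions, `ActionV0.schema` and `ActionV1.schema`, which differ only in that version 0 specifies `"default":null` for the `action` field.
4. Both schema versions upload successfully because they are compatible. However, `schemas get` prints them identically, so the difference between them cannot be seen after upload: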
tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas get climate/field-service/actions --version 0
5. Using the Python client library, I found no way to produce a message using version 0 of the schema. Everything I tried resulted in an `IncompatibleSchema` exception.
from pulsar.schema import Record, String

class Action(Record):
    action = String()
6. However, the `Action` class above works with version 1 of the schema, the one without `"default":null` specified.
**Expected behavior**
The two schemas are _different_, so they should not be printed as _identical_. In this case, the `"default":null` should be printed when calling `bin/pulsar-admin schemas get climate/field-service/actions --version 0`.
Further, there should be a way to construct a Record class using the Python client library, so an event can be written to a topic with a schema containing `"default":null`.
**Screenshots**
N/A.
**Desktop (please complete the following information):**
- OS: MacOS Catalina Version 10.15.17
**Additional context**
`"default":null` seems like a common default value to specify in Avro schemas. Because it triggers `IncompatibleSchema` exceptions of its own, it makes it much harder to triage other mistakes and bugs that also result in `IncompatibleSchema`. Bug tickets whose triage was significantly complicated by the presence of `"default":null`: https://github.com/apache/pulsar/issues/9571, https://github.com/apache/pulsar/issues/8510.
The overall impact is that Avro schema support seems quite broken in Pulsar. There were questions about whether Kafka's Avro schema support is this buggy. If we had still been deciding between Kafka and Pulsar, this might have changed our decision.
Another solution is to create a new doc page for Pulsar's Avro support. On that doc page, known limitations of Pulsar's Avro support should be documented. Sample text for this problem (it might not be correct, but it would help anyone experimenting with Avro support in Pulsar):
Pulsar supports a subset of Avro schemas.
Pulsar does not support "default":null for string fields.
To specify a default value of null for a string field, simply omit that clause.
This is because Pulsar consumers default string fields without default values to null, then auto-convert null into the empty string.
Commands, schema files, and output for the steps to reproduce:
bin/pulsar standalone
bin/pulsar-admin namespaces set-schema-compatibility-strategy climate/field-service --compatibility FORWARD_TRANSITIVE
bin/pulsar-admin namespaces set-schema-validation-enforce --enable climate/field-service
bin/pulsar-admin namespaces set-schema-autoupdate-strategy climate/field-service --disabled
// ActionV0.schema
{ "type": "AVRO", "schema": "{\"name\":\"Action\",\"type\":\"record\",\"fields\":[{\"name\":\"action\",\"type\":[\"null\",\"string\"],\"default\":null}]}", "properties": {} }
// ActionV1.schema
{ "type": "AVRO", "schema": "{\"name\":\"Action\",\"type\":\"record\",\"fields\":[{\"name\":\"action\",\"type\":[\"null\",\"string\"]}]}", "properties": {} }
tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas upload --filename ~/pulsar/pythonSandbox/schemas/ActionV0.schema climate/field-service/actions
tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas upload --filename ~/pulsar/pythonSandbox/schemas/ActionV1.schema climate/field-service/actions
tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas get climate/field-service/actions --version 0
{ "name": "actions", "schema": { "name": "Action", "type": "record", "fields": [ { "name": "action", "type": [ "null", "string" ] } ] }, "type": "AVRO", "properties": {} }
tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas get climate/field-service/actions --version 1
{ "name": "actions", "schema": { "name": "Action", "type": "record", "fields": [ { "name": "action", "type": [ "null", "string" ] } ] }, "type": "AVRO", "properties": {} }
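The mismatch between what was uploaded and what `schemas get` prints can be checked mechanically. The following is a minimal sketch using only Python's standard `json` module, not any Pulsar API. The helper names `field_attributes` and `dropped_attributes` are hypothetical, and the schema strings are copied from the two definition files and the printed output in this report. Key membership is compared rather than values, because `"default": null` parses to a key whose value is `None`.

```python
import json

# Avro schema strings copied from ActionV0.schema and ActionV1.schema.
ACTION_V0 = '{"name":"Action","type":"record","fields":[{"name":"action","type":["null","string"],"default":null}]}'
ACTION_V1 = '{"name":"Action","type":"record","fields":[{"name":"action","type":["null","string"]}]}'

# The "schema" object printed by `schemas get` for both version 0 and version 1.
PRINTED_SCHEMA = {
    "name": "Action",
    "type": "record",
    "fields": [{"name": "action", "type": ["null", "string"]}],
}


def field_attributes(avro_schema):
    """Map each field name to its other attributes (type, default, ...)."""
    return {
        field["name"]: {k: v for k, v in field.items() if k != "name"}
        for field in avro_schema["fields"]
    }


def dropped_attributes(uploaded, printed):
    """Per field, list attribute keys present in `uploaded` but missing from `printed`."""
    uploaded_attrs = field_attributes(uploaded)
    printed_attrs = field_attributes(printed)
    result = {}
    for name, attrs in uploaded_attrs.items():
        # Compare keys, not values: "default": null is a key with value None.
        missing = sorted(set(attrs) - set(printed_attrs.get(name, {})))
        if missing:
            result[name] = missing
    return result


v0 = json.loads(ACTION_V0)
v1 = json.loads(ACTION_V1)

print(v0 == v1)                                # False
print(dropped_attributes(v0, PRINTED_SCHEMA))  # {'action': ['default']}
print(dropped_attributes(v1, PRINTED_SCHEMA))  # {}
```

Under these assumptions, the uploaded definitions differ, yet the printed form of version 0 has lost the `default` attribute, which is why the two versions cannot be told apart after upload.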