Enable Automatic Schema Retrieval in DDL for Kafka Sources Using Confluent Schema Registry

When creating a new source connection through the web UI and selecting Avro as the data format with Confluent Schema Registry as the schema type, users can omit the schema, because it is loaded automatically from the Confluent Schema Registry.

However, when a source is defined using DDL within a pipeline, an explicit schema definition is currently required. For instance, the following DDL statement:

CREATE TABLE my_kafka_source WITH (
    'connector' = 'kafka',
    'avro.confluent_schema_registry' = 'true',
    'bootstrap_servers' = 'my_server',
    'schema_registry.endpoint' = 'my_endpoint',
    'type' = 'source',
    'topic' = 'my_topic',
    'bad_data' = 'drop',
    'source.offset' = 'latest',
    'source.read_mode' = 'read_committed',
    'sink.commit_mode' = 'at_least_once',
    'format' = 'avro'
);

leads to an error when subsequently trying to query the table:

SELECT my_field FROM my_kafka_source;

Error: Schema error: No field named my_field.

It would be nice if ad-hoc DDL statements inside a pipeline definition supported automatic schema retrieval from the Confluent Schema Registry, similar to the functionality available in the web UI.
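
In the meantime, a possible workaround is to declare the columns explicitly in the DDL so that the query can resolve them. This is a minimal sketch, assuming the Avro schema registered for my_topic contains a string field named my_field; the column name comes from the query above, but its TEXT type is an assumption and should be adjusted to match the actual registered schema:

```sql
-- Workaround sketch: declare the schema by hand instead of relying on the registry.
-- The column type below is assumed, not read from a real schema.
CREATE TABLE my_kafka_source (
    my_field TEXT
) WITH (
    'connector' = 'kafka',
    'avro.confluent_schema_registry' = 'true',
    'bootstrap_servers' = 'my_server',
    'schema_registry.endpoint' = 'my_endpoint',
    'type' = 'source',
    'topic' = 'my_topic',
    'bad_data' = 'drop',
    'source.offset' = 'latest',
    'source.read_mode' = 'read_committed',
    'format' = 'avro'
);

-- With the column declared, the field name resolves at planning time.
SELECT my_field FROM my_kafka_source;
```

The drawback is that the column list has to be kept in sync with the registry manually, which is exactly what automatic retrieval would avoid.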

ArroyoSystems/arroyo issue #692