MaterializeInc / materialize

The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
https://materialize.com
Other
5.72k stars 466 forks source link

Unable to create a sink over a view containing autogenerated column names #16130

Open philip-stoev opened 1 year ago

philip-stoev commented 1 year ago

What version of Materialize are you using?

v0.33.0-dev (74f61e77a)

How did you install Materialize?

Docker image

What is the issue?

If one attempts to create a sink over a view that uses implicit column names, the following error occurs:

executing query failed: db error: ERROR: error publishing kafka schemas for sink: unable to publish value schema to registry in kafka sink: Invalid schema {"type":"record","name":"envelope","fields":[{"name":"before","type":["null",{"type":"record","name":"row","fields":[{"name":"?column?","type":["null","int"]}]}]},{"name":"after","type":["null","row"]}]} with refs [] of type AVRO: ERROR: error publishing kafka schemas for sink: unable to publish value schema to registry in kafka sink: Invalid schema {"type":"record","name":"envelope","fields":[{"name":"before","type":["null",{"type":"record","name":"row","fields":[{"name":"?column?","type":["null","int"]}]}]},{"name":"after","type":["null","row"]}]} with refs [] of type AVRO

In this context, autogenerated names are column names like ?column? that Mz gives implicitly to columns containing an expression, e.g. 10/f1

To reproduce:

> CREATE TABLE t1 (f1 INTEGER);

> CREATE MATERIALIZED VIEW v1 AS SELECT 10 / f1 FROM t1;

> CREATE CONNECTION kafka_conn
  TO KAFKA (BROKER '${testdrive.kafka-addr}');

> CREATE CONNECTION IF NOT EXISTS csr_conn TO CONFLUENT SCHEMA REGISTRY (
    URL '${testdrive.schema-registry-url}'
  );

> CREATE SINK kafka_sink FROM v1
  INTO KAFKA CONNECTION kafka_conn (TOPIC 'testdrive-kafka-sink-${testdrive.seed}')
  FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION csr_conn
  ENVELOPE DEBEZIUM
;

A work-around would be to name such column explicitly in the view , like this:

> CREATE MATERIALIZED VIEW v1 AS SELECT 10 / f1 AS c1 FROM t1;

however this requires dropping and re-creating the view, causing it to be recomputed from scratch.

Relevant log output

No response

benesch commented 1 year ago

A less bad feeling workaround is to create a non-materialized view on top of v1 that adds names, and then use that in the sink. No dropping of the original materialized view required!

chaas commented 1 year ago

A short term proposal: add a HINT for the above that suggests the user create the non-materialized view

Longer term (cc @ggnall): compute the set of unsupported characters/names that we support but AVRO does not, and either a) preemptively error rather than waiting for the kafka error b) provide an option (or by default) convert the column name to a supported autogenerated name