MaterializeInc / materialize

The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
https://materialize.com

Specify keys and metadata when creating Kafka SINK #1576

Closed rjnn closed 7 months ago

rjnn commented 4 years ago

Details forthcoming

rjnn commented 4 years ago

Related to #1577: Kafka sinks should also let users specify which columns go into the message key and which go into the message value. I suppose we will also need to support all envelope conventions (#1575) when creating Kafka sinks.

cc @JLDLaughlin

rjnn commented 4 years ago

I forgot to mention the metadata: we'll also need to be able to specify that certain columns should be written as Kafka message metadata.
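
To make the request concrete, here is a minimal Python sketch of the split being asked for: one row is divided into a message key, a message value, and metadata (modeled here as Kafka-style headers). The function name `split_row`, its parameters, the JSON encoding, and the sample column names are all hypothetical and chosen for illustration; this is not Materialize's sink implementation or syntax.

```python
import json


def split_row(row, key_columns, value_columns, header_columns=()):
    """Split one row (a dict) into an encoded key, an encoded value, and headers.

    Hypothetical helper for illustration only; JSON encoding is an assumption.
    """
    # Columns the user chose for the message key.
    key = {col: row[col] for col in key_columns}
    # Columns the user chose for the message value (payload).
    value = {col: row[col] for col in value_columns}
    # Columns written as metadata: a list of (name, bytes) pairs,
    # which is the usual shape of Kafka message headers.
    headers = [(col, str(row[col]).encode("utf-8")) for col in header_columns]
    return (
        json.dumps(key, sort_keys=True).encode("utf-8"),
        json.dumps(value, sort_keys=True).encode("utf-8"),
        headers,
    )


# Example with made-up column names.
row = {"vehicle_id": "v1", "lat": 42.3, "lon": -71.1, "origin": "mbta"}
key, value, headers = split_row(
    row,
    key_columns=["vehicle_id"],
    value_columns=["lat", "lon"],
    header_columns=["origin"],
)
```

The three return values correspond to the three places a column could end up in a Kafka message; how a real sink would declare this in SQL is left open by the discussion here.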

wangandi commented 4 years ago

Given that this is a desired and prioritized feature, we may want to also consider enabling the creation of an upsert view on top of a source by specifying the key column.

Motivation: I realized that I would need a lot less external code to convert the MBTA stream into one that works with upsert if we could just stream in the raw JSON, extract the fields we need, and then specify the key to upsert on. Once we have #1576, we'd be able to create a Kafka sink with the right column as the key and the relevant fields as the payload, but to do analysis on the stream, we'd have to route the Kafka stream back around into Materialize. The user probably should not need to take a detour through Kafka in order to use something as an upsert source.
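
As a rough illustration of the upsert-on-a-chosen-key idea, the following Python sketch parses raw JSON messages, extracts a key field, and keeps only the latest record per key. The function name `upsert_view`, the field names, and the sample messages are hypothetical; deletions (tombstones) are not handled, and this is not how Materialize implements upsert.

```python
import json


def upsert_view(raw_messages, key_field):
    """Return the latest record per key from an ordered stream of raw JSON strings.

    Hypothetical sketch: later messages with the same key overwrite earlier ones.
    """
    state = {}
    for raw in raw_messages:
        record = json.loads(raw)
        # The user-specified key column decides which earlier record is replaced.
        state[record[key_field]] = record
    return state


# Made-up messages: two updates for vehicle "v1", one for "v2".
messages = [
    '{"vehicle_id": "v1", "stop": "A"}',
    '{"vehicle_id": "v2", "stop": "C"}',
    '{"vehicle_id": "v1", "stop": "B"}',
]
latest = upsert_view(messages, key_field="vehicle_id")
```

Here `latest` holds one record per `vehicle_id`, with the second update for `v1` replacing the first, which is the behavior the comment wants without a round trip through Kafka.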

ruchirK commented 4 years ago

Is this still something that we need to prioritize?

ruchirK commented 3 years ago

I'm going to unassign this from myself because I don't have any short- or long-term plans to work on it. Let me know if that needs to change.

benesch commented 7 months ago

Closing this as completed. Upsert-enveloped sinks allow users to choose the key columns (https://materialize.com/docs/sql/create-sink/kafka/#upsert-key-selection), and we have a different issue tracking the request to add headers to Kafka sinks: https://github.com/MaterializeInc/materialize/issues/10859.