confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io

VALUE_FORMAT does not support non-delimited datum, e.g. binary or a text string #1438

Open Downchuck opened 6 years ago

Downchuck commented 6 years ago

Currently VALUE_FORMAT must be one of JSON, AVRO, or DELIMITED.

If the content is simply a single value (such as a string), then DELIMITED is not quite the right fit: there is no delimiter, so the value is really OPAQUE. This also relates to records where the value is simply a blob (bytes): https://github.com/confluentinc/ksql/issues/1282

This case comes up when the value is a string or a byte string and the record data is stashed in the key, a common situation for some log recording methods. EXTRACTJSONFIELD works for the key, but there appears to be no feasible way to access the value without corrupting the data (for example, by reading it as DELIMITED).

rmoff commented 6 years ago

@Downchuck thanks for logging this issue. Do you have an example of the kind of data this would apply to? You mentioned log recording.

Downchuck commented 6 years ago

@rmoff We have binary data stashed in the value and our key is JSON metadata we use to stitch the binary bits back together.

In this use case, we are trying to read data from that Kafka topic and join it to another Kafka topic that holds additional metadata. So far we have not been able to extract the VALUE unaltered.

archy-bold commented 4 years ago

I've managed to get this to work by using VALUE_FORMAT='KAFKA'. I think it is intended mostly for keys, but it works for reading a full row as a string. I can't vouch for bytes, however. https://github.com/confluentinc/ksql/issues/5348#issuecomment-717214150
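
For anyone landing here, a minimal sketch of what that workaround might look like. The topic name, column names, and the `$.id` JSON path are made up for illustration, and the `KEY` column syntax assumes a reasonably recent ksqlDB version. It only covers the string case, not raw bytes.

```sql
-- Hypothetical names throughout; adjust to your own topic and metadata layout.
-- The KAFKA format reads the whole record value as a single primitive column.
CREATE STREAM raw_records (
  meta STRING KEY,   -- JSON metadata stored in the record key
  payload STRING     -- the full record value, read as one string
) WITH (
  KAFKA_TOPIC = 'raw-records',
  VALUE_FORMAT = 'KAFKA'
);

-- Pull a field out of the JSON key and pass the value through untouched.
SELECT EXTRACTJSONFIELD(meta, '$.id') AS id, payload
FROM raw_records
EMIT CHANGES;
```

Because the value is declared as a single STRING column, nothing is split on a delimiter, which is the problem DELIMITED has with this kind of data.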