Closed jaley closed 1 year ago
➤ Automation for Jira commented:
The link to the corresponding Jira issue is https://ably.atlassian.net/browse/SDK-3689
Initial thoughts on how to implement this:
routingKey patterns
#{...} denotes a substitution.
notation
[ ... ] notation, if needed
channelName and message.*, the root level objects need to be different in Kafka (as we're moving data in the opposite direction) and it makes sense to use Kafka nomenclature:
SinkRecord itself
topic and key (without nesting), but this means the key is already supported if it happens to be a String, or a byte array that we can interpret as string data.
[ ... ] notation as in routingKey for this, but it could be added later if needed.
SinkRecord and this might be useful for channel mappings in some use cases? It'd be easy enough to support something like {topic.name} and {topic.partition}
Recommendation:
Stringable scalar to be something we can safely format into a message or channel name, including:
STRING
INT of any precision, cast to string
BOOLEAN cast to string
BYTES interpreted as a UTF-8 string
message.name or channel key with a structured interpolation string pattern, that's an error and we should stop the sink task immediately to flag this, rather than waiting for data to arrive that we can't process.
topic (including .name and .partition) and key (if Stringable) can be used in interpolation without a schema registry
#{key} and #{topic}:
topic is an alias for #{topic.name} and is always a string
key, if used this way, must be Stringable
#{topic.name} and #{topic.partition}
#{key.*} and #{value.*} as root level objects to reference fields within records:
a.b is used to access fields and implies a must be a STRUCT
Stringable
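The Stringable rule and the fail-fast check above could look roughly like the following. This is a minimal sketch in Python rather than the connector's actual code: the names `to_stringable`, `references_record_fields`, `validate_pattern` and the `structured_data_available` flag are hypothetical, Python values stand in for Kafka Connect scalars, and rendering BOOLEAN as `true`/`false` is an assumption.

```python
import re
from typing import Any

# Matches one #{...} substitution and captures the text between the braces.
PLACEHOLDER = re.compile(r"#\{([^{}]+)\}")


class NotStringableError(ValueError):
    """Raised when a value cannot be safely formatted into a name."""


def to_stringable(value: Any) -> str:
    """Convert a scalar to text following the Stringable rules (hypothetical helper)."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"  # assumed rendering of BOOLEAN
    if isinstance(value, str):
        return value
    if isinstance(value, int):
        return str(value)
    if isinstance(value, (bytes, bytearray)):
        try:
            return bytes(value).decode("utf-8")  # strict decoding
        except UnicodeDecodeError as exc:
            raise NotStringableError("BYTES value is not valid UTF-8") from exc
    raise NotStringableError(f"{type(value).__name__} is not Stringable")


def references_record_fields(pattern: str) -> bool:
    """True if the pattern uses #{key.<field>} or #{value.<field>}."""
    for match in PLACEHOLDER.finditer(pattern):
        parts = match.group(1).strip().split(".")
        if parts[0] in ("key", "value") and len(parts) > 1:
            return True
    return False


def validate_pattern(pattern: str, structured_data_available: bool) -> None:
    """Fail at startup instead of waiting for a record we cannot process."""
    if references_record_fields(pattern) and not structured_data_available:
        raise ValueError(
            f"pattern {pattern!r} references record fields, "
            "but no structured (STRUCT) data is available"
        )
```

Running `validate_pattern` once per configured pattern when the task starts would surface a misconfiguration before any records arrive, while `#{topic}` and `#{key}` patterns pass regardless of the flag.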
Later:
[ ] notation for element access
#{value.myField | hash(10)} hashing functionality in routingKey?
Limiting support to just STRUCT and scalars for now means we can get away with a simple string split on the . character and regexes. Map and array accessors and the hashing notation may be the point at which we need to embed a parser library, which is a bit more work.
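To illustrate the "string split plus regex" approach, here is a minimal sketch in Python, not the connector's implementation. The function `resolve` and its parameters are hypothetical, nested dicts stand in for Kafka Connect STRUCT values, and `_as_text` is a compact stand-in for the Stringable conversion.

```python
import re
from typing import Any, Mapping, Optional

# Matches one #{...} substitution and captures the text between the braces.
PLACEHOLDER = re.compile(r"#\{([^{}]+)\}")


def _as_text(value: Any) -> str:
    """Compact stand-in for the Stringable conversion (assumed rules)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (str, int)):
        return str(value)
    if isinstance(value, bytes):
        return value.decode("utf-8")
    raise ValueError(f"{type(value).__name__} is not Stringable")


def resolve(pattern: str, topic: str, partition: int,
            key: Any, value: Optional[Any] = None) -> str:
    """Replace every #{...} in pattern using the given record parts."""

    def lookup(match: "re.Match[str]") -> str:
        # A plain split on "." is enough while only STRUCT fields are supported.
        root, *fields = match.group(1).strip().split(".")
        if root == "topic":
            if not fields or fields == ["name"]:
                return topic  # #{topic} is an alias for #{topic.name}
            if fields == ["partition"]:
                return str(partition)
            raise ValueError(f"unknown topic property: {'.'.join(fields)}")
        if root == "key":
            current = key
        elif root == "value":
            current = value
        else:
            raise ValueError(f"unknown root object: {root}")
        for field in fields:
            # a.b implies that a is a STRUCT (modelled here as a mapping).
            if not isinstance(current, Mapping) or field not in current:
                raise ValueError(f"cannot resolve field {field!r}")
            current = current[field]
        return _as_text(current)

    return PLACEHOLDER.sub(lookup, pattern)
```

For example, `resolve("region_#{value.address.country}", "orders", 0, None, {"address": {"country": "GB"}})` walks two STRUCT levels before converting the final scalar. Element access with `[ ]` or a `| hash(10)` filter would not fit this split-based lookup, which is where a real parser would come in.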
Closed in #141
Currently it's only possible to refer to the Kafka topic and the record key in our message.name and channel configuration properties, by making use of #{topic} or #{key} in those patterns.
This is quite limiting in common use cases, because:
#{topic} substitution in either pattern is no more useful than a string constant.
The channel and message name mapping feature is most useful when users can refer to fields within a record. This is possible when a schema registry and an appropriate data converter are configured, so that the incoming records have Kafka Connect Struct data. The Ably connector already supports this, as it's needed for automatic conversion of data to JSON before sending over a channel. If, however, the incoming data is unstructured, we can't do this.
We would need to require that any field referenced can reasonably be converted to a string (for use in either channel name or message name) and that in the case of channel name mappings, the resulting channel name adheres to Ably's channel naming restrictions.