Closed james-johnston-thumbtack closed 2 years ago
Hello Sir, thank you for this very detailed issue. At first glance it makes sense, and it can be done.
Could you tell us if/how you plan to use the scheduler ?
Regards,
Could you tell us if/how you plan to use the scheduler ?
In general, we want to use it for scheduling various events to produce in the future throughout our product. I am on an infrastructure team that is looking into setting this up for other product teams to utilize. Not every use case currently using Kafka (for normal, non-scheduled messages) specifies a message key, so I would expect that some product teams will want to try to schedule messages without keys as well, since that is normally something that Kafka supports.
Kafka header values can be null, which is distinct from an empty string. TF1 scheduler does not notice this difference, and inadvertently converts null values to empty strings when examining the target key header:
The
TargetKey
function at https://github.com/etf1/kafka-message-scheduler/blob/55c951c3985da3be1cd7422e8b897ee8eb491f29/schedule/kafka/schedule.go#L40-L42 callsgetHeaderValue
at https://github.com/etf1/kafka-message-scheduler/blob/55c951c3985da3be1cd7422e8b897ee8eb491f29/schedule/kafka/schedule.go#L23-L30 WhengetHeaderValue
sees that the length of the slice is0
, it just returns an empty string. It does not check and differentiate as to whether the slice was equal tonil
, or if it was non-nil and had a zero length. (See https://stackoverflow.com/a/45918994 )If the
s.Headers[i].Value
isnil
then theTargetKey
will return an empty string instead of nil. When the message is later produced at https://github.com/etf1/kafka-message-scheduler/blob/55c951c3985da3be1cd7422e8b897ee8eb491f29/runner/kafka/handler.go#L196 the destination key is now an empty string instead of null. This makes a difference, because Kafka keys can also be null vs zero length, and this is more significant: if themurmor2_random
partitioner (the Java default) is used, then the null keys will go to a random partition, but the zero length ones go to the same partition, which affects scalability. The end user may also simply need to preserve this subtle difference.You can see how the Confluent producer treats message key field differently based on if null or zero length: https://github.com/confluentinc/confluent-kafka-go/blob/1d5067b0f4f88f4167cb66a78ab4c4f3ec3b5786/kafka/producer.go#L194-L205
The solution might be to modify
TargetKey
to return[]byte
instead of a string, and simply pass through the header value as-is without any intermediate string conversions.