etf1 / kafka-message-scheduler

scheduler for kafka messages
MIT License
76 stars 14 forks source link

Scheduler converts null target keys to empty strings #24

Closed james-johnston-thumbtack closed 2 years ago

james-johnston-thumbtack commented 2 years ago

Kafka header values can be null, which is distinct from an empty string. TF1 scheduler does not notice this difference, and inadvertently converts null values to empty strings when examining the target key header:

The TargetKey function at https://github.com/etf1/kafka-message-scheduler/blob/55c951c3985da3be1cd7422e8b897ee8eb491f29/schedule/kafka/schedule.go#L40-L42 calls getHeaderValue at https://github.com/etf1/kafka-message-scheduler/blob/55c951c3985da3be1cd7422e8b897ee8eb491f29/schedule/kafka/schedule.go#L23-L30 When getHeaderValue sees that the length of the slice is 0, it just returns an empty string. It does not check and differentiate as to whether the slice was equal to nil, or if it was non-nil and had a zero length. (See https://stackoverflow.com/a/45918994 )

If the s.Headers[i].Value is nil then the TargetKey will return an empty string instead of nil. When the message is later produced at https://github.com/etf1/kafka-message-scheduler/blob/55c951c3985da3be1cd7422e8b897ee8eb491f29/runner/kafka/handler.go#L196 the destination key is now an empty string instead of null. This makes a difference, because Kafka keys can also be null vs zero length, and this is more significant: if the murmor2_random partitioner (the Java default) is used, then the null keys will go to a random partition, but the zero length ones go to the same partition, which affects scalability. The end user may also simply need to preserve this subtle difference.

You can see how the Confluent producer treats message key field differently based on if null or zero length: https://github.com/confluentinc/confluent-kafka-go/blob/1d5067b0f4f88f4167cb66a78ab4c4f3ec3b5786/kafka/producer.go#L194-L205

The solution might be to modify TargetKey to return []byte instead of a string, and simply pass through the header value as-is without any intermediate string conversions.

fkarakas commented 2 years ago

Hello Sir, thank you for this very detailed issue. At first glance it makes sense, and it can be done.

Could you tell us if/how you plan to use the scheduler ?

Regards,

james-johnston-thumbtack commented 2 years ago

Could you tell us if/how you plan to use the scheduler ?

In general, we want to use it for scheduling various events to produce in the future throughout our product. I am on an infrastructure team that is looking into setting this up for other product teams to utilize. Not every use case currently using Kafka (for normal, non-scheduled messages) specifies a message key, so I would expect that some product teams will want to try to schedule messages without keys as well, since that is normally something that Kafka supports.