confluentinc / confluent-kafka-javascript

Confluent's Apache Kafka JavaScript client
https://www.npmjs.com/package/@confluentinc/kafka-javascript
MIT License

fix: update message header type to match #39

Open mkaufmaner opened 1 month ago

mkaufmaner commented 1 month ago

Cleans up the existing message headers type.

cla-assistant[bot] commented 1 month ago

CLA assistant check
All committers have signed the CLA.

milindl commented 1 month ago

Thanks for the PR. The intent of this is to allow numbers or nulls to be passed as produce header values. Is that correct?

Two things:

  1. Could you update the kafkajs.d.ts, too, to allow numbers and nulls? The interface IHeaders will need to be updated.

  2. While reviewing this PR, I realized that anything except a string is not correctly supported, since we're converting the bytes passed from JavaScript to a utf8 string before sending. If you send, for instance, a buffer with 0x00 somewhere within it, the library truncates the buffer right there. Similarly, if you pass null, the library creates the header with the string value "null". I'll fix it in a bit. I will maintain the current behaviour for string and numbers (since people might be relying on the conversion of numbers to strings), and change it to allow Buffers and nulls properly (which are somewhat unusable in the current state). There's another issue: if a MessageHeader object has more than one key, only one of them is used. I don't think this blocks the PR; I'm just noting it so you can keep it in mind while using headers.
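
To make the failure modes concrete, here is a minimal Node sketch of the two behaviours described above (an illustration only, not the library's actual conversion code; the truncation itself happens in the native layer):

// A header value containing an embedded 0x00 byte.
const value = Buffer.from([0xde, 0xad, 0x00, 0xbe, 0xef]);

// If the bytes are treated as a NUL-terminated C string downstream,
// everything after the first 0x00 is lost:
const truncated = value.subarray(0, value.indexOf(0x00));
console.log(truncated); // <Buffer de ad>

// And a null header value ends up stringified instead of being sent as a null value:
console.log(String(null)); // "null"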

mkaufmaner commented 1 month ago

Thanks for the PR. The intent of this is to allow numbers or nulls to be passed as produce header values. Is that correct?

@milindl Partially correct. Yes, the intent is to add support for nulls and numbers, as well as to clean up the typings.

  1. Could you update the kafkajs.d.ts, too, to allow numbers and nulls? The interface IHeaders will need to be updated.

@milindl Done.

  2. While reviewing this PR, I realized that anything except a string is not correctly supported, since we're converting the bytes passed from JavaScript to a utf8 string before sending. If you send, for instance, a buffer with 0x00 somewhere within it, the library truncates the buffer right there. Similarly, if you pass null, the library creates the header with the string value "null".

I can confirm this and have replicated the rather unexpected behavior.

I'll fix it in a bit. I will maintain the current behaviour for string and numbers (since people might be relying on the conversion of numbers to strings), and change it to allow Buffers and nulls properly (which are somewhat unusable in the current state).

Thank you!

If I may make a recommendation: since this project is pre-GA, I would make all the breaking changes necessary to align with expected behavior. In my opinion, implicitly converting a primitive such as a number to a string is not expected behavior.

There's another issue where if number of keys of MessageHeader is more than 1, only 1 key is used.

I feel there is a larger discussion to be had around how headers are implemented. I would honestly like to see headers passed as a plain object or map instead of an array of single-property objects. I am recommending this for both ergonomics and performance: looking up a header in an array is O(N), whereas an object lookup is O(1).

Preferably, I would like to see:

export type MessageHeaderValue = Buffer | string | number | null;
export type MessageHeader = Record<string, MessageHeaderValue>;
export type MessageHeaders = MessageHeader | null | undefined;
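
For illustration, here is a rough sketch of how lookups compare under the two shapes (the array shape below is only an approximation of the current typings, not the library's exact exports):

// Approximation of the current shape: an array of single-key objects, so a lookup is O(N).
type ArrayHeaders = { [key: string]: Buffer | string }[];
const asArray: ArrayHeaders = [{ traceId: "abc" }, { retries: "3" }];
const fromArray = asArray.find((h) => "traceId" in h)?.traceId; // linear scan

// Proposed shape: a plain object keyed by header name, so a lookup is O(1).
const asRecord: MessageHeader = { traceId: "abc", retries: 3 };
const fromRecord = asRecord.traceId; // constant-time property access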
milindl commented 1 month ago

Thanks for the PR update. I've been thinking about the conversion thing.

The Kafka protocol expects header values to be []bytes. Breaking a number into bytes might pose a slight issue for endianness, and since type information isn't passed within the headers, the receiver will only ever get bytes and will need to write code outside of the library to change it back to a number.

Maybe we should remove the number type entirely, and leave the encoding of number -> []byte to the user too. Since we can make API changes, we should probably make use of this opportunity for number. Ideally, only the Buffer type should be permitted as the header value, but I think it's very reasonable that a string can be turned into a utf8 []bytes by the library.
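
For example, if number is removed, the caller can make the byte order explicit themselves; a sketch assuming the application picks a 32-bit big-endian convention:

// Producer side: the application decides how a number becomes bytes.
const encoded = Buffer.alloc(4);
encoded.writeInt32BE(7, 0); // explicit 32-bit big-endian encoding

// Consumer side: the application reverses its own convention.
const decoded = encoded.readInt32BE(0); // 7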

As for making the header an object, the keys are allowed to be repeated by the Kafka protocol. The type used by the kafkajs.d.ts is probably a good compromise, allowing key->[value] in the object that is passed.
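
For reference, the IHeaders interface in kafkajs is declared along these lines (paraphrased from its typings), which is the key->[value] compromise:

interface IHeaders {
    [key: string]: Buffer | string | (Buffer | string)[] | undefined;
}

// Repeated keys survive as an array of values:
const headers: IHeaders = {
    traceId: "abc123",
    "retry-reason": ["timeout", "rebalance"],
};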

mkaufmaner commented 1 month ago

@milindl

The Kafka protocol expects header values to be []bytes. Breaking a number into bytes might pose a slight issue for endianness, and since type information isn't passed within the headers, the receiver will only ever get bytes and will need to write code outside of the library to change it back to a number.

Maybe we should remove the number type entirely, and leave the encoding of number -> []byte to the user too. Since we can make API changes, we should probably make use of this opportunity for number. Ideally, only the Buffer type should be permitted as the header value, but I think it's very reasonable that a string can be turned into a utf8 []bytes by the library.

As for the number type, I think removing it would be appropriate, and leaving the string type in place would be for the best. I also agree that an ideal solution would be for header values to be Buffers only. However, I believe that allowing only Buffers would be detrimental to developer experience.

As for making the header an object, the keys are allowed to be repeated by the Kafka protocol. The type used by the kafkajs.d.ts is probably a good compromise, allowing key->[value] in the object that is passed.

That would be a great compromise, and the functionality to support both shouldn't be too complicated to implement.
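
As a rough sketch of what supporting both could look like (the helper name and the flattened target shape are just illustrative, not the library's API), the object form can be normalized into a flat list of key/value pairs before it is handed to the native layer:

type HeaderValue = Buffer | string | (Buffer | string)[] | undefined;

// Flatten an object of headers into an ordered list of single-key pairs,
// expanding arrays so repeated keys are preserved.
function normalizeHeaders(headers: Record<string, HeaderValue>): { key: string; value: Buffer | string }[] {
    const flat: { key: string; value: Buffer | string }[] = [];
    for (const [key, value] of Object.entries(headers)) {
        if (value === undefined) continue;
        for (const v of Array.isArray(value) ? value : [value]) {
            flat.push({ key, value: v });
        }
    }
    return flat;
}

// normalizeHeaders({ traceId: "abc", "retry-reason": ["timeout", "rebalance"] })
// => [ { key: "traceId", value: "abc" },
//      { key: "retry-reason", value: "timeout" },
//      { key: "retry-reason", value: "rebalance" } ]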