Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

[Azure Event Hub] Event Message with Partition Key is Null from kafkacat #18460

Closed chongzhang closed 3 years ago

chongzhang commented 3 years ago

Describe the bug

After using the azure-eventhub package to produce a message with a partition key, consuming the message with azure-eventhub shows both the body and the partition_key in the event. Consuming the same topic with kafkacat, however, shows a null/empty key for the message.

To Reproduce

Steps to reproduce the behavior:

  1. Use azure-eventhub to produce a message:

event_data_batch_with_partition_key = producer.create_batch(partition_key='key1')
event_data_batch_with_partition_key.add(EventData(msg))
producer.send_batch(event_data_batch_with_partition_key)

  2. Use azure-eventhub to consume the message:

async def on_event(partition_context, event):
    logging.info(f'event {event}')

The consumer log shows the event with body, partition_key, and other fields, e.g.:

event { body: '{"name": "myname", "data": "msg 18"}', offset: 133144046968, sequence_number: 66430, partition_key=b'key1', enqueued_time=datetime.datetime(2021, 4, 30, 19, 41, 19, 410000, tzinfo=datetime.timezone.utc) }

  3. Use kafkacat to consume the topic:

kafkacat -b $BROKER -t $TOPIC -f '\n%t Key (%K bytes): %k :\nValue (%S bytes): %s\n%T \Partition: %p\tOffset: %o\n--\n' -o end

kafkacat consumes the message with an empty/null key:

mytopic Key (-1 bytes): : Value (36 bytes): {"name": "myname", "data": "msg 18"} 1619811679410 Partition: 1 Offset: 66430

  4. The result is similar (empty key) when using a Kafka library, e.g. https://github.com/Shopify/sarama

Expected behavior

  1. the msg partitionKey is visible and available for kafkacat (or other kafka consumers)


rakshith91 commented 3 years ago

Thanks for reporting the bug!! We'll take a look asap

yunhaoling commented 3 years ago

@chongzhang , thanks for reaching out.

As far as I know, the service uses partition_key to decide the partition, but I'm not sure whether the service also sets the Kafka message key from partition_key.

hey @serkantkaraca , could you help give some more context on this one?

ghost commented 3 years ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jfggdl.

Issue Details
- azure-eventhub
- 5.4.0
- MacOS
- 3.8.7
Author: chongzhang
Assignees: yunhaoling
Labels: `Client`, `Event Hubs`, `Messaging`, `Service Attention`, `customer-reported`, `question`
Milestone: [2021] May
chongzhang commented 3 years ago

@yunhaoling I ran another test, using https://pypi.org/project/kafka-python/ 2.0.2 to produce a message with a key:

producer.send('mytopic', key='key1', value=msg)

for msg in consumer:
    logging.info('{} {} {} {}'.format(msg.partition, msg.offset, msg.key, msg.value))

The log shows the msg.key:

0 67212 b'key1' b'{"data": "msg 1"}'

event { body: '{"data": "msg 1"}', properties: {}, offset: 137438954224, sequence_number: 67212, enqueued_time=datetime.datetime(2021, 5, 6, 18, 1, 51, 674000, tzinfo=datetime.timezone.utc) }

yunhaoling commented 3 years ago

Thanks for the additional information! This makes me wonder further whether the PartitionKey concept in Event Hubs and the key concept in Kafka are the same thing. Apologies that I don't have enough background to answer that.

I've looped in the service team to help answer your question.

chongzhang commented 3 years ago

Hi @yunhaoling, any update on this from the service team?

serkantkaraca commented 3 years ago

Can you examine the messages with Service Bus Explorer and see if partition keys are present? Better to pinpoint whether the issue is on the producer side or the consumer side.

yunhaoling commented 3 years ago

I have tried the confluent-kafka Python SDK to send and receive events, following the azure-event-hubs-for-kafka Python sample.

I used the following steps to check the behavior difference between the Kafka SDK and the Python Event Hubs SDK:

Step 1. Check Kafka producer and consumer behavior on the message key

The confluent-kafka producer and consumer samples handle the message key correctly. I tweaked producer.py and consumer.py in the confluent-kafka sample to set and get the message key.

# on the producer, produce message with key-value pair
p.produce(topic, key='partition key', value=str(i), callback=delivery_callback)

# on the consumer side, print out the key-value pair
print(msg.key())
print(msg.value())

Step 2. Use ServiceBusExplorer to check whether the partition key is populated

No, the partition key is NOT showing up in the explorer.


Step 3. Receive the events sent by the Kafka SDK with the Python Event Hubs SDK

The received event doesn't have a partition key: the Python Event Hubs SDK inspects the "x-opt-partition-key" entry in the internal AMQP message annotations.

However, the internal AMQP message annotations contain an entry "x-opt-kafka-key", whose value is exactly the key set by the Kafka producer.


@serkantkaraca , looks like "x-opt-partition-key" and "x-opt-kafka-key" are treated differently, is this a by-design difference?
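The lookup described in Step 3 can be sketched as below. This is only an illustration, not the SDK's implementation: `find_key` and `_normalize` are hypothetical helpers, and the sketch assumes the message annotations are available as a plain dict whose keys may be either bytes or str. Only the two annotation names come from this thread.

```python
from typing import Any, Mapping, Optional, Tuple

# Annotation names quoted in this thread.
PARTITION_KEY = "x-opt-partition-key"  # present when sent with the Event Hubs SDK
KAFKA_KEY = "x-opt-kafka-key"          # present when sent with a Kafka producer


def _normalize(name: Any) -> str:
    """Annotation names may arrive as bytes or str (assumption); compare as str."""
    return name.decode("utf-8") if isinstance(name, bytes) else str(name)


def find_key(annotations: Optional[Mapping[Any, Any]]) -> Tuple[Optional[str], Any]:
    """Return (annotation_name, value) for whichever key annotation is present.

    The partition key is checked first, then the Kafka key.
    Returns (None, None) when neither annotation exists.
    """
    normalized = {_normalize(k): v for k, v in (annotations or {}).items()}
    for name in (PARTITION_KEY, KAFKA_KEY):
        if name in normalized:
            return name, normalized[name]
    return None, None


# Event produced by a Kafka client: only the Kafka key annotation is set.
print(find_key({b"x-opt-kafka-key": b"key1"}))
```

How the annotations dict is reached from a received event depends on the SDK version, so that part is left out here.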

chongzhang commented 3 years ago

Hi, is there any update on this?

yunhaoling commented 3 years ago

Hey @chongzhang, I confirmed with @serkantkaraca that x-opt-kafka-key (which is set when an event is sent by a Kafka SDK and not set when it is sent by the Event Hubs SDK) is not used as the partition key in the Event Hubs service, so this is expected behavior.

Hey @hmlam, do you have any thoughts on this issue? Is there anything we could or should do on the service/SDK side, or should we hand it over to the Kafka developers?

yunhaoling commented 3 years ago

hey @chongzhang, I have discussed with the service team. The summary is as follows:

Please let me know if you have any other questions. I really appreciate your feedback!

chongzhang commented 3 years ago

@yunhaoling Thanks for the detailed info.

Thanks again for your help!

serkantkaraca commented 3 years ago

You can print the partition key via message headers as below.

Formatting Headers: %h

Sample output Headers: x-opt-partition-key=�↑this-is-my-partition-key,
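The two unprintable characters before the key in the sample output look like an AMQP 1.0 string prefix: a type code (0xA1 for a short UTF-8 string) followed by a one-byte length (0x18 = 24, the length of "this-is-my-partition-key"). That reading is an assumption based on the AMQP encoding, not something confirmed in this thread. Under that assumption, a Kafka consumer could recover the plain key from the header roughly as follows; `decode_amqp_string` and `partition_key_from_headers` are hypothetical helpers, and headers are assumed to be a list of (name, bytes) pairs.

```python
import struct
from typing import Iterable, Optional, Tuple


def decode_amqp_string(raw: bytes) -> str:
    """Decode an AMQP 1.0 encoded string or symbol into a Python str.

    Handles the one-byte-length forms (0xA1 str8-utf8, 0xA3 sym8) and the
    four-byte-length forms (0xB1 str32-utf8, 0xB3 sym32).
    """
    if len(raw) < 2:
        raise ValueError("value too short to be an AMQP string")
    code = raw[0]
    if code in (0xA1, 0xA3):
        length, start = raw[1], 2
    elif code in (0xB1, 0xB3):
        if len(raw) < 5:
            raise ValueError("truncated 32-bit length")
        (length,) = struct.unpack(">I", raw[1:5])
        start = 5
    else:
        raise ValueError(f"unsupported AMQP type code: 0x{code:02x}")
    return raw[start:start + length].decode("utf-8")


def partition_key_from_headers(
    headers: Optional[Iterable[Tuple[str, bytes]]]
) -> Optional[str]:
    """Return the decoded x-opt-partition-key header, or None if absent."""
    for name, value in headers or ():
        if name == "x-opt-partition-key":
            return decode_amqp_string(value)
    return None


print(decode_amqp_string(b"\xa1\x18this-is-my-partition-key"))  # this-is-my-partition-key
```

Other AMQP types (numbers, binary, and so on) are deliberately not handled; the sketch raises instead of guessing.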

kasun04 commented 3 years ago

@chongzhang If we use the same key across multiple SDKs, messages go to the same partition as the hashing happens at the service side.

chongzhang commented 3 years ago

@kasun04 thanks! Just to clarify, what do you mean by "multiple SDKs"? I thought Adam (@yunhaoling) mentioned above that "because with the same value the Kafka client and the EH client likely send the message to two different partitions".

kasun04 commented 3 years ago

I was referring to using same partition keys from SDKs for different languages.

ghost commented 3 years ago

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!