confluentinc / confluent-kafka-python

Confluent's Kafka Python Client
http://docs.confluent.io/current/clients/confluent-kafka-python

Why does confluent-kafka not send more than 5k messages with a 1MB payload? #1079

Closed AkshayAwate closed 8 months ago

AkshayAwate commented 3 years ago

Description

I am running some tests. First I sent 5 messages with a 1MB payload, then 50 and 500 with the same payload, and that worked well. But when I send 5k messages, it throws these errors:


`%5|1617340039.942|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out ProduceRequest in flight (after 950ms, timeout #0)
%4|1617340039.942|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out 1 in-flight, 0 retry-queued, 1 out-queue, 1 partially-sent requests
%3|1617340039.942|FAIL|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: 2 request(s) timed out: disconnect (after 302251ms in state UP)
%5|1617340040.949|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out ProduceRequest in flight (after 979ms, timeout #0)
%4|1617340040.949|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out 1 in-flight, 0 retry-queued, 1 out-queue, 1 partially-sent requests
%3|1617340040.949|FAIL|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: 2 request(s) timed out: disconnect (after 1001ms in state UP, 1 identical error(s) suppressed)
Processed 5000 messsages in 303.98 seconds
15.69 MB/s
16.45 Msgs/s
%4|1617340041.663|TERMINATE|rdkafka#producer-1| [thrd:app]: Producer terminating with 897 messages (1100181264 bytes) still in queue or transit: use flush() to wait for outstanding message delivery`

My configs:

`p = Producer({
    'bootstrap.servers': '10.x.x.x:19092',
    'security.protocol': 'SASL_PLAINTEXT', 'sasl.mechanisms': 'PLAIN',
    'sasl.username': kafka_user, 'sasl.password': kafka_password,
    'compression.codec': 'snappy',
    'message.max.bytes': '1000000000',
    'queue.buffering.max.messages': '10000000',
    'queue.buffering.max.kbytes': '2147483647',
    'queue.buffering.max.ms': '500',
})`

I have read that the maximum message payload is 1MB. How should I proceed with larger payloads? Is there anything I am missing?

edenhill commented 3 years ago

It seems odd that the ProduceRequests are timing out after only one second. Are you sure that you are not setting request.timeout.ms or message.timeout.ms?

AkshayAwate commented 3 years ago

@edenhill No, I am not setting request.timeout.ms or message.timeout.ms in my configs.

edenhill commented 3 years ago

Ah, I think I see what is going on. Your produce rate is too high for the network/cluster, causing messages to be queued in the client, and by the time they're eventually transmitted their remaining timeout might be so low that the message times out while in flight to the broker. Your run lasts for 303 seconds and the default message.timeout.ms is 300s, so that sort of makes sense.

If you reduce the producer queue size you will get quicker back pressure (produce() will raise QUEUE_FULL) and you can stop producing until there is room in the queue.
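
The back-pressure approach described above could look roughly like the following. This is a minimal sketch, not code from the thread: `produce_with_backpressure` and its parameters are hypothetical names, and it assumes the Python client's documented behavior of raising `BufferError` from `produce()` when the local queue is full. The `producer` argument is expected to be a `confluent_kafka.Producer` (or anything with the same `produce`/`poll`/`flush` methods).

```python
def produce_with_backpressure(producer, topic, payloads, poll_timeout=1.0):
    """Produce every payload, pausing whenever the local queue is full.

    Hypothetical helper. Returns (queue_full_retries, messages_left_after_flush).
    """
    retries = 0
    for payload in payloads:
        while True:
            try:
                producer.produce(topic, value=payload)
                break
            except BufferError:
                # Local producer queue is full: stop producing, serve
                # delivery reports and wait for room before retrying.
                retries += 1
                producer.poll(poll_timeout)
        # Serve delivery callbacks without blocking.
        producer.poll(0)
    # Wait for outstanding messages before exiting; this is what the
    # TERMINATE warning in the log above asks for.
    remaining = producer.flush()
    return retries, remaining
```

With a real client this would be called as `produce_with_backpressure(p, 'my-topic', image_payloads)`, where the topic name and payload list are placeholders. The smaller the configured queue, the sooner the `BufferError` branch kicks in and throttles the producing loop.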

AkshayAwate commented 3 years ago

@edenhill Okay, should I try with linger_ms=5?

edenhill commented 3 years ago

No. Rather, limit queue.buffering.max.kbytes and queue.buffering.max.messages to only allow for, say, 60 seconds' worth of messages. E.g., if your input rate is 1000 messages per second, set queue.buffering.max.messages to 60000.
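
The sizing rule above is simple arithmetic, sketched here for convenience. `queue_limits` is a hypothetical helper, and the 60-second window and average message size are inputs you would measure yourself; only the two configuration property names come from the thread.

```python
import math


def queue_limits(msgs_per_sec, avg_msg_bytes, buffer_seconds=60):
    """Size the producer queue to hold roughly `buffer_seconds` of traffic.

    Hypothetical helper: returns a dict of the two queue properties
    discussed above, ready to merge into a Producer config.
    """
    max_messages = max(1, int(msgs_per_sec * buffer_seconds))
    # queue.buffering.max.kbytes is expressed in kilobytes.
    max_kbytes = max(1, math.ceil(max_messages * avg_msg_bytes / 1024))
    return {
        'queue.buffering.max.messages': max_messages,
        'queue.buffering.max.kbytes': max_kbytes,
    }


# 1000 msgs/s of 1 KiB messages, the example rate from the comment above.
print(queue_limits(1000, 1024))
# Roughly the rate seen in this issue: ~16 msgs/s of ~1 MB messages.
print(queue_limits(16, 1_000_000))
```

The result can be merged into the existing configuration, for example `Producer({**base_config, **queue_limits(16, 1_000_000)})`, where `base_config` stands for the connection and security settings.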

AkshayAwate commented 3 years ago

@edenhill I will try and update.

AkshayAwate commented 3 years ago

@edenhill So the main thing is that I am using image bytes as the payload.

Ypsingh18 commented 3 years ago

Just wondering whether this issue was ever resolved, because I am facing something similar. Thanks!

adrianguyareach commented 3 years ago

Having the same problem. Was this solved?

edenhill commented 3 years ago

See my previous comments on setting queue sizes.

arizalpratama commented 2 years ago

I have the same issue. How can I solve this?

pranavrth commented 8 months ago

The main question is already answered by @edenhill. Closing this issue.