AkshayAwate commented 3 years ago

Description

I am running some tests. First I sent 5 messages with a 1 MB payload each, then 50, then 500 with the same payload, and it works well. But when I send 5k messages, it throws the following errors:


`%5|1617340039.942|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out ProduceRequest in flight (after 950ms, timeout #0)
%4|1617340039.942|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out 1 in-flight, 0 retry-queued, 1 out-queue, 1 partially-sent requests
%3|1617340039.942|FAIL|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: 2 request(s) timed out: disconnect (after 302251ms in state UP)
%5|1617340040.949|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out ProduceRequest in flight (after 979ms, timeout #0)
%4|1617340040.949|REQTMOUT|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: Timed out 1 in-flight, 0 retry-queued, 1 out-queue, 1 partially-sent requests
%3|1617340040.949|FAIL|rdkafka#producer-1| [thrd:sasl_plaintext://52.149.147.197:32266/2]: sasl_plaintext://52.149.147.197:32266/2: 2 request(s) timed out: disconnect (after 1001ms in state UP, 1 identical error(s) suppressed)
Processed 5000 messsages in 303.98 seconds
15.69 MB/s
16.45 Msgs/s
%4|1617340041.663|TERMINATE|rdkafka#producer-1| [thrd:app]: Producer terminating with 897 messages (1100181264 bytes) still in queue or transit: use flush() to wait for outstanding message delivery`

My configs:

`p = Producer({
    'bootstrap.servers': '10.x.x.x:19092',
    'security.protocol': 'SASL_PLAINTEXT', 'sasl.mechanisms': 'PLAIN',
    'sasl.username': kafka_user, 'sasl.password': kafka_password,
    'compression.codec': 'snappy', 'message.max.bytes': '1000000000',
    'queue.buffering.max.messages': '10000000',
    'queue.buffering.max.kbytes': '2147483647',
    'queue.buffering.max.ms': '500',
})`

I have read that the maximum message payload is 1 MB. How should I proceed for larger payloads? Is there anything I am missing?

How to reproduce

Checklist

Please provide the following information:

[ ] confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): latest
[ ] Apache Kafka broker version:
[ ] Client configuration: {...}
[ ] Operating system: ubuntu 18.02
[ ] Provide client logs (with 'debug': '..' as necessary)
[ ] Provide broker log excerpts
[ ] Critical issue

edenhill commented 3 years ago

It seems odd that the ProduceRequests are timing out after only one second. Are you sure that you are not setting request.timeout.ms or message.timeout.ms?

AkshayAwate commented 3 years ago

@edenhill No, I am not setting request.timeout.ms or message.timeout.ms in my configs.

edenhill commented 3 years ago

Ah, I think I see what is going on. Your produce rate is too high for the network/cluster, so messages queue up in the client. By the time they are eventually transmitted, their remaining timeout may be so short that they time out while in flight to the broker. Your run lasts for 303 seconds and the default message.timeout.ms is 300s, so that sort of makes sense.

If you reduce the producer queue size you will get quicker back pressure (produce() will raise QUEUE_FULL) and you can stop producing until there is room in the queue.
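A minimal sketch of that back-pressure loop, assuming the Python client, where a full local queue surfaces from produce() as the built-in BufferError. The helper name `produce_with_backpressure` and its `poll_timeout` and `max_attempts` parameters are made up for illustration; `producer` is expected to be a `confluent_kafka.Producer`.

```python
def produce_with_backpressure(producer, topic, value,
                              poll_timeout=1.0, max_attempts=100):
    """Enqueue one message, waiting for queue space if the local queue is full.

    Returns the number of times the call had to wait for room in the queue.
    """
    waits = 0
    for _ in range(max_attempts):
        try:
            producer.produce(topic, value=value)
        except BufferError:
            # Local queue is full: stop producing, serve delivery reports,
            # and give librdkafka time to drain the queue before retrying.
            waits += 1
            producer.poll(poll_timeout)
            continue
        # Enqueued: serve any pending delivery callbacks without blocking.
        producer.poll(0)
        return waits
    raise RuntimeError(f"queue still full after {max_attempts} attempts")
```

Called in place of a bare `p.produce(...)` in the send loop, this slows the application down to whatever rate the cluster can absorb. A `p.flush()` before exit is still needed, as the TERMINATE log line above suggests.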

AkshayAwate commented 3 years ago

@edenhill Okay, should I try with linger_ms=5?

edenhill commented 3 years ago

No, rather limit queue.buffering.max.kbytes and queue.buffering.max.messages to only allow for, say, 60 seconds' worth of messages. E.g., if your input rate is 1000 messages per second, set queue.buffering.max.messages to 60000.
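The arithmetic can be sketched roughly like this. The helper `queue_limits` and its parameters are hypothetical, and it assumes 1 kbyte = 1024 bytes for queue.buffering.max.kbytes and that you have a reasonable estimate of the average message size.

```python
import math


def queue_limits(msgs_per_sec, avg_msg_bytes, window_sec=60):
    """Size the producer queue to hold roughly window_sec worth of messages."""
    max_messages = math.ceil(msgs_per_sec * window_sec)
    # Assumption: the kbytes limit counts 1024-byte units.
    max_kbytes = math.ceil(max_messages * avg_msg_bytes / 1024)
    return {
        "queue.buffering.max.messages": max_messages,
        "queue.buffering.max.kbytes": max_kbytes,
    }


# 1000 messages per second of 1 KiB each, 60 second window.
limits = queue_limits(msgs_per_sec=1000, avg_msg_bytes=1024)
print(limits)
```

The resulting dict can be merged into the Producer config. With ~1 MB payloads the kbytes limit will usually be the one that triggers first, so it is worth setting both.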

AkshayAwate commented 3 years ago

@edenhill I will try and update.

AkshayAwate commented 3 years ago

@edenhill So the main thing is that I am using image bytes as the payload.