IBM / sarama

Sarama is a Go library for Apache Kafka.
MIT License

unexpected errors: Request exceeded the user-specified time limit in the request. #1420

Closed bgardiner closed 4 years ago

bgardiner commented 5 years ago
Versions

Sarama Version: v1.22.1
Kafka Version: 1.0.1
Go Version: 1.12

Configuration

Sarama:

kafkaConfig := sarama.NewConfig()
kafkaConfig.Version = sarama.V1_0_0_0
kafkaConfig.Consumer.Return.Errors = true
kafkaConfig.Consumer.Offsets.Initial = sarama.OffsetOldest
kafkaConfig.Consumer.Offsets.Retention = 7 * 24 * time.Hour

Kafka: default settings with the following explicitly configured:

KAFKA_AUTO_CREATE_TOPICS_ENABLE=true
KAFKA_DELETE_TOPIC_ENABLE=true
KAFKA_AUTO_LEADER_REBALANCE_ENABLE=true
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=3

Logs

N/A

Problem Description

I have set kafkaConfig.Consumer.Return.Errors = true, and on some of my consumer.Errors() channels I routinely (a few times per hour) see kafka: error while consuming <topic>/<partition>: kafka server: Request exceeded the user-specified time limit in the request. I searched the code base to find where sarama.ErrRequestTimedOut is returned, but found nothing. What causes a sarama.ConsumerGroup to receive this error? Are there any configuration changes I can make to avoid it?

bgardiner commented 5 years ago

I am also seeing a large number of write: broken pipe errors. For example, one instance in the Sarama logs shows consumer/broker/2 disconnecting due to error processing FetchRequest: write tcp X.X.X.X:XXX->X.X.X.X:XXX: write: broken pipe. I have since tried setting kafkaConfig.Net.KeepAlive = 5 * time.Minute, on the assumption that something in my infrastructure (k8s or load balancers) might be terminating TCP connections, but I am still seeing these errors. Any insight into the possible cause, or into other configuration changes that might mitigate these errors as well?

ghost commented 4 years ago

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. Please check if the master branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

wblakecaldwell commented 4 years ago

@bgardiner the errors prefixed with "kafka server" are translated from the error codes returned by the Kafka server, which are listed here: https://kafka.apache.org/protocol.

That's why you can't find that error being returned in the code. The server returns error code 7, which Sarama interprets here: https://github.com/Shopify/sarama/blob/9501120220783df659f2e405635a20cacc3655e0/errors.go#L132

and converts it to this human-readable error string based on that lookup: https://github.com/Shopify/sarama/blob/9501120220783df659f2e405635a20cacc3655e0/errors.go#L230-L231
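
To illustrate the lookup described above, here is a minimal, dependency-free sketch. It is not Sarama's source: KError and ErrRequestTimedOut are simplified stand-ins modeled on the linked errors.go, and consumeError, its fields, the topic name "events", and isRequestTimeout are hypothetical names invented for this example. It assumes the consumer error wraps the server error in a way that errors.Is can unwrap; check that against the Sarama version you are running.

```go
package main

import (
	"errors"
	"fmt"
)

// KError is a simplified stand-in for a Kafka protocol error code.
// It is declared locally so the sketch runs without external dependencies.
type KError int16

// ErrRequestTimedOut mirrors error code 7 from the Kafka protocol error table.
const ErrRequestTimedOut KError = 7

// Error translates the numeric code sent by the broker into a readable string.
// This is why the message text never appears at a "return" site in client code.
func (e KError) Error() string {
	switch e {
	case ErrRequestTimedOut:
		return "kafka server: Request exceeded the user-specified time limit in the request."
	default:
		return fmt.Sprintf("kafka server: unknown error code %d", int16(e))
	}
}

// consumeError is a hypothetical wrapper that adds topic and partition
// context to an underlying error, similar in spirit to what arrives on an
// errors channel.
type consumeError struct {
	Topic     string
	Partition int32
	Err       error
}

func (c *consumeError) Error() string {
	return fmt.Sprintf("kafka: error while consuming %s/%d: %s", c.Topic, c.Partition, c.Err)
}

// Unwrap exposes the underlying error so errors.Is can inspect it.
func (c *consumeError) Unwrap() error { return c.Err }

// isRequestTimeout reports whether err is, or wraps, the server-side
// request timeout code.
func isRequestTimeout(err error) bool {
	return errors.Is(err, ErrRequestTimedOut)
}

func main() {
	err := &consumeError{Topic: "events", Partition: 3, Err: ErrRequestTimedOut}
	fmt.Println(err)
	fmt.Println("request timeout:", isRequestTimeout(err))
}
```

Matching on the error value rather than on the message text keeps a check like this working even if the wording of the string changes between releases.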