IBM / sarama

Sarama is a Go library for Apache Kafka.
MIT License

`Net.ReadTimeout` is not a network-level property #2820

Open leonidboykov opened 4 months ago

leonidboykov commented 4 months ago
Description

The documentation for the Net struct says that its fields are network-level configuration. However, Net.ReadTimeout interacts with other timeouts: for example, if Consumer.Group.Rebalance.Timeout is greater than Net.ReadTimeout, the rebalance fails with an i/o timeout error. Perhaps Kafka does not reply to the client immediately.

At the moment, the only way to make the other timeouts work is to set Net.ReadTimeout to at least the longest of them. However, this can mask actual network issues, because there is then no effective network-level timeout.

Versions
| Sarama  | Kafka | Go   |
|---------|-------|------|
| v1.41.3 | 3.1.0 | 1.21 |

Configuration
conf := sarama.NewConfig()
conf.Version = sarama.V3_3_1_0
conf.Consumer.Return.Errors = true
conf.Consumer.Offsets.Initial = sarama.OffsetOldest
conf.Consumer.MaxProcessingTime = 1 * time.Minute
conf.Consumer.Group.Rebalance.Timeout = 10 * time.Minute
conf.Consumer.Group.Session.Timeout = 10 * time.Minute
conf.Consumer.Group.Heartbeat.Interval = 5 * time.Second
conf.Net.ReadTimeout = 11 * time.Minute // Consumer.Group.Rebalance.Timeout plus an extra minute.
Additional Context

I've done some research, and it seems other Sarama users are dealing with the same problem: https://github.com/IBM/sarama/issues/1422#issuecomment-517530147.

I'm not sure, but perhaps this parameter wasn't tested properly. The default sarama.Config sets Consumer.Group.Rebalance.Timeout (60s) greater than Net.ReadTimeout (30s), yet the corresponding test sets Consumer.Group.Rebalance.Timeout to only 10 seconds:

https://github.com/IBM/sarama/blob/3e385a677e5b0aaacc8ba0a56be18a53550275ca/functional_consumer_group_test.go#L396-L405

github-actions[bot] commented 1 month ago

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

leonidboykov commented 1 month ago

I'm pretty sure the bug still exists in the main branch.