confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io

`list topics;` takes 12 minutes to time out if KSQL is unable to connect to Kafka #3564

Open vcrfxia opened 5 years ago

vcrfxia commented 5 years ago

When executing `list topics;`, KSQL calls `KafkaTopicClient#describeTopics()` (link), which calls `AdminClient#describeTopics()` inside a retry loop (link) that retries up to five times before giving up. The AdminClient has a default request timeout of 2 minutes (link). If KSQL is unable to connect to Kafka, each of the six attempts (the initial call plus five retries) waits out that timeout, so `list topics;` takes 6 × 2 = 12 minutes to time out, which is unreasonable from a user's perspective.
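The 12-minute figure can be reproduced with a small simulation. This is a minimal sketch, not KSQL's code: `executeWithRetries` and `UnreachableAdmin` are hypothetical stand-ins for KSQL's retry wrapper and for an AdminClient that cannot reach a broker, and time is tracked on a virtual clock rather than actually slept. The retry count and timeout are the values quoted above.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation of an outer retry loop around a call that always times out.
class NestedRetryTimeout {
    // Values quoted in the issue: 5 retries, 2-minute AdminClient request timeout.
    static final int NUM_RETRIES = 5;
    static final Duration REQUEST_TIMEOUT = Duration.ofMinutes(2);

    // Hypothetical retry wrapper: one initial attempt plus up to NUM_RETRIES retries.
    static <T> T executeWithRetries(Callable<T> call) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= NUM_RETRIES; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }

    // Hypothetical stand-in for an admin client that cannot reach any broker.
    static final class UnreachableAdmin {
        Duration elapsed = Duration.ZERO; // virtual clock; nothing actually sleeps

        String describeTopics() throws TimeoutException {
            elapsed = elapsed.plus(REQUEST_TIMEOUT); // each call waits out the full timeout
            throw new TimeoutException("no broker reachable");
        }
    }

    public static void main(String[] args) {
        UnreachableAdmin admin = new UnreachableAdmin();
        try {
            executeWithRetries(admin::describeTopics);
        } catch (Exception e) {
            // 6 attempts x 2 minutes each
            System.out.println("gave up after " + admin.elapsed.toMinutes() + " minutes");
        }
    }
}
```

Because the inner call already waits out its full timeout on every attempt, each outer retry adds another full timeout to the user-visible wait.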

The AdminClient already has retry behavior built in (link), so there's no need to wrap AdminClient requests in a second retry loop within KSQL. If desired, we can also control the request timeout from within KSQL by specifying a value in `DescribeTopicsOptions` (link), which the AdminClient uses here.
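For comparison, here is a simplified model of the single-layer alternative: the client owns the retries and the caller supplies one overall timeout, which is roughly what setting `timeoutMs` on `DescribeTopicsOptions` (a method it inherits from `AbstractOptions`) asks the AdminClient to enforce. This is a hypothetical sketch, not Kafka's implementation; `waitForSuccessOrDeadline`, the 100 ms backoff, and the 30-second timeout are illustrative assumptions, and time is simulated.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical model of a client that owns its retries under one overall deadline.
class SingleDeadline {
    // Retries with a fixed backoff until an attempt succeeds or the next wait would pass
    // the caller's timeout. Attempts are treated as instantaneous and time is simulated,
    // so the return value is how long the caller would have waited.
    static Duration waitForSuccessOrDeadline(BooleanSupplier attempt, Duration backoff,
                                             Duration timeout) {
        Duration elapsed = Duration.ZERO;
        while (!attempt.getAsBoolean()) {
            Duration next = elapsed.plus(backoff);
            if (next.compareTo(timeout) > 0) {
                return timeout; // deadline reached: the caller gets an error at `timeout`
            }
            elapsed = next; // wait out the backoff, then retry
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Kafka unreachable: every attempt fails, so the wait is capped by the timeout.
        Duration waited = waitForSuccessOrDeadline(
                () -> false, Duration.ofMillis(100), Duration.ofSeconds(30));
        System.out.println("gave up after " + waited.getSeconds() + " seconds");
    }
}
```

Under this design the user-visible wait for an unreachable cluster is bounded by the single timeout (30 seconds here) instead of growing with the number of outer retries.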

I suspect there are other KSQL requests (besides list topics;) that suffer from similar timeout behavior.

apurvam commented 5 years ago

I think most calls to the admin client in KSQL are wrapped in a retry loop. The original reason was to reduce instability caused by slow communication with Kafka. I agree that if we can rely on the admin client's own retries, then we should do that.