Open wowo opened 3 years ago
Added stack trace from logs ☝️
Also previously I was getting error like this: Failed to get offsets by times in 30000ms
Update: I've run opt/bitnami/kafka/bin/kafka-reassign-partitions.sh --zookeeper 10.164.0.32:2181 --reassignment-json-file increase-replication-factor.json --execute
and increased replication factor of the command topic to 3. First 3 streams created without problem, now getting it again...
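The increase-replication-factor.json file itself is not shown in the thread. As a rough sketch of what such a file could look like, the snippet below builds a plan in the JSON layout that kafka-reassign-partitions.sh reads (a version field plus a list of topic/partition/replicas entries). The topic name placeholder, the partition count, and the broker ids 1, 2, 3 are assumptions and must be replaced with the values from the actual cluster.

```python
import json


def build_reassignment(topic, partition_count, broker_ids):
    """Return a reassignment plan that places every partition on all given brokers."""
    partitions = []
    for p in range(partition_count):
        # Rotate the broker list so the first (preferred) replica differs per partition.
        shift = p % len(broker_ids)
        replicas = broker_ids[shift:] + broker_ids[:shift]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


# Placeholder topic name and assumed broker ids; adjust to the real cluster.
plan = build_reassignment("<command-topic-name>", 1, [1, 2, 3])
print(json.dumps(plan, indent=2))
```

Listing three broker ids per partition is what raises the replication factor to 3; the rotation only spreads the preferred leader across brokers when a topic has more than one partition.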
@wowo, there are a number of environmental factors that could be causing this symptom. The "caused by" error message, "Timeout expired after 60000 milliseconds while awaiting InitProducerId", is a signal to look for exceptions in the logs for your Kafka brokers and to check whether your Kafka cluster has any partitions that are under the minimum ISR.
Was the number of brokers in the Kafka cluster increased at some point?
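To make the "under the minimum ISR" check concrete: a partition is under min ISR when its in-sync replica set is smaller than the configured minimum. The sketch below applies that rule to partition metadata that has already been collected by some other means. The sample data and the dictionary layout are hypothetical, chosen only for illustration, and the internal __transaction_state topic is used as the example because the InitProducerId call depends on the transaction coordinator.

```python
def under_min_isr(partitions, min_isr):
    """Return (topic, partition) pairs whose in-sync replica count is below min_isr."""
    return [
        (p["topic"], p["partition"])
        for p in partitions
        if len(p["isr"]) < min_isr
    ]


# Hypothetical metadata, e.g. transcribed by hand from a topic describe output.
sample = [
    {"topic": "__transaction_state", "partition": 0, "replicas": [1, 2, 3], "isr": [1, 2, 3]},
    {"topic": "__transaction_state", "partition": 1, "replicas": [1, 2, 3], "isr": [2]},
]
print(under_min_isr(sample, min_isr=2))  # [('__transaction_state', 1)]
```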
@colinhicks Thanks for the answer and sorry for the late reply. No, there have always been 3 brokers in the cluster. I've recently added another 2 IPs to the KSQL configuration as a remedy for the problem, but it didn't help (and I think it doesn't really matter).
Another bug which happens now "There is a newer producer with the same transactionalId which fences the current one." related to this...
Also, drop stream <any of them> works immediately...
Another problem occurs when I run describe <stream> extended:
[2021-08-27 11:04:05,458] ERROR Failed to list Kafka consumer groups offsets
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting
for a node assignment. Call: listConsumerGroupOffsets
Caused by: Timed out waiting for a node assignment. Call:
listConsumerGroupOffsets (io.confluent.ksql.cli.console.Console:344)
I'm suspecting some problems with Zookeeper, but restart didn't help...
The drop stream command does not attempt to connect to Kafka, whereas the other commands do. These symptoms point toward the ksqlDB cluster not being able to connect to the brokers.
One suggestion is to more directly test the network route-ability between the ksqlDB hosts and the brokers.
For example, running telnet from the ksql host(s) to the brokers should succeed:
ksql-host1:~$ telnet <broker1-dns-name-or-ip> 9092
Trying <ip address>...
Connected to <broker1-dns-name-or-ip>.
Escape character is '^]'.
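When several brokers need to be checked from a host without telnet, the same test can be approximated in a few lines. This is only a sketch: the function names are made up for illustration, and a successful TCP connection shows that the port is reachable, not that the addresses the brokers advertise to clients are reachable as well.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_brokers(brokers, timeout=5.0):
    """Map each 'host:port' string to whether a TCP connection succeeded."""
    results = {}
    for broker in brokers:
        host, _, port = broker.rpartition(":")
        results[broker] = can_connect(host, int(port), timeout)
    return results
```

Calling check_brokers with the same host:port list that is passed as bootstrap servers, from inside the ksqlDB container, tests the route the server actually uses.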
@colinhicks I did it, telnet works just fine
ping 10.164.0.30
64 bytes from 10.164.0.30: seq=811 ttl=63 time=0.354 ms
64 bytes from 10.164.0.30: seq=812 ttl=63 time=0.277 ms
^C
--- 10.164.0.30 ping statistics ---
813 packets transmitted, 813 packets received, 0% packet loss
round-trip min/avg/max = 0.157/0.275/1.549 ms
The networking layer looks okay to me, but I'm open to investigate more.
Btw. I've created SO thread as well https://stackoverflow.com/questions/68953691/kafka-ksql-cant-create-streams-due-to-timeouts
Also, I'm getting the following outputs when querying the REST API:
$ curl http://localhost:8088/info
{"KsqlServerInfo":{"version":"0.19.0","kafkaClusterId":"jjJyaiWBQPaI3_TrFSQmxw","ksqlServiceId":"prod-ksqldb-server","serverStatus":"ERROR"}}
$ curl http://localhost:8088/healthcheck
{"isHealthy":false,"details":{"metastore":{"isHealthy":true},"kafka":{"isHealthy":true},"commandRunner":{"isHealthy":false}}}
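The healthcheck response above can be read mechanically: every entry under details carries its own isHealthy flag, and here only commandRunner reports false. A small sketch that pulls out the failing components from such a response body (the function name is an assumption, and the body is copied from the output above):

```python
import json


def unhealthy_components(healthcheck_body):
    """Return the names of components whose isHealthy flag is false."""
    report = json.loads(healthcheck_body)
    return sorted(
        name
        for name, detail in report.get("details", {}).items()
        if not detail.get("isHealthy", False)
    )


body = (
    '{"isHealthy":false,"details":{"metastore":{"isHealthy":true},'
    '"kafka":{"isHealthy":true},"commandRunner":{"isHealthy":false}}}'
)
print(unhealthy_components(body))  # ['commandRunner']
```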
I've found a workaround by using ksql.queries.file and running it in a Headless mode. It works for me as I don't really need interactive version.
On the broker, I set the replication settings below to 1, and it then worked with ksqlDB in interactive mode.
- transaction.state.log.replication.factor
- transaction.state.log.min.isr
- offsets.topic.replication.factor
Reference: https://github.com/confluentinc/cp-all-in-one/blob/7.2.2-post/cp-all-in-one/docker-compose.yml (look at the broker configuration)
Describe the bug: Cannot create a stream because of a timeout while initializing a transaction to the KSQL command topic. I'm trying to create a stream; sometimes it succeeds, but most of the time it fails. I guess the stream definition doesn't really matter, but here it is:
To Reproduce The version of KSQL confluentinc/ksqldb-server:0.19.0 (running on Google Kubernetes Engine). Kafka runs on virtual machines
Expected behavior Stream created.
Actual behaviour CLI output:
Exception in logs:
Command topic details:
Additional context I'm running 3 node kafka cluster. All three brokers passed in KSQL_BOOTSTRAP_SERVERS (previously had one value there, now passed all three). All existing streams are working fine.