confluentinc / kafka-connect-jdbc

Kafka Connect connector for JDBC-compatible databases

Connector Fails After a Network Issue and Doesn't Reconnect for Ten Minutes #1372

Open CliffWheadon opened 7 months ago

CliffWheadon commented 7 months ago

We're experiencing an issue where this connector dies after network outages between the data center it is deployed in and the one that the database lives in. Anytime there is a network outage, which may last anywhere from 30 seconds to a minute, the connector will fail. The connector is able to reconnect, but it takes about ten minutes for it to do so and start picking up data again.

We tried enabling DEBUG level logging for org.apache.kafka.connect.runtime.WorkerSourceTask and io.confluent.connect.jdbc, but we don't see anything in the logs that explains why the connector ceases to work for ten minutes.
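For anyone reproducing this, log levels can be changed at runtime through the Connect worker's REST API (`PUT /admin/loggers/<name>`) instead of editing the log4j configuration and restarting. The following is a minimal sketch of that call. The worker address `http://localhost:8083` and the helper name `build_logger_request` are assumptions for illustration; the logger names are the two mentioned above.

```python
import json
import urllib.request


def build_logger_request(base_url: str, logger: str, level: str = "DEBUG") -> urllib.request.Request:
    """Build a PUT /admin/loggers/<logger> request for a Connect worker."""
    body = json.dumps({"level": level}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/admin/loggers/{logger}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumption: the worker's REST endpoint is reachable at this address.
    worker = "http://localhost:8083"
    for name in (
        "org.apache.kafka.connect.runtime.WorkerSourceTask",
        "io.confluent.connect.jdbc",
    ):
        request = build_logger_request(worker, name)
        with urllib.request.urlopen(request, timeout=10) as response:
            print(name, response.status)
```

A level set this way applies only to the worker that receives the request and does not survive a restart, so in a multi-worker cluster the call would need to be repeated per worker.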

We have connection.attempts set to 6, poll.interval.ms set to 500, and tasks.max set to 2. We're monitoring two tables.
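To put those settings in context, here is a rough sketch of how long the connector keeps retrying a lost connection. It assumes `connection.backoff.ms` was left at what I believe is its default of 10000 ms (that value is not stated in this report and should be checked against the connector documentation), and the helper `retry_window_ms` is hypothetical. The product is an approximate upper bound, since the exact number of waits depends on the connector's retry loop.

```python
# Settings as reported above; connection.backoff.ms is an ASSUMED default.
connector_settings = {
    "connection.attempts": 6,
    "connection.backoff.ms": 10_000,  # assumption, not taken from the report
    "poll.interval.ms": 500,
    "tasks.max": 2,
}


def retry_window_ms(attempts: int, backoff_ms: int) -> int:
    """Approximate upper bound on time spent retrying a connection."""
    return attempts * backoff_ms


if __name__ == "__main__":
    window = retry_window_ms(
        connector_settings["connection.attempts"],
        connector_settings["connection.backoff.ms"],
    )
    print(f"retry window: about {window / 1000:.0f} s")
```

Under that assumption the retry budget is roughly one minute, which is about the same as the longest outages we see, so an outage near the upper end could exhaust the attempts. It does not by itself account for the ten-minute gap.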

We're running SQL Server: Microsoft SQL Server 2012 - 11.0.5343.0 (X64), Kafka Connect: cp-kafka-connect:7.2.4, and JDBC plugin: kafka-connect-jdbc:10.7.4.

Is it expected for the connector to take this long to reconnect and resume consuming data? Is this part of a known issue?

This issue is affecting us in production, so any suggestions you have to further investigate or fix things are greatly appreciated.

parvezDevIT commented 5 months ago

If the connector eventually pulls data, that implies it was still working, just delayed. Since the problem seems to be introduced by network connectivity outages, check whether your source MS SQL database takes longer to run your queries after an outage. That is possible if the outage also affects other systems that pull data from, or otherwise use, the source database: their pending jobs queue up during the outage and are all submitted the moment connectivity is restored. If that's the case, your source DB server is momentarily under heavy load, which delays query output.

Also check your Kafka Connect configuration:

config.storage.topic - Topic to store the connector and task configuration state in. This topic should always have a single partition and be highly replicated (3x or more).
offset.storage.topic - Topic to store the connector offset state in. This topic should have a large number of partitions (for example, 25 or 50) and be highly replicated (3x or more).
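The topic guidance above can be turned into a quick sanity check. This is only a sketch: the helper `check_internal_topics` and the shape of its input (partition and replication counts keyed by worker setting name) are hypothetical, and the numbers would have to be collected separately, for example from the output of the `kafka-topics` tool.

```python
from typing import Dict, List


def check_internal_topics(topics: Dict[str, Dict[str, int]]) -> List[str]:
    """Return a list of deviations from the guidance above (empty if none)."""
    problems: List[str] = []

    config = topics.get("config.storage.topic")
    if config is None:
        problems.append("config.storage.topic: no data supplied")
    else:
        if config["partitions"] != 1:
            problems.append("config.storage.topic: should have exactly 1 partition")
        if config["replication"] < 3:
            problems.append("config.storage.topic: replication should be 3 or more")

    offsets = topics.get("offset.storage.topic")
    if offsets is None:
        problems.append("offset.storage.topic: no data supplied")
    else:
        if offsets["partitions"] < 25:
            problems.append("offset.storage.topic: expected 25 or more partitions")
        if offsets["replication"] < 3:
            problems.append("offset.storage.topic: replication should be 3 or more")

    return problems


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    observed = {
        "config.storage.topic": {"partitions": 1, "replication": 3},
        "offset.storage.topic": {"partitions": 25, "replication": 3},
    }
    print(check_internal_topics(observed) or "internal topics match the guidance")
```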