confluentinc / kafka-connect-jdbc

Kafka Connect connector for JDBC-compatible databases
Other
19 stars 955 forks

Implementing new config variable poll.sleep.ms #1349

Open akohlbecker opened 1 year ago

akohlbecker commented 1 year ago

Problem

When the JDBC source connector runs in bulk mode (mode=bulk), the JdbcSourceTask fetches the whole table in one go with SELECT * FROM {table}. poll.interval.ms determines the rate at which the records are sent to the target topic. Once all records are committed to the topic, the next SELECT query is sent to the database. The only way to reduce the frequency of requests sent to the DB is to increase poll.interval.ms, which also slows down committing the messages to the target topic.

This leads to a paradoxical situation if, for example, you want to poll a table only once a day.
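To make the once-a-day case concrete, here is a minimal sketch of the workaround available today, written as a Java map of connector properties. The connection URL, table name, and topic prefix are placeholder values, and the class name DailyBulkConfigExample is hypothetical; only the property keys are existing connector settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of kafka-connect-jdbc.
final class DailyBulkConfigExample {

    // Builds a bulk-mode source configuration that queries the table once a day
    // by stretching poll.interval.ms, the only knob described in the problem statement.
    static Map<String, String> dailyBulkConfig() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector");
        props.put("mode", "bulk");
        // Placeholder values: replace with a real JDBC URL and table.
        props.put("connection.url", "jdbc:postgresql://db.example.com:5432/inventory");
        props.put("table.whitelist", "products");
        props.put("topic.prefix", "jdbc-");
        // One day in milliseconds (86400000).
        props.put("poll.interval.ms", String.valueOf(TimeUnit.DAYS.toMillis(1)));
        return props;
    }

    public static void main(String[] args) {
        // Print the properties in key=value form.
        dailyBulkConfig().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

According to the description above, raising poll.interval.ms this far also throttles delivery of the rows that were already fetched, which is the conflict this issue is about.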

Solution

This pull request suggests adding a poll.sleep.ms setting that allows the JdbcSourceTask to sleep after the whole table result set has been consumed.

This setting can be used to limit the frequency at which the SQL server is being queried without limiting the processing speed of already obtained result sets.
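The intended behavior might be sketched roughly as follows. This is not the code from the pull request: BulkPollScheduler and its methods are hypothetical names, and the sketch only assumes that poll.sleep.ms means "wait this long after a full result set has been consumed before issuing the next query". The clock is injectable so the logic can be exercised without real sleeping.

```java
import java.util.function.LongSupplier;

// Hypothetical helper illustrating the proposed poll.sleep.ms semantics;
// it is not part of kafka-connect-jdbc.
final class BulkPollScheduler {
    private final long pollSleepMs;    // assumed meaning of poll.sleep.ms
    private final LongSupplier clock;  // supplies the current time in milliseconds
    private long nextQueryAtMs = 0L;   // earliest time the next SELECT may be issued

    BulkPollScheduler(long pollSleepMs, LongSupplier clock) {
        if (pollSleepMs < 0) {
            throw new IllegalArgumentException("pollSleepMs must be >= 0");
        }
        this.pollSleepMs = pollSleepMs;
        this.clock = clock;
    }

    // Call once the whole table result set has been consumed.
    void resultSetConsumed() {
        nextQueryAtMs = clock.getAsLong() + pollSleepMs;
    }

    // Milliseconds left before the next query may run; 0 means it may run now.
    long remainingSleepMs() {
        return Math.max(0L, nextQueryAtMs - clock.getAsLong());
    }

    boolean mayQueryNow() {
        return remainingSleepMs() == 0L;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock so the demo does not actually sleep
        BulkPollScheduler scheduler = new BulkPollScheduler(60_000L, () -> now[0]);
        scheduler.resultSetConsumed();
        System.out.println("may query right after consuming: " + scheduler.mayQueryNow());
        now[0] = 60_000L;
        System.out.println("may query after the sleep window: " + scheduler.mayQueryNow());
    }
}
```

In a sketch like this, records from the current result set are handed over at full speed, and only the start of the next query is delayed, which matches the separation of concerns described above.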

Does this solution apply anywhere else?
If yes, where?

Test Strategy

Test implemented : io.confluent.connect.jdbc.source.JdbcSourceTaskUpdateTest.testBulkPeriodicLoadWithPollSleep()

Integration tests are implemented in another branch; they use the FileMaker dialect for which this fork was created:

Manual testing has been done insofar as this feature is being used in production at our company.

Testing done:

Release Plan

cla-assistant[bot] commented 1 year ago

CLA assistant check
All committers have signed the CLA.