apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0
7.79k stars 1.74k forks

[Improve][Connector-v2] Optimize the count table rows for jdbc-oracle and oracle-cdc #7248

Closed dailai closed 1 month ago

dailai commented 1 month ago

Purpose of this pull request

  1. Add an option to use select count(*) instead of analyze table to get the total row count of a table, for jdbc-oracle in the chunk split stage and for oracle-cdc in the full stage, because select count(*) is sometimes faster than analyze table. In the case shown in the screenshot attached to the original pull request, a direct select count took 5 seconds, while analyze table took 1m37s.
  2. Add a switch to skip analyze table. In some scenarios, for example when the user already runs a regular task that calls analyze table to refresh the statistics of the relevant table, or when the table is updated infrequently, we can skip analyze table and read the total row count directly from the table's statistics, which is faster.
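
The two options above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the class `OracleRowCountPlan`, the method `statements`, and the flag names `useSelectCount` and `skipAnalyze` are hypothetical, and reading `NUM_ROWS` from `ALL_TABLES` is an assumption about where the statistics are queried. The sketch only builds the Oracle statements that would run for each combination of flags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names and structure are illustrative only,
// not the connector's actual API.
final class OracleRowCountPlan {

    // Returns the SQL statements to run, in order, to obtain a table's row count.
    // Real code should validate or quote the identifiers rather than concatenate them.
    static List<String> statements(
            String owner, String table, boolean useSelectCount, boolean skipAnalyze) {
        List<String> sql = new ArrayList<>();
        String qualified = owner + "." + table;

        if (useSelectCount) {
            // Option 1: count the rows directly, no statistics involved.
            sql.add("SELECT COUNT(*) FROM " + qualified);
            return sql;
        }
        if (!skipAnalyze) {
            // Default path: refresh the table statistics first.
            sql.add("ANALYZE TABLE " + qualified + " COMPUTE STATISTICS FOR TABLE");
        }
        // Option 2 (skipAnalyze = true) goes straight to the existing statistics.
        sql.add("SELECT NUM_ROWS FROM ALL_TABLES WHERE OWNER = '" + owner
                + "' AND TABLE_NAME = '" + table + "'");
        return sql;
    }
}
```

With `skipAnalyze` set, the row count is only as fresh as the last statistics refresh, so this path suits tables that are analyzed on a schedule or rarely updated. `useSelectCount` gives an exact count at the cost of a table scan.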

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

Carl-Zhou-CN commented 1 month ago

good job

Hisoka-X commented 1 month ago

cc @hailin0