apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Feature][MySQL CDC] MySQL cdc support start by time #9144

Open davidzollo opened 1 month ago

davidzollo commented 1 month ago


Description

Allow the MySQL CDC source to start reading from a specified time.

Currently, the MySQL CDC source can start from a specific binlog position or GTID. In many real-world scenarios, however, users expect to start a synchronization job from a human-friendly timestamp instead.

Adding a start-time option (e.g. 2024-04-10 08:00:00) would greatly simplify CDC job configuration and make SeaTunnel more user-friendly to operate.
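Starting "by time" ultimately means translating a timestamp into a binlog position. One possible approach, sketched here under stated assumptions, is to pick the binlog file to start scanning from by the timestamp of each file's first event, then skip events older than the start time while reading. The sketch assumes the file list is already available, ordered oldest first, with first-event timestamps in epoch milliseconds; BinlogFile and BinlogLocator are hypothetical names, not SeaTunnel APIs.

```java
import java.util.List;

// Hypothetical description of one binlog file: its name and the timestamp (epoch ms)
// of its first event. How these values are collected is outside this sketch.
record BinlogFile(String name, long firstEventMillis) {}

final class BinlogLocator {
    private BinlogLocator() {}

    /**
     * Returns the binlog file to start scanning from so that no event at or after
     * startMillis is missed. Files must be ordered oldest first. Returns null when
     * startMillis is earlier than the oldest file's first event, i.e. the binlog
     * covering the start time may already have been purged.
     */
    static String locate(List<BinlogFile> files, long startMillis) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("no binlog files available");
        }
        if (startMillis < files.get(0).firstEventMillis()) {
            return null;
        }
        // Binary search for the last file whose first event is strictly earlier than
        // startMillis (index 0 always qualifies after the check above).
        int lo = 0;
        int hi = files.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (files.get(mid).firstEventMillis() < startMillis) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return files.get(lo).name();
    }
}
```

The strict comparison is deliberate: MySQL binlog event timestamps have one-second resolution, so events sharing the boundary second can sit at the end of the previous file, and starting one file earlier avoids skipping them.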


source {
  MySQL-CDC {
    hostname = "xxx"
    port = 3306
    ...
    start-time = "2024-04-10 08:00:00"  # Suggested new feature
  }
}
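A start-time value like the one in the config above has to be parsed into an absolute instant before it can be compared with binlog timestamps. The sketch below assumes the format of the example value (yyyy-MM-dd HH:mm:ss) and an explicitly supplied time zone, since the proposal does not say which zone the wall-clock time is in; StartTimeParser is a hypothetical helper. Parse failures become IllegalArgumentException, as the proposal specifies.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;

final class StartTimeParser {
    // Assumed format, taken from the example value "2024-04-10 08:00:00".
    // STRICT resolution rejects impossible dates such as 2024-02-30 instead of adjusting them.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss").withResolverStyle(ResolverStyle.STRICT);

    private StartTimeParser() {}

    /** Interprets startTime as wall-clock time in zoneId and returns epoch milliseconds. */
    static long toEpochMillis(String startTime, String zoneId) {
        if (startTime == null) {
            throw new IllegalArgumentException("start-time must not be null");
        }
        try {
            LocalDateTime local = LocalDateTime.parse(startTime.trim(), FORMAT);
            return local.atZone(ZoneId.of(zoneId)).toInstant().toEpochMilli();
        } catch (DateTimeException e) {
            // Covers both unparseable text and an unknown zone id.
            throw new IllegalArgumentException(
                    "Invalid start-time '" + startTime + "' or time zone '" + zoneId
                            + "', expected format yyyy-MM-dd HH:mm:ss", e);
        }
    }
}
```

For example, "2024-04-10 08:00:00" maps to 1712736000000 in UTC but to 1712707200000 in Asia/Shanghai, which is why the zone has to be pinned down somewhere.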

Error handling:

| Case | Behavior |
| --- | --- |
| start-time too old (binlog already purged) | Fail fast with a clear error: "Start time is earlier than the earliest available binlog. Earliest = 2024-04-08 11:00:00" |
| start-time too new (later than the current time) | Allowed; CDC waits until a matching binlog event is produced |
| start-time cannot be parsed | Job fails with IllegalArgumentException |
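The first two rows of the table reduce to comparing the parsed start time with the earliest available binlog timestamp and with the current time. A minimal sketch, assuming all three values are epoch milliseconds and that the earliest binlog timestamp is already known; StartTimeValidator, its Decision enum, and the choice of IllegalStateException are illustrative assumptions, not the connector's actual design.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class StartTimeValidator {
    /** What the source should do once the start time has been checked. */
    enum Decision { START_NOW, WAIT_FOR_BINLOG }

    private static final DateTimeFormatter DISPLAY = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    private StartTimeValidator() {}

    static Decision validate(long startMillis, long earliestBinlogMillis, long nowMillis, String zoneId) {
        if (startMillis < earliestBinlogMillis) {
            // Too old: the binlog covering startMillis may have been purged, so fail fast
            // and report the earliest usable time in the user's zone.
            String earliest = DISPLAY.format(Instant.ofEpochMilli(earliestBinlogMillis).atZone(ZoneId.of(zoneId)));
            throw new IllegalStateException(
                    "Start time is earlier than the earliest available binlog. Earliest = " + earliest);
        }
        // Too new: allowed; the source waits for events at or after startMillis.
        return startMillis > nowMillis ? Decision.WAIT_FOR_BINLOG : Decision.START_NOW;
    }
}
```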

User Scenario:

In real-world CDC scenarios, users often face recovery requirements like:

“I want to resume this CDC pipeline from 2024-04-10 00:00:00”

“I only want to capture changes made after 08:00 yesterday”

“I don't know the binlog filename, but I know the timestamp”

Related issues

No response


FrommyMind commented 1 month ago

What is the difference between this and the existing CDC base option startup.timestamp?
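One concrete difference worth weighing, assuming startup.timestamp holds an epoch timestamp in milliseconds: an epoch value identifies a single instant, while a wall-clock start-time string only does so once a time zone is fixed. The hypothetical helper below renders the same epoch value in two zones to show the gap a start-time option would need to close.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class StartupTimestampFormatter {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    private StartupTimestampFormatter() {}

    /** Renders an epoch-millisecond timestamp as a wall-clock string in the given zone. */
    static String toStartTime(long epochMillis, String zoneId) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId)));
    }
}
```

Here 1712736000000 reads as "2024-04-10 08:00:00" in UTC but "2024-04-10 16:00:00" in Asia/Shanghai.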

ocean-zhc commented 1 month ago

If no one claims it, please assign it to me.

ocean-zhc commented 1 week ago

9285