apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Feature][Connectors] Abnormal Data Logging #8005

Open Ivan-gfan opened 1 week ago

Ivan-gfan commented 1 week ago

Search before asking

Description


Currently, there are no metrics for tracking abnormal data records, and there is no option to ignore exceptions and continue execution. With JDBC or any other data source, a single error during insertion terminates the application, which is not user-friendly.

Suggested Improvements:

1. Abnormal Data Metrics:

The final metrics should include not only the read and write counts but also the count of abnormal data. The sum of abnormal data and successful write counts should equal the total read count.
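To make the proposed invariant concrete, here is a minimal sketch of how the three counters could relate. `SyncMetrics` and its method names are hypothetical and are not part of any existing SeaTunnel API; the only thing taken from the proposal is the rule that abnormal plus written should equal read.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: SyncMetrics is NOT an existing SeaTunnel class.
final class SyncMetrics {
    private final LongAdder readCount = new LongAdder();
    private final LongAdder writeCount = new LongAdder();
    private final LongAdder abnormalCount = new LongAdder();

    void recordRead() { readCount.increment(); }
    void recordWritten() { writeCount.increment(); }
    void recordAbnormal() { abnormalCount.increment(); }

    long readCount() { return readCount.sum(); }
    long writeCount() { return writeCount.sum(); }
    long abnormalCount() { return abnormalCount.sum(); }

    // True when every record that was read is accounted for as either
    // successfully written or abnormal (the invariant proposed above).
    boolean isBalanced() {
        return writeCount.sum() + abnormalCount.sum() == readCount.sum();
    }
}
```

A job could check `isBalanced()` once at the end of a run; a `false` result would suggest that some records were dropped without being counted.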

2. Detailed Abnormal Record Entity:

Introduce a domain entity to record detailed information about abnormal records. This entity should include:

3. Batch Submission Handling:

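Some connectors may use batch submission to improve performance, relying on the transaction management of the target data source (e.g., the batch_size parameter in the JDBC connector). When a batch fails, the error may only be attributable to the batch as a whole rather than to a single record, so users must balance batch throughput against how precisely abnormal records can be identified.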
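One possible way to handle this trade-off, shown here only as a sketch, is to try the whole batch first and fall back to row-by-row writes when it fails. This assumes the target rolls back a failed batch completely, so retrying every row does not create duplicates. `RowSink`, `BatchResult`, and `TolerantBatchWriter` are hypothetical names and do not exist in SeaTunnel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in SeaTunnel.
interface RowSink<T> {
    // Assumed to be all-or-nothing: a failure leaves no rows written.
    void writeBatch(List<T> rows) throws Exception;

    void writeOne(T row) throws Exception;
}

final class BatchResult<T> {
    final int written;      // rows that reached the target
    final List<T> abnormal; // rows that failed individually

    BatchResult(int written, List<T> abnormal) {
        this.written = written;
        this.abnormal = abnormal;
    }
}

final class TolerantBatchWriter {
    private TolerantBatchWriter() {}

    static <T> BatchResult<T> write(RowSink<T> sink, List<T> batch) {
        try {
            // Fast path: one round trip for the whole batch.
            sink.writeBatch(batch);
            return new BatchResult<>(batch.size(), new ArrayList<>());
        } catch (Exception batchError) {
            // Slow path: retry each row to find out which ones are abnormal.
            int written = 0;
            List<T> abnormal = new ArrayList<>();
            for (T row : batch) {
                try {
                    sink.writeOne(row);
                    written++;
                } catch (Exception rowError) {
                    abnormal.add(row);
                }
            }
            return new BatchResult<>(written, abnormal);
        }
    }
}
```

With this shape, a larger batch size keeps the fast path cheap, while a failing batch costs one extra write per row in exchange for record-level error reporting.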

4. Planned Total Record Count in Metrics:

It would be beneficial to include the total planned record count in the metrics (e.g., the result of SELECT COUNT(*) FROM source).

Usage Scenario

1. Precise Error Row Counting and Detailed Error Information

2. Incremental Synchronization

3. User Display

4. Key Pain Points:

This approach would improve user experience and ensure data integrity while allowing users to handle errors post-synchronization.

Related issues

No response

Are you willing to submit a PR?

Code of Conduct

Ivan-gfan commented 1 week ago

@liugddx PTAL

liugddx commented 1 week ago


Thanks for following this issue! @Ivan-gfan LGTM! cc: @Hisoka-X @hailin0