Closed: xleoken closed this issue 2 years ago.
How is this work going? I am interested in it, and I am willing to submit a PR.
Welcome @zhaomin1423.
Dirty data management has two aspects. First, we can handle records one by one: when writing a batch that contains a few dirty records, the database must roll back, so the database must support transactions. We can then write the batch record by record to catch the dirty records. In Spark, we can add a datasource strategy that transforms WriteToDataSourceV2 into an extended WriteToDataSourceV2Exec, so that the data can be handled record by record to manage dirty records. Then, implement a JDBC connector based on the DataSourceV2 API.
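To make the idea concrete, here is a minimal, hedged sketch of the "batch write with rollback, then record-by-record fallback to isolate dirty records" technique described above. This is not the proposed Spark/JDBC connector; it just illustrates the transactional fallback pattern using Python's built-in `sqlite3`, with a hypothetical table `t` and a primary-key violation standing in for a dirty record.

```python
import sqlite3

def write_with_dirty_capture(conn, rows):
    """Try a transactional batch insert; on failure, fall back to writing
    record by record so the dirty records can be isolated and returned.
    Returns (written, dirty)."""
    try:
        # `with conn:` opens a transaction that commits on success
        # and rolls back if any statement fails.
        with conn:
            conn.executemany("INSERT INTO t(id, v) VALUES (?, ?)", rows)
        return list(rows), []
    except sqlite3.Error:
        # The batch was rolled back; retry one record at a time,
        # collecting the records that still fail as dirty data.
        written, dirty = [], []
        for row in rows:
            try:
                with conn:
                    conn.execute("INSERT INTO t(id, v) VALUES (?, ?)", row)
                written.append(row)
            except sqlite3.Error:
                dirty.append(row)  # captured dirty record
        return written, dirty

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, v TEXT)")
# (1, "dup") violates the primary key and plays the role of a dirty record.
rows = [(1, "a"), (2, "b"), (1, "dup"), (3, "c")]
written, dirty = write_with_dirty_capture(conn, rows)
```

In a Spark DataSourceV2 setting, the analogous logic would live in the write path of the extended WriteToDataSourceV2Exec, with the dirty records routed to a side channel instead of being silently dropped.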
Welcome to comment.
Description
We may encounter dirty records when transmitting data, so we may need a dirty data management mechanism to handle them. This issue is under discussion; feel free to share your opinions.