Open fhan688 opened 1 month ago
The previous PR was reverted: https://github.com/apache/hudi/pull/12136. I have reopened it, as more discussion may be needed. @danny0405
We should clarify these items:
- Should we promote the 'write.ignore.failed' option to a common write config for each engine? Previously each engine had its own options and behavior.
- Should we throw the exception in the write handles or in the driver (after the write statuses are collected)?
- Should this option default to false or true?
'write.ignore.failed' is a config in FlinkOptions. In this PR we promote it to HoodieWriteConfig in the hudi-client-common module and name it 'hoodie.write.ignore.failed'.
@fhan688 Let's file a JIRA issue for this and move the discussion there.
Sorry, I meant a GH issue, which makes it easier to communicate.
Change Logs
In the Flink engine, if an exception occurs while a task is writing data, it is ignored and reported to the StreamWriteCoordinator with the write event. The StreamWriteCoordinator then decides, according to 'write.ignore.failed', whether to commit when there are write failures.
This PR applies 'write.ignore.failed' earlier, at the point where the write failure occurs, so the exception is thrown sooner.
For example, if the checkpoint (CP) interval of the Flink job is 15 minutes, the exception is not discovered until the CP commit, which adds data latency in real-time, latency-sensitive scenarios.
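The difference in detection time can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hudi or Flink code: `ToyCoordinator`, `WriteFailure`, `run_interval`, and the `fail_fast` flag are hypothetical names invented for illustration, and a record is modeled as an `(arrival_second, ok)` pair within a single checkpoint interval.

```python
from dataclasses import dataclass, field


class WriteFailure(Exception):
    """Toy exception meaning 'abort the job' (hypothetical, not a Hudi class)."""


@dataclass
class ToyCoordinator:
    """Hypothetical stand-in for a coordinator that checks errors at commit time."""
    ignore_failed: bool
    reported_errors: list = field(default_factory=list)

    def on_write_event(self, errors):
        # Errors collected by the writer arrive together with the write event.
        self.reported_errors.extend(errors)

    def commit_at_checkpoint(self):
        # Deferred check: failures only surface here, at checkpoint commit.
        if self.reported_errors and not self.ignore_failed:
            raise WriteFailure(f"{len(self.reported_errors)} failed write(s)")
        return "committed"


def run_interval(records, coordinator, fail_fast, checkpoint_interval_s):
    """Return the second at which a failure is detected, or None if committed."""
    errors = []
    for arrival_s, ok in records:
        if not ok:
            if fail_fast and not coordinator.ignore_failed:
                # Early check: the option is applied when the write fails.
                return arrival_s
            errors.append(arrival_s)
    coordinator.on_write_event(errors)
    try:
        coordinator.commit_at_checkpoint()
    except WriteFailure:
        return checkpoint_interval_s
    return None


if __name__ == "__main__":
    records = [(10, True), (60, False), (300, True)]
    interval_s = 15 * 60  # the 15-minute checkpoint interval from the example
    print(run_interval(records, ToyCoordinator(False), False, interval_s))
    print(run_interval(records, ToyCoordinator(False), True, interval_s))
    print(run_interval(records, ToyCoordinator(True), True, interval_s))
```

In this model a write that fails 60 seconds into a 900-second interval is detected at second 900 with the deferred check and at second 60 with the early check. When the ignore option is true, both paths commit and no failure is raised.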
Impact
module: hudi-client, hudi-flink-datasource
Risk level (write none, low, medium or high below)
low
Documentation Update
None
Contributor's checklist