apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.46k stars 966 forks source link

[flink][cdc] Add support for retry cnt instead of busy wait and additional support to skip corrupt records in cdc writer #4295

Open AshishKhatkar opened 1 month ago

AshishKhatkar commented 1 month ago

Purpose

This change adds support for retry count and skipping corrupt records instead of assuming a schema change has happened and waiting indefinitely on schema change. This change will cause the Flink cdc job to fail loudly due to a corrupt message (alternatively users have an option to use config option to log and skip such messages) instead of waiting indefinitely on schema change which might not have happened. For more details check the linked issue.

Linked issue: close #4239

Tests

API and Format

Documentation

gmdfalk commented 1 month ago

Maybe we can add a few unit tests and some docs for the new options?

umeshdangat commented 2 weeks ago

@JingsongLi could you please help review this? We are currently running this patch internally for Yelp as we cannot afford to block all ingestion based on one "corrupt" message. Corrupt messages entering our stream is rare but inevitable (e.g. older schemas that are not backwards compatible) and it is better for us to log and skip them (or even fail the app) rather than block.