apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Avoid logging full rows with sensitive information on conversion failure in Flink CDC #4290

Open atallahade opened 1 month ago



Motivation

When using a Flink CDC connector, if a row fails to convert to the given schema, Paimon logs the entire row, which may contain sensitive information. This logging happens in CdcRecordUtils.java. For example:

2024-10-08 14:14:48,673 [] INFO  org.apache.paimon.flink.sink.cdc.CdcRecordUtils              [] - Failed to convert value <REDACTED_ROW> to type <REDACTED_SCHEMA>. Waiting for schema update.

Solution

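One option is to log only the primary key of the failing row instead of the full row, so the record can still be identified without exposing its other fields.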
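A minimal sketch of the primary-key-only idea, assuming the row is available as a plain field-name to value map and the primary-key column names are known. The class PrimaryKeyLogging and the helper describeByPrimaryKey are hypothetical and do not reflect Paimon's actual CdcRecord or CdcRecordUtils API; the sketch only illustrates what the resulting log message would contain.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrimaryKeyLogging {

    // Hypothetical helper: builds a log-safe description of a row by keeping
    // only the primary-key fields. Modeling the row as a field-name -> value
    // map is an assumption, not Paimon's actual record type.
    public static String describeByPrimaryKey(Map<String, String> row, List<String> primaryKeys) {
        Map<String, String> keyOnly = new LinkedHashMap<>();
        for (String key : primaryKeys) {
            // A missing key column is recorded as null rather than failing.
            keyOnly.put(key, row.get(key));
        }
        return keyOnly.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "42");
        row.put("email", "user@example.com");
        row.put("ssn", "123-45-6789");

        String safe = describeByPrimaryKey(row, List.of("id"));
        // Only the primary key reaches the log message; email and ssn are dropped.
        System.out.println("Failed to convert row with primary key " + safe
                + ". Waiting for schema update.");
    }
}
```

Keeping only the key still lets an operator find the offending record in the source system, while non-key columns (which are more likely to hold personal data) never reach the log. Tables without a primary key would need a different fallback, such as logging only the field names.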

Anything else?

No response

Are you willing to submit a PR?