[paimon-flink-cdc] Add latest_schema state to the schema evolution operator to reduce latest-schema access frequency

Purpose

When a Paimon table has a large number of fields and write concurrency is high, reduce how often the latest schema is accessed in order to improve job throughput during cold start.

Tests

Case-1: Observe whether the checkpoint time of schema evolution changes.
Conclusion: After the optimization, schema evolution generally completes within seconds, or even milliseconds.

Case-2: Check the logs to see whether a large number of schema reads still occur.
Conclusion: The number of schema reads dropped from hundreds of thousands to 115.

API and Format

org.apache.paimon.flink.sink.cdc.UpdatedDataFieldsProcessFunction#processElement

Documentation

Before the schema evolution operator calls org.apache.paimon.flink.sink.cdc.UpdatedDataFieldsProcessFunctionBase#extractSchemaChanges, add a check that confirms whether a field update actually needs to be triggered.

Add a List variable that is used to decide whether an incoming column is really an update: List latestSchemaList
Add a ListState so that, when the task is restored from state, the list is restored directly from it: ListState latestSchemaListState
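
The check described above can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in this PR: the class name LatestSchemaCacheSketch, the "name:type" string encoding of a field, the Supplier used as the schema reader, and the schemaReads counter are all assumptions made for the example. Only the latestSchemaList name comes from the description. Persisting the list through latestSchemaListState is left out, since it needs the Flink runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Simplified, hypothetical sketch; not the actual Paimon operator code.
class LatestSchemaCacheSketch {

    // Cached copy of the latest schema, mirroring the latestSchemaList variable.
    // Each field is encoded as "name:type" (an assumption for this example).
    // In the real operator this list would also be stored in
    // latestSchemaListState so that it survives a restore.
    private List<String> latestSchemaList = new ArrayList<>();

    // Hypothetical stand-in for the expensive latest-schema lookup.
    private final Supplier<List<String>> latestSchemaReader;

    // Counts how many times the expensive lookup actually ran.
    private int schemaReads = 0;

    LatestSchemaCacheSketch(Supplier<List<String>> latestSchemaReader) {
        this.latestSchemaReader = latestSchemaReader;
    }

    // Returns only the fields that still require a schema change.
    List<String> fieldsNeedingUpdate(List<String> updatedFields) {
        List<String> unknown = notInCache(updatedFields);
        if (unknown.isEmpty()) {
            // Every field is already in the cached schema: skip the lookup.
            return unknown;
        }
        // Cache miss: refresh once, then re-check against the fresh schema.
        latestSchemaList = new ArrayList<>(latestSchemaReader.get());
        schemaReads++;
        return notInCache(updatedFields);
    }

    // Collects the fields that are absent from the cached schema.
    private List<String> notInCache(List<String> fields) {
        List<String> result = new ArrayList<>();
        for (String field : fields) {
            if (!latestSchemaList.contains(field)) {
                result.add(field);
            }
        }
        return result;
    }

    int schemaReads() {
        return schemaReads;
    }

    public static void main(String[] args) {
        LatestSchemaCacheSketch cache =
                new LatestSchemaCacheSketch(() -> Arrays.asList("id:INT", "name:STRING"));
        cache.fieldsNeedingUpdate(Arrays.asList("id:INT"));
        cache.fieldsNeedingUpdate(Arrays.asList("id:INT", "name:STRING"));
        System.out.println(cache.fieldsNeedingUpdate(Arrays.asList("age:INT")));
        System.out.println("schema reads: " + cache.schemaReads());
    }
}
```

In this sketch the second call is answered entirely from the cached list, so three calls trigger only two lookups. Records whose fields are already known never reach the schema reader, which is the effect the optimization aims for.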
