[Feature] Generate changelog files by copying data files when their contents are identical

Search before asking

[X] I searched in the issues and found nothing similar.

Motivation

In some cases the changelog file and the data file of a bucket have exactly the same content. For example, this happens when a Paimon job synchronizes a full snapshot of a database into Paimon (so no two records share a primary key and no merge is performed) and the job uses input as the changelog producer. In such cases, instead of writing the same content twice, we can generate the changelog file by copying the data file, or vice versa. This optimization can reduce the IO overhead of Paimon sinks and improve the throughput of the affected jobs.
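
A rough sketch of the idea, not a proposed implementation: it uses plain java.nio on the local file system rather than Paimon's own file IO layer, and the class name, method name, and file names are all hypothetical. It only illustrates "write once, then copy" as opposed to serializing the same records twice. Whether a copy is actually cheaper than a second write will depend on the underlying file system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ChangelogCopySketch {

    /**
     * Hypothetical helper: derives the changelog file from a data file that has
     * already been written, instead of writing the same records a second time.
     * The caller is assumed to know that both files would have identical content.
     */
    static Path copyAsChangelog(Path dataFile, Path changelogFile) throws IOException {
        // Byte-for-byte copy; no records are re-serialized or re-compressed.
        return Files.copy(dataFile, changelogFile, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("changelog-copy-sketch");
        Path dataFile = dir.resolve("data-file");           // hypothetical file name
        Path changelogFile = dir.resolve("changelog-file"); // hypothetical file name

        // Stand-in for the data file produced by the writer.
        Files.write(dataFile, "k1,v1\nk2,v2\n".getBytes(StandardCharsets.UTF_8));

        copyAsChangelog(dataFile, changelogFile);

        System.out.println("data file size:      " + Files.size(dataFile));
        System.out.println("changelog file size: " + Files.size(changelogFile));
    }
}
```

In a real implementation the copy would presumably only be taken when the writer can guarantee that no merge happened for the bucket; otherwise the existing path of writing both files separately would still be needed.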

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

[X] I'm willing to submit a PR!

apache / paimon