Search before asking
[X] I searched in the issues and found nothing similar.
Motivation
There are certain cases in which the changelog files and the data files of a bucket share the same content. For example, this happens when a Paimon job synchronizes a full snapshot of a database into Paimon (so no two records have the same primary key and no merge is performed) and the job uses input as the changelog producer. In such cases, instead of writing the same content twice, we can generate the changelog files by copying the data files, or vice versa. This optimization can reduce the IO overhead of Paimon sinks and improve the throughput of the affected jobs.
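To make the idea concrete, here is a rough sketch of "write once, then copy". It is not Paimon code: the function name `write_data_and_changelog`, the tab-separated file layout, and the file names are all hypothetical and only illustrate the condition (unique primary keys, so no merge) and the copy step.

```python
import shutil
import tempfile
from pathlib import Path


def write_data_and_changelog(records, data_path, changelog_path):
    """Write records to the data file once, then copy it to the changelog file.

    Hypothetical illustration only. `records` is a list of (key, value) pairs.
    """
    data_path = Path(data_path)
    changelog_path = Path(changelog_path)

    # The shortcut is only valid when no merge would happen,
    # i.e. every primary key appears exactly once.
    keys = [key for key, _ in records]
    if len(set(keys)) != len(keys):
        raise ValueError("duplicate primary keys: data and changelog would differ")

    # Serialize the records a single time.
    with data_path.open("w", encoding="utf-8") as f:
        for key, value in records:
            f.write(f"{key}\t{value}\n")

    # Produce the changelog file by copying bytes instead of re-encoding records.
    shutil.copyfile(data_path, changelog_path)
    return data_path, changelog_path


if __name__ == "__main__":
    bucket_dir = Path(tempfile.mkdtemp())
    data, changelog = write_data_and_changelog(
        [(1, "alice"), (2, "bob")],
        bucket_dir / "data-0",
        bucket_dir / "changelog-0",
    )
    print(data.read_bytes() == changelog.read_bytes())
```

The saving in this sketch comes from skipping the second round of record serialization. Whether a plain file copy or some cheaper mechanism is the right choice inside Paimon is left open here.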
Solution
No response
Anything else?
No response
Are you willing to submit a PR?