Purpose
At present, if the number of files is very large, the commit interval is relatively small, and multiple jobs write simultaneously, there is serious contention on commit, and commits can even fail after exhausting retries (more than ten attempts).
This is because data file conflict checking may be triggered, and it takes a relatively long time to read the data files from the old snapshot. If other jobs commit during this window, the retried commit may fail again because the conflict check has to be repeated from scratch each time.
This is very wasteful. We can actually reuse the base files from the previous check: instead of re-reading everything, we only need to read the incremental files and merge them into the base files that were read last time.
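As a rough illustration of the intended reuse (the class and method names below, such as `IncrementalConflictChecker`, `readAllFiles`, and `readIncrementalFiles`, are hypothetical and not existing project APIs), the checker could cache the data file set built for a base snapshot and, on a retry against a newer snapshot, only read and merge the delta:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: cache the data file entries read for a base snapshot
// and, when the commit is retried against a newer snapshot, only read the
// incremental entries and merge them into the cached base set.
public class IncrementalConflictChecker {

    private long baseSnapshotId = -1;                 // snapshot the cache was built from
    private Set<String> baseFiles = new HashSet<>();  // data files known at that snapshot

    /** Returns the full set of data files at latestSnapshotId, reusing the cached base. */
    public Set<String> dataFilesForCheck(long latestSnapshotId) {
        if (baseSnapshotId < 0) {
            // First check: full read of the snapshot (the expensive path today).
            baseFiles = new HashSet<>(readAllFiles(latestSnapshotId));
        } else if (latestSnapshotId > baseSnapshotId) {
            // Retry after another job committed: only read the delta and merge it.
            baseFiles.addAll(readIncrementalFiles(baseSnapshotId, latestSnapshotId));
        }
        baseSnapshotId = latestSnapshotId;
        return baseFiles;
    }

    // Placeholders for the real snapshot/manifest readers.
    private List<String> readAllFiles(long snapshotId) {
        return new ArrayList<>();
    }

    private List<String> readIncrementalFiles(long fromExclusive, long toInclusive) {
        return new ArrayList<>();
    }
}
```

With this shape, the expensive full scan happens at most once per committer; each subsequent retry only pays for the manifests added between the cached snapshot and the latest one.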
Tests
API and Format
Documentation