[X] I have searched in the issues and found no similar issues.
Describe the feature
A new Spark SQL command to merge small files
compact table table_name [ INTO ${targetFileSize} ${targetFileSizeUnit} ] [ cleanup | retain | list ]
-- targetFileSizeUnit can be 'b', 'k', 'm', 'g', 't', 'p'
-- cleanup: clean up the compact staging folders, which contain the original small files (default behavior)
-- retain: keep the compact staging folders for testing, so the table can be recovered from the staging data
-- list: only report the merging result, without actually running the compaction
recover compact table table_name
-- recover a table when a compact table command fails
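Under the proposed grammar, usage might look like the following (the table name and target size are illustrative, not from the original request):

```sql
-- merge small files of t1 into files of roughly 256 MB,
-- keeping the staging folders so the operation can be undone
compact table t1 INTO 256 m retain;

-- preview the merge plan without running it
compact table t1 INTO 256 m list;

-- roll back if a previous compact table command failed
recover compact table t1;
```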
Motivation
There are many cases in which a SQL query generates small files, and we must merge them into bigger ones.
Describe the solution
This command doesn't read and rewrite all the records of a table; it merges files at the binary level. Take a CSV table for example: it only appends the byte array of one file to another, without reading or writing individual records.
Additional context
referring to a blog
Are you willing to submit PR?