Open cpwright opened 1 month ago
As noted by @cpwright, we're still defining this ticket and its priority.
One detail we'll need to handle is data indexes when the main file has multiple row groups. One approach might be to mirror the row group structure of the "main" file in each index file, as a hint that the row sets persisted in the index table may need to be shifted to compensate for row group shifts in the main table.
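A minimal sketch of the mirroring idea. It assumes a hypothetical index layout in which each entry records `(row_group, row_within_group)` instead of an absolute row position. Absolute positions are then recomputed from the main file's current row group sizes, so a size change in an earlier row group shifts the entries for later row groups automatically. `row_group_offsets` and `resolve_index` are illustrative names, not an existing API.

```python
from itertools import accumulate


def row_group_offsets(sizes):
    """Absolute starting row of each row group, e.g. [3, 2, 4] -> [0, 3, 5]."""
    return [0, *accumulate(sizes)][:-1]


def resolve_index(index, sizes):
    """Map each index key to absolute row positions in the main file.

    Hypothetical layout: index[key] is a list of (row_group, relative_row)
    pairs, i.e. the index file mirrors the main file's row groups.
    """
    offsets = row_group_offsets(sizes)
    resolved = {}
    for key, entries in index.items():
        rows = []
        for rg, rel in entries:
            # Reject entries that do not fit the main file's current layout.
            if not 0 <= rg < len(sizes) or not 0 <= rel < sizes[rg]:
                raise ValueError(f"entry ({rg}, {rel}) for {key!r} is out of range")
            rows.append(offsets[rg] + rel)
        resolved[key] = sorted(rows)
    return resolved


index = {"AAPL": [(0, 1), (2, 0)], "MSFT": [(1, 0), (1, 1)]}
print(resolve_index(index, [3, 2, 4]))  # {'AAPL': [1, 5], 'MSFT': [3, 4]}
print(resolve_index(index, [5, 2, 4]))  # {'AAPL': [1, 7], 'MSFT': [5, 6]}
```

The second call models row group 0 growing from 3 to 5 rows: the stored row-group-relative entries are unchanged, but every entry in a later row group moves by 2. Entries inside the rewritten row group itself would still need to be regenerated.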
I could also see #6125 imposing some writing requirements, potentially the need to tack on field_ids or add key-value (KV) metadata, among other things (I don't know what support we may or may not already have for those kinds of requirements).
As a systems integrator, I want to be able to have increased control over writing Parquet files so that I can implement a process for transforming data overnight.
This ticket needs more definition before we work on it, but I would like to be able either to pass a whole row group of data to the write function at once, or alternatively to pass one column of a row group at a time, so that I can ensure read locality for my input data.