ggershinsky closed this 1 month ago
cc @mapleFU @pitrou
Just curious:
- If multiple files are merged (or combined in some similar way), would the merged file keep the same id, or should it be rewritten?
Each encrypted Parquet file has a unique file id, which is used to authenticate every module of the file (so that modules cannot be swapped, etc.). Each file also typically has its own encryption key. Therefore, a merged file needs a new id, new row group ordinals and a new key, and every module must be re-encrypted with the new key and AAD.
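To show why copying modules unchanged into a merged file does not work, here is a minimal sketch of how a module AAD is assembled, following the layout in the Parquet modular encryption spec as we read it: file id, a 1-byte module type, then 2-byte little-endian ordinals (page ordinal only for data pages and data page headers). `module_aad`, the 8-byte sample ids and the simplified AAD prefix handling are illustrative assumptions, not an existing library API.

```python
import struct

# Module type codes, as listed in the Parquet modular encryption spec.
FOOTER = 0
COLUMN_META_DATA = 1
DATA_PAGE = 2
DICTIONARY_PAGE = 3
DATA_PAGE_HEADER = 4
DICTIONARY_PAGE_HEADER = 5


def module_aad(file_id, module_type, row_group=0, column=0, page=0, aad_prefix=b""):
    """Return AAD = aad_prefix || file_id || module_type || ordinals (simplified sketch)."""
    aad = aad_prefix + file_id + bytes([module_type])
    if module_type == FOOTER:
        return aad  # the footer AAD carries no ordinals
    # Row group and column ordinals: 2-byte little-endian shorts.
    # struct.error is raised if an ordinal does not fit in a signed 16-bit value.
    aad += struct.pack("<hh", row_group, column)
    if module_type in (DATA_PAGE, DATA_PAGE_HEADER):
        aad += struct.pack("<h", page)  # only page-level modules add a page ordinal
    return aad


FILE_A_ID = b"\x0a" * 8  # hypothetical file id of one input file
MERGED_ID = b"\x0b" * 8  # hypothetical fresh id for the merged file

# The same data page, before and after a merge that moved its row group to position 3.
before = module_aad(FILE_A_ID, DATA_PAGE, row_group=0, column=2, page=5)
after = module_aad(MERGED_ID, DATA_PAGE, row_group=3, column=2, page=5)
```

Since AES-GCM authenticates the AAD, a page encrypted with `before` as its AAD fails authentication when decrypted with `after`, which is why each module has to be re-encrypted rather than copied.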
- Is this only required when an AAD suffix is used?
The row group ordinal is part of the AAD suffix of most modules.
Encrypted files use three types of ordinals: row group, column and page. All three are simple local counters in both writers and readers. In addition, the row group ordinal is stored in the Parquet footer (the RowGroup.ordinal field). Parquet implementors would benefit from a clarification of the reason for, and intended use of, this field.
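A minimal sketch of the counters described above. `RowGroupMeta`, `write_footer` and `reader_ordinals` are hypothetical stand-ins, not parquet-mr or Arrow APIs. The reader deliberately takes the row group ordinal from the stored footer value when it processes only a subset of row groups; that is one plausible use of RowGroup.ordinal, assumed here, and exactly the kind of point the requested clarification would settle.

```python
from dataclasses import dataclass

MAX_ORDINAL = 32767  # ordinals are encoded as signed 16-bit shorts in the AAD


@dataclass
class RowGroupMeta:
    ordinal: int            # stands in for the footer's RowGroup.ordinal field
    pages_per_column: list  # number of data pages in each column chunk


def write_footer(pages_per_row_group):
    """Writer side: the row group ordinal is a local counter that is also stored in the footer."""
    footer = []
    for rg_ordinal, pages in enumerate(pages_per_row_group):
        if rg_ordinal > MAX_ORDINAL:
            raise ValueError("row group ordinal does not fit in 16 bits")
        footer.append(RowGroupMeta(ordinal=rg_ordinal, pages_per_column=list(pages)))
    return footer


def reader_ordinals(selected_row_groups):
    """Reader side: column and page ordinals are recounted locally, while the row group
    ordinal comes from the footer rather than from the position in the selection."""
    ordinals = []
    for rg in selected_row_groups:
        for col_ordinal, n_pages in enumerate(rg.pages_per_column):
            for page_ordinal in range(n_pages):
                ordinals.append((rg.ordinal, col_ordinal, page_ordinal))
    return ordinals


footer = write_footer([[2, 1], [1, 1], [3, 1]])
# Hypothetical reader that skips row group 1 (e.g. after filtering on statistics).
subset = [footer[0], footer[2]]
ordinals = reader_ordinals(subset)
```

Here the last selected row group keeps ordinal 2 even though it is the second entry in `subset`; a reader that counted only the row groups it actually read would compute ordinal 1 and derive the wrong AAD.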