Closed wgtmac closed 1 year ago
cc @shangxinli @gszadovszky @ggershinsky @pitrou @emkornfield
cc @wjones127
I think we should check that no implementation adds padding. At least the C++, Rust, and parquet-mr implementations do not seem to add padding at the end of the data.
Seems OK to me.
Propose to explicitly state that no padding is allowed within a data page. This makes it easier for a BYTE_STREAM_SPLIT decoder to decode a page with nulls: it can simply get the number of encoded values as `total_length_encoded_stream / K` (K = 4 for float and 8 for double). Otherwise, it has to decode the def/rep levels to get the exact number of non-null values.
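
To make the argument concrete, here is a minimal Python sketch of how a decoder could infer the value count from the stream length alone. It is not taken from any of the implementations mentioned in this thread; the function names (`num_encoded_values`, `encode_byte_stream_split`, `decode_byte_stream_split`) are hypothetical, and it assumes the encoded buffer contains only the K byte streams with no padding.

```python
import struct


def num_encoded_values(encoded: bytes, width: int) -> int:
    """Infer the number of non-null values from the stream length.

    Assumes no padding: the buffer holds exactly `width` byte streams.
    `width` is K from the proposal (4 for float, 8 for double).
    """
    if len(encoded) % width != 0:
        raise ValueError("encoded length is not a multiple of the value width")
    return len(encoded) // width


def encode_byte_stream_split(plain: bytes, width: int) -> bytes:
    """Stream k holds the k-th byte of every value; streams are concatenated."""
    return b"".join(plain[k::width] for k in range(width))


def decode_byte_stream_split(encoded: bytes, width: int) -> bytes:
    """Reassemble the values without looking at def/rep levels."""
    n = num_encoded_values(encoded, width)
    out = bytearray(n * width)
    for k in range(width):
        # Scatter stream k back into byte position k of each value.
        out[k::width] = encoded[k * n:(k + 1) * n]
    return bytes(out)


# Three non-null floats, little-endian, round-tripped through the sketch.
plain = struct.pack("<3f", 1.5, -2.0, 3.25)
encoded = encode_byte_stream_split(plain, 4)
assert num_encoded_values(encoded, 4) == 3
assert decode_byte_stream_split(encoded, 4) == plain
```

If a writer appended padding, the division would either fail the multiple-of-K check or silently yield the wrong count, which is why the decoder would otherwise have to fall back to the def/rep levels.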