In certain circumstances, the CLI fails to read old (perhaps ancient) Parquet files whose column metadata has an incorrect compressed_size field, one that does not include the dictionary page (at least according to the comment in the code). The code that is supposed to handle this case reads the extra bytes into a byte buffer but never flips it, so the buffer is not positioned for reading afterwards. This appears to have been broken for a few years now.
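For anyone unfamiliar with why a missing flip matters, here is a minimal sketch using only java.nio.ByteBuffer. It is not the Parquet reader code: the class FlipSketch and the helper fill are hypothetical names made up for illustration, and the four-byte array only stands in for the "extra" dictionary-page bytes. It shows that a buffer that has just been written to has nothing left to read until flip() is called.

```java
import java.nio.ByteBuffer;

public class FlipSketch {
    // Hypothetical helper (not from the Parquet code base): copies `extra`
    // into a fresh buffer, optionally flipping it before handing it back.
    static ByteBuffer fill(byte[] extra, boolean flip) {
        ByteBuffer buf = ByteBuffer.allocate(extra.length);
        buf.put(extra); // position now sits at the end of the written bytes
        if (flip) {
            buf.flip(); // limit = old position, position = 0: ready to read
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] extra = {1, 2, 3, 4}; // stand-in for the missing bytes

        // Without flip: position == limit, so a reader sees no data.
        System.out.println(fill(extra, false).remaining()); // prints 0

        // With flip: all four bytes are readable from the start.
        System.out.println(fill(extra, true).remaining()); // prints 4
    }
}
```

In the unflipped case, any consumer that reads from the buffer's current position finds it empty, which would be consistent with the read failure described above.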
I have opened a PR that includes a defective Parquet file exhibiting this issue and a unit test that fails without the additional flip, and I have validated that the code works with the fix applied.
This is a minor issue that I came across while learning the code rather than while addressing a production problem, so there's no urgency.