Fixed writing column statistics when there is only one row that is null
Changed
Removed
Deprecated
Security
Description
This issue was reported on our Discord channel by user guiguiolol.
Code to reproduce this issue:
```php
<?php

use Flow\Parquet\ParquetFile\Schema;
use Flow\Parquet\ParquetFile\Schema\FlatColumn;
use Flow\Parquet\Writer;

require_once __DIR__ . '/../../../vendor/autoload.php';

$writer = new Writer();

// A schema with a single INT32 column
$schema = Schema::with(
    FlatColumn::int32('id'),
);

// Start from a clean file on every run
if (file_exists(__DIR__ . '/test.parquet')) {
    unlink(__DIR__ . '/test.parquet');
}

// Write exactly one row whose only value is null
$writer->write(
    __DIR__ . '/test.parquet',
    $schema,
    [
        [
            'id' => null,
        ],
    ]
);
```
Parquet files created this way can't be read by other software (confirmed with DuckDB).
When the min/max buffers are empty, we can safely set them to null, which is what PyArrow does.
The Python code below reproduces the same scenario:
```python
import pyarrow as pa
import pyarrow.parquet as pq
import os

# Define the directory and file path
dir_path = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(dir_path, 'test.parquet')

# Define the schema with a nullable int32 'id' column
schema = pa.schema([
    pa.field('id', pa.int32(), nullable=True)
])

# Prepare the data with 'id' set to None (null)
data = {
    'id': [None]
}

# Create a PyArrow Table with the data and schema
table = pa.Table.from_pydict(data, schema=schema)

# Remove the Parquet file if it already exists
if os.path.exists(file_path):
    os.remove(file_path)

# Write the table to a Parquet file
pq.write_table(table, file_path)
```
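As a rough illustration of the rule described above (when a column chunk has no non-null values there is nothing to compare, so min/max are left null instead of being written as empty buffers), here is a minimal Python sketch. `int32_column_statistics` and `ColumnStats` are hypothetical names, not part of Flow PHP or PyArrow; the sketch assumes Parquet's convention that INT32 statistics are PLAIN-encoded as 4-byte little-endian values.

```python
import struct
from typing import Optional, Sequence, TypedDict


class ColumnStats(TypedDict):
    # Hypothetical container mirroring the relevant Parquet statistics fields
    null_count: int
    min_value: Optional[bytes]
    max_value: Optional[bytes]


def int32_column_statistics(values: Sequence[Optional[int]]) -> ColumnStats:
    """Sketch: compute statistics for one INT32 column chunk."""
    non_null = [v for v in values if v is not None]
    null_count = len(values) - len(non_null)

    if not non_null:
        # Only nulls: leave min/max unset (None) instead of empty byte buffers
        return {"null_count": null_count, "min_value": None, "max_value": None}

    # INT32 min/max are stored PLAIN-encoded: 4-byte little-endian
    return {
        "null_count": null_count,
        "min_value": struct.pack("<i", min(non_null)),
        "max_value": struct.pack("<i", max(non_null)),
    }
```

With the reproduction data (`[None]`), this yields a `null_count` of 1 and no min/max values, rather than two empty buffers that readers fail to decode.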