Open rcaudy opened 3 years ago
Likely the first step is some kind of "flattening", but this is contrary to the intent of the Dremel design, so maybe we can think of a better solution.
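To make the trade-off concrete, here is a minimal Python sketch of what one-level "flattening" could mean for a list-of-lists column. The helper name `flatten_one_level` is hypothetical and not part of Deephaven; it only illustrates that the flattened result fits a single-level list column while discarding the inner list boundaries.

```python
from itertools import chain


def flatten_one_level(rows):
    """Collapse each row's list of lists into one flat list (hypothetical helper)."""
    return [list(chain.from_iterable(row)) for row in rows]


# Two rows of a list-of-lists column (maximum repetition level 2).
nested = [[[1, 2], [3]], [[4], [5, 6]]]
flat = flatten_one_level(nested)

# A differently nested input collapses to the same flat rows, so the
# original inner boundaries cannot be recovered after flattening.
other = flatten_one_level([[[1], [2, 3]], [[4, 5], [6]]])
```

Since both inputs flatten to the same rows, this kind of flattening is lossy, which is the sense in which it runs against the Dremel design.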
I'll be improving our error messages with a PR shortly. New messages: For:
t = io.deephaven.db.tables.utils.ParquetTools.readTable("/data/parquetFiles/nonnullable_nested_v1_IMPALA_NULLS_NONE.parquet")
We'll see:
java.lang.UnsupportedOperationException: Unsupported maximum repetition level 2 in column int_array_array/list/element/list/element
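For context, a rough Python sketch of what a maximum repetition level of 2 implies for a column like this one: each value carries a repetition level saying whether it starts a new row (0), a new inner list (1), or continues the current inner list (2). The function `assemble_nested` is hypothetical, and it assumes no nulls and no empty lists so that definition levels can be ignored.

```python
def assemble_nested(values, rep_levels):
    """Rebuild list-of-lists rows from values and repetition levels.

    Assumption: every list is non-empty and nothing is null, so
    definition levels are not needed (a simplification for illustration).
    """
    rows = []
    for value, rep in zip(values, rep_levels):
        if rep == 0:
            rows.append([[value]])        # new row, new outer and inner list
        elif rep == 1:
            rows[-1].append([value])      # same row, new inner list
        elif rep == 2:
            rows[-1][-1].append(value)    # same inner list
        else:
            raise ValueError(f"unexpected repetition level {rep}")
    return rows


rows = assemble_nested([1, 2, 3, 4], [0, 2, 1, 0])
```

Here `rows` holds two rows, `[[1, 2], [3]]` and `[[4]]`; a column with maximum repetition level 1 would only ever need the first two branches.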
For:
t = io.deephaven.db.tables.utils.ParquetTools.readTable("/data/parquetFiles/repeated_nested_RUST_NONE.parquet")
We'll see:
java.lang.UnsupportedOperationException: Encountered unsupported multi-column field phoneNumbers: found columns phoneNumbers/phone/number and phoneNumbers/phone/kind
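As an illustration of the second case, the sketch below groups leaf column paths by their top-level field and reports the fields that expand to more than one leaf column. This is a plain-Python approximation of the condition behind the message, not the actual schema-reader code; the function names and the flat `id` column are made up for the example.

```python
from collections import defaultdict


def group_by_top_level_field(leaf_paths):
    """Map each top-level field name to the leaf column paths under it."""
    groups = defaultdict(list)
    for path in leaf_paths:
        groups[path.split("/", 1)[0]].append(path)
    return dict(groups)


def multi_column_fields(leaf_paths):
    """Return only the fields that expand to more than one leaf column."""
    grouped = group_by_top_level_field(leaf_paths)
    return {field: paths for field, paths in grouped.items() if len(paths) > 1}


leaf_paths = [
    "id",  # hypothetical flat column, added for contrast
    "phoneNumbers/phone/number",
    "phoneNumbers/phone/kind",
]
problem_fields = multi_column_fields(leaf_paths)
```

With these paths, only `phoneNumbers` is reported, together with its two leaf columns.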
It might be nice to be able to specify which columns you care about for your Table; the user could then choose not to include the nested columns.
There's a mechanism right now to provide column instructions:
from deephaven.parquet import read, ColumnInstruction

t = read(
    path="/snappy.parquet",
    col_instructions=[
        ColumnInstruction(column_name="date", parquet_column_name="date")
    ],
)
but this currently throws the error:
java.lang.UnsupportedOperationException: Encountered unsupported multi-column field outputs: found columns outputs/list/element/address and outputs/list/element/index
at io.deephaven.parquet.table.ParquetSchemaReader.lambda$readParquetSchema$1(ParquetSchemaReader.java:174)
at java.base/java.util.HashMap.compute(HashMap.java:1316)
at io.deephaven.parquet.table.ParquetSchemaReader.readParquetSchema(ParquetSchemaReader.java:169)
at io.deephaven.parquet.table.ParquetTools.convertSchema(ParquetTools.java:647)
at io.deephaven.parquet.table.ParquetTools.readTableInternal(ParquetTools.java:384)
at io.deephaven.parquet.table.ParquetTools.readTable(ParquetTools.java:94)
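A sketch of the behavior being asked for, in plain Python: validate only the fields the user selected, so that an unsupported nested field elsewhere in the file does not fail the read. This is not how `ParquetSchemaReader` works today; `resolve_selected_columns` and the way columns are selected are assumptions made for illustration.

```python
def resolve_selected_columns(leaf_paths, selected):
    """Return the leaf path for each selected field (hypothetical helper).

    Only selected fields are checked, so unsupported fields that were
    not requested are silently skipped.
    """
    by_field = {}
    for path in leaf_paths:
        by_field.setdefault(path.split("/", 1)[0], []).append(path)

    resolved = []
    for name in selected:
        paths = by_field.get(name)
        if paths is None:
            raise KeyError(f"no such field: {name}")
        if len(paths) > 1:
            raise NotImplementedError(
                f"unsupported multi-column field {name}: found columns "
                + " and ".join(paths)
            )
        resolved.append(paths[0])
    return resolved


leaf_paths = [
    "date",
    "outputs/list/element/address",
    "outputs/list/element/index",
]
selected = resolve_selected_columns(leaf_paths, ["date"])
```

Under this scheme, asking for `date` alone succeeds, while asking for `outputs` would still raise.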
A user has hit this with the parquet viewer; see https://github.com/devinrsmith/deephaven-parquet-viewer/issues/9
Currently, we regard nested repetition and multi-column fields as uncommon and hard to map into a columnar data table like Deephaven's. This feature request is intended to capture views to the contrary.
Linked to #294, although intended for a later effort.