progval closed this 3 weeks ago
I plan to take a look at this later today :+1:
One thing I'm concerned about is that we may also need to provide a method that converts an ORC type to an Arrow type, the so-called "default" option (or is there one already and I missed it?). Otherwise, external users of `array_decoder_factory` need to maintain that mapping themselves.
Yes, I'm planning to add it after #93 is merged, probably as `.with_default_time_unit(TimeUnit)` or `.with_default_timestamp_type::<T: arrow::datatypes::ArrowTimestampType>()` on the reader builder.
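A minimal sketch of how the two proposed builder methods and a public "default" ORC-to-Arrow mapping could fit together. Everything here is hypothetical: `ReaderBuilder`, `OrcType`, `ArrowType` and the `TimestampKind` trait are simplified stand-ins, not the crate's or arrow's real API. `TimestampKind` plays the role of `arrow::datatypes::ArrowTimestampType`, carrying its unit as an associated constant much like arrow's trait does.

```rust
// Hypothetical sketch: all types below are simplified stand-ins, not the
// crate's or arrow's real API.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Stand-in for `arrow::datatypes::ArrowTimestampType`: a marker type that
/// carries its unit as an associated constant.
trait TimestampKind {
    const UNIT: TimeUnit;
}

struct TimestampMillisecond;
impl TimestampKind for TimestampMillisecond {
    const UNIT: TimeUnit = TimeUnit::Millisecond;
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum OrcType {
    Long,
    String,
    Timestamp,
    List(Box<OrcType>),
}

#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Int64,
    Utf8,
    Timestamp(TimeUnit),
    List(Box<ArrowType>),
}

struct ReaderBuilder {
    default_time_unit: TimeUnit,
}

impl ReaderBuilder {
    fn new() -> Self {
        // ORC timestamps have nanosecond precision, so keep it by default.
        Self { default_time_unit: TimeUnit::Nanosecond }
    }

    /// Runtime variant: the unit is passed as a value.
    fn with_default_time_unit(mut self, unit: TimeUnit) -> Self {
        self.default_time_unit = unit;
        self
    }

    /// Compile-time variant: the unit comes from a marker type.
    fn with_default_timestamp_type<T: TimestampKind>(self) -> Self {
        self.with_default_time_unit(T::UNIT)
    }

    /// The "default" ORC -> Arrow mapping, exposed so that external users of
    /// a decoder factory do not have to maintain it themselves.
    fn default_arrow_type(&self, orc: &OrcType) -> ArrowType {
        match orc {
            OrcType::Long => ArrowType::Int64,
            OrcType::String => ArrowType::Utf8,
            OrcType::Timestamp => ArrowType::Timestamp(self.default_time_unit),
            // Compound types recurse, so nested timestamps pick up the unit too.
            OrcType::List(item) => ArrowType::List(Box::new(self.default_arrow_type(item))),
        }
    }
}
```

In this sketch the two methods differ only in how the unit is supplied; the generic form would let callers reuse arrow's existing timestamp marker types instead of a separate enum value.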
Do you think it's possible to store the `Field` inside `Column` itself, considering it's already storing the `DataType`? That might cut down on having to pass the field everywhere 🤔
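A minimal sketch of the suggestion, assuming simplified, hypothetical `OrcType`, `ArrowType` and `Field` types rather than the crate's real ones: the `Column` keeps its Arrow `Field`, and child columns are built by pairing each ORC child with the Arrow child field at the same position. Even in this toy version `children()` has to be fallible, because nothing guarantees the Arrow field has the same shape as the ORC type.

```rust
// Hypothetical, simplified stand-ins; the real crate uses its own ORC type
// and arrow's `Field` / `DataType`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: ArrowType,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum OrcType {
    Long,
    String,
    Struct(Vec<(String, OrcType)>),
}

/// A column that carries its target Arrow `Field` next to its ORC type.
#[derive(Debug)]
struct Column {
    orc_type: OrcType,
    field: Field,
}

impl Column {
    fn new(orc_type: OrcType, field: Field) -> Self {
        Self { orc_type, field }
    }

    /// Pairs each ORC child with the Arrow child field at the same position.
    /// Returns an error when the two schemas disagree.
    fn children(&self) -> Result<Vec<Column>, String> {
        match (&self.orc_type, &self.field.data_type) {
            (OrcType::Struct(orc_children), ArrowType::Struct(fields)) => {
                if orc_children.len() != fields.len() {
                    return Err(format!(
                        "{}: ORC has {} children, Arrow has {}",
                        self.field.name,
                        orc_children.len(),
                        fields.len()
                    ));
                }
                Ok(orc_children
                    .iter()
                    .zip(fields)
                    .map(|((_, ty), field)| Column::new(ty.clone(), field.clone()))
                    .collect())
            }
            (OrcType::Struct(_), other) => Err(format!(
                "{}: ORC struct mapped to non-struct Arrow type {:?}",
                self.field.name, other
            )),
            // Primitive columns have no children.
            _ => Ok(Vec::new()),
        }
    }
}
```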
It looked like a good idea, but it actually complicates everything:
- the logic that lived in `array_decoder/mod.rs`, where it ran while destructuring the ORC type, is now spread between that module, `column.rs`, and `array_decoder/{struct,list,map,union}.rs`
- in particular, the union decoder expects `UnionFields`, which is an iterable of `(i8, Field)` unlike the `Fields` that every other decoder would use, so it can't rely on `Column::children()` to match both types
- `Column::children()` now needs to return a `Result`
- the `Field` is needed when constructing the `Column`, so we need to pass `Fields` through `Stripe::new`, and therefore also through `StripeFactory`
- … `array_decoder_factory` … to `Column::new` (… to make sure the `Column` is self-consistent)
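A hypothetical enum wrapping both child shapes illustrates the union problem: struct, list and map children are plain fields matched by position, while union children also carry an `i8` type id. A shared `children()` can only hand out the fields, and the union decoder still needs the ids on the side. `Field` and `ChildFields` are simplified stand-ins; arrow's real `Fields` holds `FieldRef`s and `UnionFields` iterates as `(i8, &FieldRef)` pairs.

```rust
// Hypothetical stand-ins, not arrow's real `Fields` / `UnionFields`.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
}

/// Children of an Arrow nested type, in the two shapes the decoders see.
#[derive(Debug, Clone, PartialEq)]
enum ChildFields {
    /// struct / list / map: plain fields, matched to ORC children by position.
    Plain(Vec<Field>),
    /// union: every field comes with a type id.
    Union(Vec<(i8, Field)>),
}

impl ChildFields {
    /// Drops the type ids so a generic `children()` could zip the fields
    /// with the ORC children.
    fn fields(&self) -> Vec<&Field> {
        match self {
            ChildFields::Plain(fields) => fields.iter().collect(),
            ChildFields::Union(pairs) => pairs.iter().map(|(_, f)| f).collect(),
        }
    }

    /// Type ids only exist for unions, so the union decoder must fetch
    /// them separately.
    fn type_ids(&self) -> Option<Vec<i8>> {
        match self {
            ChildFields::Plain(_) => None,
            ChildFields::Union(pairs) => Some(pairs.iter().map(|(id, _)| *id).collect()),
        }
    }
}
```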
This is a prerequisite for supporting the configurable timestamp precision mentioned in https://github.com/datafusion-contrib/datafusion-orc/issues/75#issuecomment-2130754391
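At the decoding level, configurable precision comes down to a unit conversion, sketched below with a hypothetical helper `to_unit`. It assumes the timestamp has already been split into whole seconds since the Unix epoch plus a non-negative nanosecond fraction. ORC itself stores seconds relative to a 2015-01-01 base plus nanoseconds, so a real decoder would rebase first. A signed 64-bit count of nanoseconds only spans roughly the years 1677 to 2262, so a coarser unit can hold dates that nanoseconds cannot.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Hypothetical helper: converts `seconds` (floor, since the Unix epoch) plus
/// `nanos` (0..1_000_000_000) into an i64 count of `unit`, truncating
/// sub-unit precision. Returns `None` if the result overflows i64.
fn to_unit(seconds: i64, nanos: u32, unit: TimeUnit) -> Option<i64> {
    let (ticks_per_second, nanos_per_tick): (i64, i64) = match unit {
        TimeUnit::Second => (1, 1_000_000_000),
        TimeUnit::Millisecond => (1_000, 1_000_000),
        TimeUnit::Microsecond => (1_000_000, 1_000),
        TimeUnit::Nanosecond => (1_000_000_000, 1),
    };
    seconds
        .checked_mul(ticks_per_second)?
        .checked_add(i64::from(nanos) / nanos_per_tick)
}
```

For example, `to_unit(1, 500_000_000, TimeUnit::Millisecond)` gives `Some(1500)`, while a date around the year 2286 (`10_000_000_000` seconds) overflows in nanoseconds but still fits in milliseconds.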