datafusion-contrib / datafusion-orc

Implementation of the Apache ORC file format using the Apache Arrow in-memory format
Apache License 2.0

Add ArrowReaderBuilder::schema() #94

Closed: progval closed this 1 week ago

progval commented 1 week ago

This allows getting the schema before building a reader, so that users can transform it and pass the result back with with_schema(), like this:

let reader_builder = reader_builder
    .with_projection(projection.clone())
    .with_batch_size(ORC_BATCH_SIZE);

let schema = transform_schema(&reader_builder.schema());

let reader = reader_builder.with_schema(schema).build();

where transform_schema could, for example, be a function that changes the TimeUnit of Timestamp datatypes.
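A possible transform_schema for the Timestamp case might look like the sketch below. To keep it self-contained, it uses simplified stand-ins for Arrow's TimeUnit, DataType, Field and Schema (hypothetical types that model only a few variants) rather than the real arrow_schema crate. The target unit (microseconds) is an assumption, and the function returns an Arc<Schema> on the assumption that with_schema() accepts Arrow's SchemaRef.

```rust
use std::sync::Arc;

// Simplified stand-ins for the arrow_schema types of the same names.
// These are NOT the real Arrow definitions (hypothetical, for illustration);
// they only model what transform_schema needs to touch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
    // Like Arrow's Timestamp: a unit plus an optional timezone.
    Timestamp(TimeUnit, Option<Arc<str>>),
    // Nested types, so the rewrite must recurse into child fields.
    List(Arc<Field>),
    Struct(Vec<Arc<Field>>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Arc<Field>>,
}

/// Replace the unit of every Timestamp (at any nesting depth) with `unit`,
/// keeping the timezone; all other types are copied unchanged.
fn convert_type(data_type: &DataType, unit: TimeUnit) -> DataType {
    match data_type {
        DataType::Timestamp(_, tz) => DataType::Timestamp(unit, tz.clone()),
        DataType::List(item) => DataType::List(Arc::new(convert_field(item, unit))),
        DataType::Struct(children) => DataType::Struct(
            children.iter().map(|f| Arc::new(convert_field(f, unit))).collect(),
        ),
        other => other.clone(),
    }
}

/// Copy a field, keeping its name and nullability but converting its type.
fn convert_field(field: &Field, unit: TimeUnit) -> Field {
    Field {
        name: field.name.clone(),
        data_type: convert_type(&field.data_type, unit),
        nullable: field.nullable,
    }
}

/// Hypothetical transform_schema: returns a new schema in which every
/// Timestamp column uses microsecond precision. The input is left untouched.
pub fn transform_schema(schema: &Schema) -> Arc<Schema> {
    let fields = schema
        .fields
        .iter()
        .map(|f| Arc::new(convert_field(f, TimeUnit::Microsecond)))
        .collect();
    Arc::new(Schema { fields })
}
```

Because the builder's schema() is called by reference and the transformed schema is a fresh value, the original schema stays intact and the builder is only consumed by with_schema(). Adapting this to the real arrow_schema types would mostly mean handling more nested variants (for example LargeList or Map).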