andygrove opened this issue 1 year ago
Shouldn't this logic live in the query engine (i.e. DataFusion) rather than in the compute engine? In particular, I would have expected it to be a TableProvider detail that generates plans with the relevant schema coercion logic.
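To make the TableProvider idea concrete, here is a minimal sketch. It assumes the table's unified schema has already been computed. For each table column, the plan reads the file's column as-is when the types match and wraps it in a cast otherwise. `DataType`, `Field`, `Expr`, `field`, and `coercion_projection` are hypothetical simplifications for illustration, not the arrow-rs or DataFusion APIs.

```rust
// Hypothetical, simplified stand-ins for Arrow's DataType and Field;
// these are NOT the arrow-rs or DataFusion types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int64,
    Float64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

// Hypothetical per-column expression used when reading one file's batches.
#[derive(Debug, PartialEq)]
enum Expr {
    // File type already matches the table type: read the column as-is.
    Column(String),
    // File type differs: read the column, then cast it to the table type.
    Cast(String, DataType),
}

fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

/// Builds one expression per table column so that every file is read with
/// the table's (already merged) schema. Errors if the file lacks a column.
fn coercion_projection(file: &[Field], table: &[Field]) -> Result<Vec<Expr>, String> {
    table
        .iter()
        .map(|t| -> Result<Expr, String> {
            let f = file
                .iter()
                .find(|f| f.name == t.name)
                .ok_or_else(|| format!("column {} missing from file", t.name))?;
            Ok(if f.data_type == t.data_type {
                Expr::Column(t.name.clone())
            } else {
                Expr::Cast(t.name.clone(), t.data_type)
            })
        })
        .collect()
}

fn main() {
    // An early monthly file stored passenger_count as Int64, while the
    // merged table schema uses Float64 ("vendor" is an illustrative column).
    let file = vec![field("passenger_count", DataType::Int64), field("vendor", DataType::Utf8)];
    let table = vec![field("passenger_count", DataType::Float64), field("vendor", DataType::Utf8)];
    let plan = coercion_projection(&file, &table).unwrap();
    // prints: [Cast("passenger_count", Float64), Column("vendor")]
    println!("{:?}", plan);
}
```

With this approach the cast lives in the per-file read plan rather than in the schema-merge step. The stored files are left untouched, and each file only pays for the casts it actually needs.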
I'm fine with implementing this in DataFusion. However, DataFusion currently delegates to `Schema::try_merge` in this repo, so it would likely mean duplicating some of this code in DataFusion. I'll transfer this issue.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I am trying to work with the nyctaxi Parquet data set, which has one file per month. Over time, some of the column types have changed. For example, `passenger_count` started out as `Int64` and was later changed to `Float64`. arrow-rs cannot merge these schemas.
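The failure can be reproduced with a strict merge that refuses to combine two fields sharing a name but not a type, which matches the behaviour described above. This is a simplified, hypothetical model in which a schema is a map from column name to a toy `DataType`. `strict_merge` is an illustrative name, not the arrow-rs `Schema::try_merge` implementation.

```rust
use std::collections::BTreeMap;

// Hypothetical toy type model, NOT arrow-rs's DataType or Schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int64,
    Float64,
}

// A schema here is just "column name -> type".
type Schema = BTreeMap<String, DataType>;

/// Strict merge: a column present in both schemas must have exactly the
/// same type, otherwise the whole merge fails.
fn strict_merge(a: &Schema, b: &Schema) -> Result<Schema, String> {
    let mut merged = a.clone();
    for (name, &ty) in b {
        match merged.get(name).copied() {
            Some(existing) if existing != ty => {
                return Err(format!("cannot merge {name}: {existing:?} vs {ty:?}"));
            }
            Some(_) => {} // same type: nothing to do
            None => {
                merged.insert(name.clone(), ty);
            }
        }
    }
    Ok(merged)
}

fn main() {
    // Early monthly files store passenger_count as Int64, later ones as Float64.
    let early = Schema::from([("passenger_count".to_string(), DataType::Int64)]);
    let later = Schema::from([("passenger_count".to_string(), DataType::Float64)]);
    match strict_merge(&early, &later) {
        Ok(s) => println!("merged: {s:?}"),
        // prints: error: cannot merge passenger_count: Int64 vs Float64
        Err(e) => println!("error: {e}"),
    }
}
```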
Other solutions (such as DuckDB) merge these schemas and pick the least restrictive type (`Float64`).
Describe the solution you'd like
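The DuckDB-style behaviour, widening to the least restrictive type, could be sketched as a fold over the per-file schemas. This assumes a hypothetical toy `DataType` and a single widening rule (`Int64` with `Float64` gives `Float64`). `wider` and `merge_widening` are illustrative names, and a real implementation would need a fuller table of coercion rules.

```rust
use std::collections::BTreeMap;

// Hypothetical toy type model, NOT arrow-rs's DataType or Schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int64,
    Float64,
    Utf8,
}

// A schema here is just "column name -> type".
type Schema = BTreeMap<String, DataType>;

/// Least restrictive common type of `a` and `b`, or None when no widening
/// rule applies. Only one numeric rule is sketched here.
fn wider(a: DataType, b: DataType) -> Option<DataType> {
    use DataType::*;
    match (a, b) {
        _ if a == b => Some(a),
        // Int64 -> Float64 is lossy above 2^53, which this sketch accepts.
        (Int64, Float64) | (Float64, Int64) => Some(Float64),
        _ => None,
    }
}

/// Folds any number of per-file schemas into one, widening conflicting
/// column types instead of rejecting them.
fn merge_widening(schemas: &[Schema]) -> Result<Schema, String> {
    let mut merged = Schema::new();
    for schema in schemas {
        for (name, &ty) in schema {
            let new_ty = match merged.get(name).copied() {
                None => ty,
                Some(existing) => wider(existing, ty).ok_or_else(|| {
                    format!("no common type for {name}: {existing:?} vs {ty:?}")
                })?,
            };
            merged.insert(name.clone(), new_ty);
        }
    }
    Ok(merged)
}

fn main() {
    let schema = |ty| Schema::from([("passenger_count".to_string(), ty)]);
    // Early months use Int64, later months Float64: widened to Float64.
    // prints: Ok({"passenger_count": Float64})
    println!("{:?}", merge_widening(&[schema(DataType::Int64), schema(DataType::Float64)]));
    // No rule exists between Int64 and Utf8, so this merge fails.
    // prints: true
    println!("{:?}", merge_widening(&[schema(DataType::Int64), schema(DataType::Utf8)]).is_err());
}
```

Widening `Int64` to `Float64` is lossy for integers larger than 2^53 in magnitude. For a column like `passenger_count` that is harmless, but a general implementation may want to make this policy configurable.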
Describe alternatives you've considered
Additional context