Dandandan opened this issue 2 years ago
Closing, since this could be done with the schema on the table provider instead.
@Dandandan in #965 I used the schema from the `ExecutionPlan` trait and it worked fine. But I do agree that it might be better to come up with a data structure that helps assert that the `column_statistics` vector is well aligned with the schema's `fields` vector (same size, same types...). I'm adding this as an item in #997, so if you want to close this for now, that's fine by me 😃
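A minimal Rust sketch of such a data structure. It assumes simplified, hypothetical `Field`, `Schema` and `ColumnStatistics` types (not the actual Arrow/DataFusion definitions) and a hypothetical `AlignedColumnStatistics` wrapper. It only enforces one entry per schema field. Checking types as well would need typed min/max values, which these simplified types leave out.

```rust
// Hypothetical, simplified stand-ins for the real Arrow/DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct ColumnStatistics {
    pub null_count: Option<usize>,
}

/// Hypothetical wrapper: column statistics guaranteed to have exactly one
/// entry per field of the schema they were built against.
#[derive(Debug, Clone, PartialEq)]
pub struct AlignedColumnStatistics {
    stats: Vec<ColumnStatistics>,
}

impl AlignedColumnStatistics {
    /// The only constructor: rejects a vector whose length differs from the
    /// number of schema fields, so a misaligned value cannot be built.
    pub fn try_new(schema: &Schema, stats: Vec<ColumnStatistics>) -> Result<Self, String> {
        if stats.len() != schema.fields.len() {
            return Err(format!(
                "expected {} column statistics, got {}",
                schema.fields.len(),
                stats.len()
            ));
        }
        Ok(Self { stats })
    }

    /// Statistics for the field at position `i` in the schema.
    pub fn get(&self, i: usize) -> Option<&ColumnStatistics> {
        self.stats.get(i)
    }
}
```

Making the field private and routing construction through `try_new` moves the alignment check to a single place instead of every consumer.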
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While looking at adding support for more statistics to the Delta Lake `TableProvider` implementation, I bumped into a limitation in our statistics API. Currently `column_statistics` is an `Option<Vec<ColumnStatistics>>`:
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/datasource/datasource.rs#L37
So a provider has to return each column's statistics at the (correct) index of that column in the schema, regardless of the order in which the columns appear in the files.
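To illustrate the reordering this requires, here is a minimal Rust sketch. It assumes simplified stand-in types and a hypothetical helper `stats_in_schema_order` (none of these are DataFusion's actual API). Statistics keyed by column name, as they might be read from a file, are mapped onto schema positions, and columns without statistics get an "unknown" default.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the real Arrow/DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct ColumnStatistics {
    pub null_count: Option<usize>,
}

/// Hypothetical helper: turns name-keyed statistics into the positional
/// layout where element `i` describes `schema.fields[i]`. The order of the
/// entries in `by_name` is irrelevant; the schema alone decides positions.
pub fn stats_in_schema_order(
    schema: &Schema,
    by_name: &HashMap<String, ColumnStatistics>,
) -> Vec<ColumnStatistics> {
    schema
        .fields
        .iter()
        // Missing columns fall back to the default (all fields `None`).
        .map(|f| by_name.get(&f.name).cloned().unwrap_or_default())
        .collect()
}
```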
Describe the solution you'd like
Either:
- a `HashMap<String, ColumnStatistics>` rather than an `Option<Vec<ColumnStatistics>>`, or
- a `Schema` parameter to `TableProvider::statistics`, so the positions of the fields can be calculated.
FWIW, Delta Lake / delta-rs takes the first approach, and it seems straightforward to implement and use.
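A minimal sketch of what the first option could look like. It assumes a hypothetical, simplified `Statistics` struct (not DataFusion's actual definition) whose `column_statistics` is keyed by column name, plus a hypothetical `column` lookup helper.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for DataFusion's `ColumnStatistics`.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ColumnStatistics {
    pub null_count: Option<usize>,
}

/// Hypothetical variant of the statistics struct following the first
/// proposal: per-column statistics keyed by column name, not by position.
#[derive(Debug, Clone, Default)]
pub struct Statistics {
    pub num_rows: Option<usize>,
    pub column_statistics: Option<HashMap<String, ColumnStatistics>>,
}

impl Statistics {
    /// Looks up a column's statistics by name. Returns `None` when no column
    /// statistics are known at all or when this column has no entry.
    pub fn column(&self, name: &str) -> Option<&ColumnStatistics> {
        self.column_statistics.as_ref()?.get(name)
    }
}
```

With name keys, a provider can insert statistics in whatever order it reads them. The trade-off is that consumers working by column index have to go through the schema to find the column name.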