apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0

bug: FileScanTask project_field_ids order could be inconsistent with the RecordBatch schema #627

Open chenzl25 opened 2 weeks ago

chenzl25 commented 2 weeks ago

As we know, FileScanTask has two fields, project_field_ids and schema. I think the RecordBatch produced by the reader for a FileScanTask should always follow the schema specified in that FileScanTask. However, in some cases the RecordBatch schema can be inconsistent with it.

Consider an Iceberg table with schema (c1 int, c2 int, c3 int). If we select its columns in the order c3, c2, c1, the RecordBatch schema is still c1, c2, c3, which is quite confusing.
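The expected behavior can be sketched with plain std Rust: the reader yields columns in file order, and a final step reorders them to match project_field_ids, dropping any column that was not requested. This is a minimal illustration under stated assumptions, not iceberg-rust's implementation: `Column` and `reorder_to_projection` are hypothetical stand-ins for an Arrow RecordBatch and its projection step, and the field ids 1, 2, 3 for c1, c2, c3 are assumed.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for one Arrow column tagged with its
/// Iceberg field id. Not an arrow-rs or iceberg-rust type.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    field_id: i32,
    values: Vec<i32>,
}

/// Reorders `columns` (in file order) so they follow `project_field_ids`
/// (the order the caller asked for). Columns that were not requested are
/// dropped. Returns None if a requested field id is absent; a field id that
/// is requested twice also yields None, because each column is moved out once.
fn reorder_to_projection(columns: Vec<Column>, project_field_ids: &[i32]) -> Option<Vec<Column>> {
    // Index the file-ordered columns by their field id.
    let mut by_id: HashMap<i32, Column> =
        columns.into_iter().map(|c| (c.field_id, c)).collect();
    // Emit columns in projection order, moving each one out of the map.
    project_field_ids.iter().map(|id| by_id.remove(id)).collect()
}

fn main() {
    // Assumed field ids: c1 = 1, c2 = 2, c3 = 3. The reader returns file order.
    let file_order = vec![
        Column { field_id: 1, values: vec![10] },
        Column { field_id: 2, values: vec![20] },
        Column { field_id: 3, values: vec![30] },
    ];
    // SELECT c3, c2, c1
    let projected = reorder_to_projection(file_order, &[3, 2, 1]).expect("all ids present");
    let ids: Vec<i32> = projected.iter().map(|c| c.field_id).collect();
    println!("{:?}", ids); // [3, 2, 1]
}
```

Matching by field id rather than by column name or position keeps the reordering correct even after columns are renamed or reordered in the table schema, since Iceberg field ids are stable across schema evolution.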

```rust
pub struct FileScanTask {
    data_file_path: String,
    project_field_ids: Vec<i32>,
    schema: SchemaRef,
    // ...
}
```

liurenjie1024 commented 2 days ago

I think this could be solved together with other problems, like type promotion.
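One way reordering and type promotion could share a single code path is a per-batch adapter that walks the target schema in projection order, looks up each column by field id, and widens its type where the file stored an older type. The sketch below is a hypothetical illustration, not the iceberg-rust design: `ColumnData`, `TargetType`, `Column`, and `adapt_batch` are invented names, and only the int to long promotion (which the Iceberg spec permits) is modeled.

```rust
use std::collections::HashMap;

/// Hypothetical column storage; not an arrow-rs or iceberg-rust type.
#[derive(Debug, Clone, PartialEq)]
enum ColumnData {
    Int(Vec<i32>),
    Long(Vec<i64>),
}

/// Hypothetical target type taken from the task's (current) schema.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TargetType {
    Int,
    Long,
}

#[derive(Debug, Clone, PartialEq)]
struct Column {
    field_id: i32,
    data: ColumnData,
}

/// Produces columns in `target` order, promoting int to long where needed.
fn adapt_batch(columns: Vec<Column>, target: &[(i32, TargetType)]) -> Result<Vec<Column>, String> {
    // Index the file-ordered columns by field id.
    let mut by_id: HashMap<i32, Column> =
        columns.into_iter().map(|c| (c.field_id, c)).collect();
    target
        .iter()
        .map(|&(id, ty)| -> Result<Column, String> {
            let col = by_id
                .remove(&id)
                .ok_or_else(|| format!("field id {id} not found in batch"))?;
            let data = match (col.data, ty) {
                // Types already match: pass the column through unchanged.
                (d @ ColumnData::Int(_), TargetType::Int) => d,
                (d @ ColumnData::Long(_), TargetType::Long) => d,
                // int -> long is a promotion allowed by the Iceberg spec.
                (ColumnData::Int(v), TargetType::Long) => {
                    ColumnData::Long(v.into_iter().map(i64::from).collect())
                }
                // Narrowing is never a valid promotion.
                (ColumnData::Long(_), TargetType::Int) => {
                    return Err(format!("cannot narrow long to int for field id {id}"))
                }
            };
            Ok(Column { field_id: id, data })
        })
        .collect()
}

fn main() {
    // File written when field 1 was still int; the table now says long.
    let file_cols = vec![
        Column { field_id: 1, data: ColumnData::Int(vec![1, 2]) },
        Column { field_id: 2, data: ColumnData::Long(vec![5, 6]) },
    ];
    // Requested order: field 2 first, then field 1 promoted to long.
    let out = adapt_batch(file_cols, &[(2, TargetType::Long), (1, TargetType::Long)]).unwrap();
    println!("{:?}", out);
}
```

Doing both steps in one pass means the batch's final column order and types come from the same target schema, so the RecordBatch and the task's schema cannot drift apart.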