apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0

bug: FileScanTask project_field_ids order could be inconsistent with the RecordBatch schema #627

Closed chenzl25 closed 3 weeks ago

chenzl25 commented 2 months ago

FileScanTask has two fields, project_field_ids and schema. I think the RecordBatch produced by the reader for a FileScanTask should always follow the schema specified in that FileScanTask. However, in some cases the two can be inconsistent.

Consider an Iceberg table with schema (c1 int, c2 int, c3 int). If we select the columns in the order c3, c2, c1, the RecordBatch schema is still c1, c2, c3, which confuses me a lot.

pub struct FileScanTask {
    data_file_path: String,
    project_field_ids: Vec<i32>,
    schema: SchemaRef,
    ...
}
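
To make the expected behavior concrete, here is a minimal standalone sketch of reordering columns so that they follow project_field_ids. It assumes plain slices stand in for the file's columns. The names projection_indices and reorder_columns are hypothetical helpers for illustration, not part of the iceberg-rust API, and a real reader would work on Arrow RecordBatch columns instead.

```rust
use std::collections::HashMap;

/// Hypothetical helper: for each requested field id, find the position of
/// that field in the file's own column order. Returns None if any requested
/// id does not exist in the file.
fn projection_indices(file_field_ids: &[i32], project_field_ids: &[i32]) -> Option<Vec<usize>> {
    // Map field id -> column position in the file.
    let position: HashMap<i32, usize> = file_field_ids
        .iter()
        .enumerate()
        .map(|(idx, id)| (*id, idx))
        .collect();

    // Walk the requested ids in their requested order.
    project_field_ids
        .iter()
        .map(|id| position.get(id).copied())
        .collect()
}

/// Hypothetical helper: pick columns in the order given by `indices`.
fn reorder_columns<T: Clone>(columns: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| columns[i].clone()).collect()
}

fn main() {
    // Table schema (c1 int, c2 int, c3 int) with assumed field ids 1, 2, 3.
    let file_field_ids = [1, 2, 3];
    let file_columns = ["c1", "c2", "c3"];

    // The query selects c3, c2, c1.
    let project_field_ids = [3, 2, 1];

    let indices = projection_indices(&file_field_ids, &project_field_ids)
        .expect("every projected field id should exist in the file");
    let projected = reorder_columns(&file_columns, &indices);

    println!("{:?}", projected); // prints ["c3", "c2", "c1"]
}
```

With this kind of mapping, the output column order is driven by project_field_ids rather than by the order in which the columns are stored in the file.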

liurenjie1024 commented 2 months ago

I think this could be solved together with other problems such as type promotion.

chenzl25 commented 3 weeks ago

I think this issue has been resolved by type promotion.