apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0

Support more complex types when reading into arrow record batch. #405

Open liurenjie1024 opened 2 weeks ago

liurenjie1024 commented 2 weeks ago

This is a follow-up to #244, which left some limitations when reading into Arrow record batches:

  1. Only primitive types are supported. That is, the projected fields cannot be nested fields of structs, or non-primitive types such as map or list.
  2. Type promotion is missing.
  3. Default value is not supported yet.

We should implement something like ArrowProjectionVisitor to support these.
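
For context on item 2, here is a minimal sketch of what a type promotion check could look like. It assumes the promotions allowed by the Iceberg table spec (int to long, float to double, and decimal widening with an unchanged scale). The `Primitive` enum and `can_promote` function are hypothetical names for illustration only, not the iceberg-rust API.

```rust
/// Hypothetical, simplified stand-in for a primitive type (not the iceberg-rust type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Primitive {
    Int,
    Long,
    Float,
    Double,
    Decimal { precision: u32, scale: u32 },
}

/// Returns true if a column stored as `from` can be read as `to`.
/// Identical types are trivially compatible; otherwise only the
/// widening promotions listed in the lead-in are accepted.
fn can_promote(from: Primitive, to: Primitive) -> bool {
    use Primitive::*;
    match (from, to) {
        (a, b) if a == b => true,
        (Int, Long) => true,
        (Float, Double) => true,
        (
            Decimal { precision: p1, scale: s1 },
            Decimal { precision: p2, scale: s2 },
        ) => s1 == s2 && p2 > p1,
        _ => false,
    }
}
```

A reader would apply a check like this per projected column, then cast the Arrow array to the wider type when the file type and the table type differ.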

sdd commented 2 weeks ago

I'm happy to start looking into this one if you like?

liurenjie1024 commented 2 weeks ago

I'm happy to start looking into this one if you like?

Thanks @sdd !

sdd commented 6 days ago

Hi @liurenjie1024. I have a query regarding flattening. You gave this example in Issue #244:

... For example, when we select (person.address.street, person.name), we have projection mask [1,2], and the returned schema is

schema {
  struct person {
     struct address {
        string street
     }
     string name
  }
}

But what we expect is

schema {
   string person
   string name
}

Originally posted by @liurenjie1024 in https://github.com/apache/iceberg-rust/issues/244#issuecomment-2021856565

Why do we expect that? I would have thought it would turn out like this:

schema {
    string street
    string name
}

liurenjie1024 commented 6 days ago

@sdd Yes, it was a typo. It should be:

schema {
    string street
    string name
}
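
To make the agreed behaviour concrete, here is a rough sketch of the flattening step. It uses a deliberately tiny schema model (`Type`, `Field`, `flatten_projection`, and `person_schema` are all hypothetical names, not the iceberg-rust or Arrow types) and selects leaves by dotted path rather than by projection mask index, purely to keep the example short.

```rust
/// Hypothetical, simplified schema model for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Primitive(&'static str),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: &'static str,
    ty: Type,
}

/// Walks the schema depth-first and returns the selected leaf fields
/// as a flat list of (leaf name, primitive type), in schema order.
fn flatten_projection(
    fields: &[Field],
    selected: &[&str],
) -> Vec<(&'static str, &'static str)> {
    fn visit(
        fields: &[Field],
        prefix: &str,
        selected: &[&str],
        out: &mut Vec<(&'static str, &'static str)>,
    ) {
        for field in fields {
            // Build the dotted path of this field, e.g. "person.address.street".
            let path = if prefix.is_empty() {
                field.name.to_string()
            } else {
                format!("{}.{}", prefix, field.name)
            };
            match &field.ty {
                // Structs are never emitted themselves; only their leaves are.
                Type::Struct(children) => visit(children, &path, selected, out),
                Type::Primitive(ty) => {
                    if selected.contains(&path.as_str()) {
                        out.push((field.name, *ty));
                    }
                }
            }
        }
    }

    let mut out = Vec::new();
    visit(fields, "", selected, &mut out);
    out
}

/// The `person` schema from the example in this thread.
fn person_schema() -> Vec<Field> {
    vec![Field {
        name: "person",
        ty: Type::Struct(vec![
            Field {
                name: "address",
                ty: Type::Struct(vec![Field {
                    name: "street",
                    ty: Type::Primitive("string"),
                }]),
            },
            Field {
                name: "name",
                ty: Type::Primitive("string"),
            },
        ]),
    }]
}
```

Calling `flatten_projection(&person_schema(), &["person.address.street", "person.name"])` yields the two leaves `street` and `name`, both `string`, which matches the corrected schema above. A real implementation would also need to handle name collisions between leaves from different structs, which this sketch ignores.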