apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0
673 stars 159 forks source link

discuss: re-export arrow types #636

Open xxchan opened 2 months ago

xxchan commented 2 months ago

Hi, I propose to re-export arrow types in iceberg. Want to hear your opinions @liurenjie1024 @Xuanwo

Rationale

When a crate uses a dependency's type in the public API, it would be better to re-export the dependency.

The largest benefit is ease of handling multiple versions:

[dependencies]
# arrow used in the project
arrow = "50"

# which depends on arrow 52
iceberg = "0.3" 

In this case, if we want to pass arrow_array::RecordBatch to iceberg, we have to do something like this:

[dependencies]
# arrow used in the project
arrow-array = "50"

# which depends on arrow 52
iceberg = "0.3" 
arrow-array-for-iceberg = { package="arrow", version="52"}

and then pass arrow_array_for_arrow_udf::RecordBatch to iceberg. If iceberg re-export it like iceberg::arrow::RecordBatch, then we don't need to do so.

Note: This assumes that multiple versions are to some extent unavoidable. It won't help if we want to pass arrow 50 data to arrow 52 data. But it's more like some part of the code only want to pass data to iceberg without affecting other parts of the code that uses arrow 50.

Finally, I don't think there are any drawbacks to re-export arrow in iceberg.

More discussion about this problem:

xxchan commented 2 months ago

Example about this practice: https://docs.rs/deltalake/0.20.0/deltalake/arrow/index.html

https://github.com/delta-io/delta-rs/blob/b3b23be91b8eb8207473785d4b0c20c5d2c174b2/crates/core/src/lib.rs#L101-L102

liurenjie1024 commented 1 month ago

+1 for this, which helps to decipher in compatible types with same name. cc @Xuanwo What do you think?

Xuanwo commented 1 month ago

-0 on this.

I'm not in favor of re-exporting other crates based on my own experience. However, as @xxchan mentioned, there aren't many drawbacks, so I'm okay if someone wants to add it.

liurenjie1024 commented 1 month ago

One case I think this helps is that when another project uses both arrow and iceberg, but with different versions, just as the case in the issue. When you pass them as arguments, you may see quite confusing compiler error like arrow::RecordBatch is not arrow::RecordBatch, since rust treat types from diffrent crate versions as different types. A more detailed example could be found here. Re-exporting makes things easier to understand.