apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0

refactor: make `PartitionSpec` more safe #550

Open liurenjie1024 opened 1 month ago

liurenjie1024 commented 1 month ago

This discussion is a follow-up to this comment. In summary, I'm thinking about making PartitionSpec safer with the following changes:

  1. Make all fields of this struct private.
  2. Add a schema field to PartitionSpec.

This will introduce several changes:

  1. PartitionSpec could only be built using a builder, which performs several checks to ensure the spec is correct.
  2. PartitionSpec's partition type could be inferred from the spec itself.
  3. We may need to change the deserialization of TableMetadata to use the builder.
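
To make the proposal concrete, here is a minimal sketch of what a builder-only PartitionSpec with private fields might look like. It is not the crate's actual API: `Schema`, `PartitionField`, `PartitionSpecBuilder`, and every method name are simplified, hypothetical stand-ins, and the checks shown (source field exists in the schema, partition names are unique) are only examples of what the builder could validate.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical, simplified schema: a list of (field id, field name) pairs.
#[derive(Debug)]
pub struct Schema {
    fields: Vec<(i32, String)>,
}

impl Schema {
    pub fn new(fields: Vec<(i32, String)>) -> Self {
        Self { fields }
    }

    fn contains_field(&self, id: i32) -> bool {
        self.fields.iter().any(|(fid, _)| *fid == id)
    }
}

/// Hypothetical partition field: a source column id plus a partition name.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionField {
    pub source_id: i32,
    pub name: String,
}

/// All fields are private, so a value can only come from the builder.
#[derive(Debug)]
pub struct PartitionSpec {
    spec_id: i32,
    fields: Vec<PartitionField>,
    schema: Arc<Schema>,
}

impl PartitionSpec {
    pub fn builder(schema: Arc<Schema>) -> PartitionSpecBuilder {
        PartitionSpecBuilder { spec_id: 0, fields: Vec::new(), schema }
    }

    pub fn spec_id(&self) -> i32 {
        self.spec_id
    }

    pub fn fields(&self) -> &[PartitionField] {
        &self.fields
    }

    pub fn schema(&self) -> &Schema {
        &self.schema
    }
}

pub struct PartitionSpecBuilder {
    spec_id: i32,
    fields: Vec<PartitionField>,
    schema: Arc<Schema>,
}

impl PartitionSpecBuilder {
    pub fn with_spec_id(mut self, spec_id: i32) -> Self {
        self.spec_id = spec_id;
        self
    }

    pub fn add_field(mut self, source_id: i32, name: &str) -> Self {
        self.fields.push(PartitionField { source_id, name: name.to_string() });
        self
    }

    /// Validates the spec against the schema before constructing it.
    pub fn build(self) -> Result<PartitionSpec, String> {
        {
            let mut seen = HashSet::new();
            for field in &self.fields {
                if !self.schema.contains_field(field.source_id) {
                    return Err(format!("source field {} not in schema", field.source_id));
                }
                if !seen.insert(field.name.as_str()) {
                    return Err(format!("duplicate partition name {}", field.name));
                }
            }
        }
        Ok(PartitionSpec { spec_id: self.spec_id, fields: self.fields, schema: self.schema })
    }
}
```

Because the struct carries its schema, anything derived from both (such as the partition type) could be computed from the spec alone. Deserialization would also have to go through `build`, or an equivalent checked path.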

liurenjie1024 commented 1 month ago

cc @c-thiel @Xuanwo

c-thiel commented 1 week ago

@liurenjie1024 we just need to be very careful with old PartitionSpecs: they might no longer be valid for the current schema, yet we still want to keep them. In these cases Java keeps a schema field pointing to the current schema, even if fields might not be present in it. Java achieves this by using build_unchecked.

I think it would be much cleaner to use UnboundPartitionSpecs for these cases. We could change TableMetadata to:

pub struct TableMetadata {
...
    pub(crate) partition_specs: HashMap<i32, UnboundPartitionSpecRef>,  // Changed to unbound
    pub(crate) default_spec: PartitionSpecRef, // This is a new field bound to the current schema.
    // Remove: default_spec_id - get it from default_spec.id
}
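
A self-contained sketch of how this split could behave, under heavy simplification: the types below reuse the names from the snippet above but are hypothetical stand-ins, not the crate's definitions, and `bind`, `TableMetadata::new`, and `partition_spec_by_id` are assumed helpers introduced only for illustration. The schema is reduced to a set of field ids.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

/// Records source field ids only; makes no claim about any schema.
#[derive(Debug, Clone, PartialEq)]
pub struct UnboundPartitionSpec {
    pub spec_id: i32,
    pub source_ids: Vec<i32>,
}

/// A spec whose source ids were checked against a schema.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionSpec {
    spec_id: i32,
    source_ids: Vec<i32>,
}

impl PartitionSpec {
    pub fn spec_id(&self) -> i32 {
        self.spec_id
    }

    pub fn source_ids(&self) -> &[i32] {
        &self.source_ids
    }
}

impl UnboundPartitionSpec {
    /// Hypothetical helper: succeeds only if every source id exists in the schema.
    pub fn bind(&self, schema_field_ids: &HashSet<i32>) -> Result<PartitionSpec, String> {
        for id in &self.source_ids {
            if !schema_field_ids.contains(id) {
                return Err(format!("spec {}: source field {} not in schema", self.spec_id, id));
            }
        }
        Ok(PartitionSpec { spec_id: self.spec_id, source_ids: self.source_ids.clone() })
    }
}

pub type UnboundPartitionSpecRef = Arc<UnboundPartitionSpec>;
pub type PartitionSpecRef = Arc<PartitionSpec>;

pub struct TableMetadata {
    partition_specs: HashMap<i32, UnboundPartitionSpecRef>, // historical specs stay unbound
    default_spec: PartitionSpecRef,                         // bound to the current schema
}

impl TableMetadata {
    /// Only the default spec has to bind to the current schema.
    pub fn new(
        specs: Vec<UnboundPartitionSpec>,
        default_id: i32,
        schema_field_ids: &HashSet<i32>,
    ) -> Result<Self, String> {
        let default_unbound = specs
            .iter()
            .find(|s| s.spec_id == default_id)
            .ok_or_else(|| format!("no spec with id {}", default_id))?;
        let default_spec = Arc::new(default_unbound.bind(schema_field_ids)?);
        let partition_specs = specs.into_iter().map(|s| (s.spec_id, Arc::new(s))).collect();
        Ok(Self { partition_specs, default_spec })
    }

    /// Derived from the bound default spec instead of being stored separately.
    pub fn default_spec_id(&self) -> i32 {
        self.default_spec.spec_id()
    }

    pub fn partition_spec_by_id(&self, id: i32) -> Option<&UnboundPartitionSpecRef> {
        self.partition_specs.get(&id)
    }
}
```

In this sketch an old spec that references a dropped column can still be stored and looked up, while only the default spec has to pass validation against the current schema, so no unchecked build path is needed.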