Open s-akhtar-baig opened 1 month ago
@Fokko, these changes make current-snapshot-id optional and uses a flag to maintain backwards compatibility. Let me know what you think.
Please note that, with the flag on we can create and load a table with -1 as the current-snapshot-id. This is different from pyiceberg which I believe only supports creating a table.
Also, do we have a preference on where we want to document the "backwards compatibility" section?
Sorry for the late reply as I was touching grass.
We're trying to solve two problems:
-1
since it is erroneous unless there is a snapshot with the ID -1
. You can still create a table with -1
as the current snapshot ID if you set the flag. If you use Java <1.4.0
within your organization, this might be required to be able to read the tables.cc @Xuanwo @Fokko @sdd @marvinlanhenke What do you think?
One problem with doing it through an env var is that it applies to every table you hit in your service. I think it would be better if it was a config param so that you can configure it per table.
@liurenjie1024 @sdd, sure, I can work on https://github.com/apache/iceberg-rust/issues/375 and use a config file to set the value. Let me know if you have a different approach in mind, thanks!
@liurenjie1024 @sdd, sure, I can work on #375 and use a config file to set the value. Let me know if you have a different approach in mind, thanks!
@s-akhtar-baig I think the first step maybe to add a config for the serializer/deseriazer of table metadata to control this behavior. Config file is just one approach to init the config, we should decouple these two things.
@liurenjie1024, I have pushed some changes to have config values for TableMetadata in one place. Please let me know if the direction is right and/or if you had a different idea in mind. I will handle reviews on the tests once I have your feedback on the config changes.
For now, I am using the environment to load these values but future work involves loading from a config file on top of that.
@liurenjie1024, I have pushed some changes to have config values for TableMetadata in one place. Please let me know if the direction is right and/or if you had a different idea in mind. I will handle reviews on the tests once I have your feedback on the config changes.
For now, I am using the environment to load these values but future work involves loading from a config file on top of that.
Hi, @s-akhtar-baig Thanks for the contribution. I have little concern about current approach because it seems not extensible to me. How about this:
pub struct TableMetadataParser {
use_legacy_id: bool,
}
impl TableMetadataParser {
pub async fn write(&self, output_file: OutputFile, table_metadata: &TableMetadata) {
....
}
pub async fn read(&self, input_file: InputFile) -> Result<TableMetadata> {
....
}
}
Then instead of directly ser/de table metadata, we will control behavior using TableMetadataParser
, what do you think?
@liurenjie1024, I see. It makes sense to me and I will make changes accordingly. Thank you for adding your feedback and providing sample code!
Resolves https://github.com/apache/iceberg-rust/issues/352
Problem: Previous versions of Java (
<1.4.0
) implementations incorrectly assume the optional attributecurrent-snapshot-id
to be a required attribute in TableMetadata.Solution: Use
legacy-current-snapshot-id
environment variable to force iceberg-rust to create and load a table with a metadata file compatible with the older versions.Testing: Added new unit tests.
Future work:
legacy-current-snapshot-id
as a configuration value.