ForeverAngry closed this issue 1 month ago
Hey, the crate supports inserts but not updates, because it can't handle deletes yet. The best way to insert is with datafusion.
Check out this test: https://github.com/JanKaul/iceberg-rust/blob/7b65b34504e710b62bd33d7d46c17be97929c08e/datafusion_iceberg/src/table.rs#L650
Do you want to use it with datafusion or as a rust library?
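Based on the linked test, inserting through datafusion looks roughly like this. This is only a sketch: the exact provider type and registration API may differ from the current crate, and the table/column names (`my_table`, `id`, `value`) are made up for illustration.

```rust
use std::sync::Arc;

use datafusion::prelude::SessionContext;
use datafusion_iceberg::DataFusionTable;

// `table` is an iceberg_rust table already loaded from a catalog.
// Wrap it in the TableProvider that datafusion_iceberg exposes.
let provider = Arc::new(DataFusionTable::from(table));

let ctx = SessionContext::new();
ctx.register_table("my_table", provider)?;

// Inserts go through plain SQL; datafusion plans the write and the
// provider commits it to the Iceberg table.
ctx.sql("INSERT INTO my_table (id, value) VALUES (1, 'a'), (2, 'b')")
    .await?
    .collect()
    .await?;
```

The nice part of this route is that reads registered the same way get filter pushdown, so partition pruning works.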
Hi! Well, I'll have to do some reading on datafusion; I'm not familiar with it. But I was hoping to use it in Rust, with a polars code base I have.
For now, you can have a look at the following method:
I will try to simplify the writer design for non-datafusion use cases.
That would be awesome! Also, I'd love to contribute. I've used the Java Iceberg writer a bit, but if you had a diagram or some pseudocode of the steps to complete a successful transaction (update, merge, etc.) I'd be happy to help!
Also, do you have an example of how I could test the writer with aws glue?
I haven't implemented the aws glue catalog, so you might need to implement it yourself.
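If someone does take a stab at it, a Glue catalog mostly means implementing the crate's catalog trait on top of `aws-sdk-glue`, since Glue stores a `metadata_location` pointer in the table parameters for Iceberg tables. A very rough skeleton, assuming that layout (the struct and helper shown here are hypothetical, not the crate's real trait surface):

```rust
use aws_sdk_glue::Client;

// Hypothetical skeleton for a Glue-backed catalog.
// Glue "databases" map to Iceberg namespaces.
struct GlueCatalog {
    client: Client,
}

impl GlueCatalog {
    // Look up the metadata_location pointer that Glue keeps in the
    // table parameters for Iceberg tables; the actual catalog trait
    // would then load that metadata JSON from object storage.
    async fn metadata_location(&self, database: &str, table: &str) -> Option<String> {
        let resp = self
            .client
            .get_table()
            .database_name(database)
            .name(table)
            .send()
            .await
            .ok()?;
        resp.table()?
            .parameters()?
            .get("metadata_location")
            .cloned()
    }
}
```

The remaining work would be the write path: committing a transaction means writing a new metadata file and atomically swapping the `metadata_location` parameter in Glue.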
If you have the catalog, writing looks something like this:
use iceberg_rust::arrow::write_parquet_partitioned;

// Load the table from the catalog (catalog calls are async)
let tabular = catalog
    .load_table(Identifier::parse("my_catalog.my_schema.my_table")?)
    .await?;

// Make sure it's a table and not a view
let table = if let Tabular::Table(table) = &tabular {
    Ok(table)
} else {
    Err(Error::InvalidFormat(
        "database entity is not a table".to_string(),
    ))
}?;

// Write the arrow record batches as partitioned parquet files to the object store
let metadata_files = write_parquet_partitioned(table, arrow_batches, None).await?;

// Append the new files to the table in a single transaction
table
    .new_transaction(None)
    .append(metadata_files)
    .commit()
    .await?;
If you want to read iceberg tables with polars, this crate is not the best option for you. It's not able to do partition pruning with polars and always has to do a full table scan. The apache repo is working on an expression system that will make this possible.
However, if you use this crate with datafusion it performs partition pruning.
I'm open to using datafusion to read the data; my real need is just to be able to write partitioned Iceberg files using a Glue catalog.
If I have time I'll look into the glue catalog. But it could be a while.
As the REST catalog is becoming the standard catalog implementation, I'm not planning to add HMS support.
It looks like this project does have the capability to insert and update records in an existing Iceberg table. Am I correct about this? Looking forward to hearing from you!