apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0

feat (datafusion): making IcebergTableProvider public to be used without a catalog #650

Closed: a-agmon closed this 1 month ago

a-agmon commented 2 months ago

Currently, it is quite complicated to query an existing Iceberg table using DataFusion without a catalog. Enabling queries through DataFusion with just a Table reference makes the API much more flexible and usable, because it lets us query any Iceberg table we want (with just the file_io and metadata location) regardless of its source or catalog.

The proposed PR aims to provide the following functionality, which enables users to create a queryable DataFusion TableProvider from a StaticTable:

// Build a FileIO for the warehouse location.
let warehouse_location = String::from("s3://X");
let file_io = FileIO::from_path(warehouse_location)?.build()?;
let metadata_location = "s3://X/Y.metadata.json";
let table_ident = TableIdent::from_strs(["myschema", "mytable"])?;
// Load a static table directly from its metadata file; no catalog is involved.
let table = StaticTable::from_metadata_file(metadata_location, table_ident, file_io).await?;
// Wrap the table in a DataFusion TableProvider and register it for querying.
let table_provider = IcebergTableProvider::try_new_from_table(table.into_table()).await?;
let ctx = SessionContext::new();
ctx.register_table("mytable", Arc::new(table_provider))?;

manuzhang commented 1 month ago

Not familiar with rust, but do we have tests for such changes?

liurenjie1024 commented 1 month ago

> Not familiar with rust, but do we have tests for such changes?

Good point. This is mostly an access modifier change, and the other parts are already covered by tests, so it generally LGTM. But it would be better to have some tests that cover this usage.
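
For readers less familiar with Rust, here is a minimal, self-contained sketch of what an access modifier change means in practice. It does not use the real iceberg-rust types: the module `provider`, the structs `Table` and `TableProvider`, and the helper `build_provider` are hypothetical stand-ins chosen for illustration. It only shows that a `pub` constructor can be called from outside its module while a private one cannot.

```rust
// Hypothetical stand-in names; these are NOT the real iceberg-rust types.
mod provider {
    /// Stand-in for a table handle loaded from a metadata file.
    pub struct Table {
        pub name: String,
    }

    /// Stand-in for a table provider.
    pub struct TableProvider {
        table_name: String,
    }

    impl TableProvider {
        // Private constructor: callable only from inside `provider`.
        fn new(table_name: String) -> Self {
            TableProvider { table_name }
        }

        // Public constructor: the kind of entry point that a visibility
        // change opens up to callers outside the module.
        pub fn try_new_from_table(table: Table) -> Result<Self, String> {
            if table.name.is_empty() {
                return Err("table name must not be empty".to_string());
            }
            Ok(Self::new(table.name))
        }

        pub fn table_name(&self) -> &str {
            &self.table_name
        }
    }
}

use provider::{Table, TableProvider};

// Code outside `provider` can only go through the public constructor;
// calling `TableProvider::new` here would be a compile error.
fn build_provider(name: &str) -> Result<TableProvider, String> {
    TableProvider::try_new_from_table(Table { name: name.to_string() })
}

fn main() {
    let provider = build_provider("mytable").expect("valid table name");
    println!("built provider for {}", provider.table_name());
}
```

A test for this kind of change mainly needs to exercise the public entry point from outside the module, which is what `build_provider` does in the sketch.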

a-agmon commented 1 month ago

Thanks @manuzhang, I will also add an integration test to cover this.

a-agmon commented 1 month ago

Thanks again @manuzhang. @liurenjie1024, please see this PR, which adds a unit test for this usage (creating a table provider from a metadata file):

https://github.com/apache/iceberg-rust/pull/651