apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0
469 stars 95 forks source link

bug: Iceberg opendal missing s3 support #385

Closed c-thiel closed 2 weeks ago

c-thiel commented 1 month ago

After the opendal update to opendal 0.4.6 in https://github.com/apache/iceberg-rust/commit/3b8121eaa9e9628536093836dcc41119716afd9e#diff-2e9d962a08321605940b5a657135052fbcef87b5e360662bb527c96d9a615542, the iceberg crate no longer supports S3 on its own.

I assume this is because in opendal 0.4.6 the "default" features changed to https://github.com/apache/opendal/blob/main/core/Cargo.toml#L42 default = ["reqwest/rustls-tls", "executors-tokio", "services-memory"]

From https://github.com/apache/opendal/blob/v0.45.0/core/Cargo.toml:

default = [
  "rustls",
  "services-azblob",
  "services-azdls",
  "services-cos",
  "services-fs",
  "services-gcs",
  "services-ghac",
  "services-http",
  "services-ipmfs",
  "services-memory",
  "services-obs",
  "services-oss",
  "services-s3",
  "services-webdav",
  "services-webhdfs",
  "services-azfile",
]

However, the iceberg crate holds FileIO, so we should expose S3 functionality at least via a feature. I see that i.e. the catalog crates all specify some opendal feature flag.

Maybe @Xuanwo can have a look since you know both sides?

Original Error upon calling .new_output('s3://some/path') on FileIO: Source: Unsupported (permanent) at , context: { scheme: s3 } => scheme is not enabled or supported

Xuanwo commented 1 month ago

I believe we should enable the storage we want in iceberg crate.

liurenjie1024 commented 1 month ago

I believe we should enable the storage we want in iceberg crate.

Will this cause the library to include unnecessary dependencies? I think using features to control what's included would be better? And we can provide a all-io feature flag for convenience?

cc @Xuanwo @Fokko

liurenjie1024 commented 1 month ago

BTW how the integration tests passed when s3 is not included? I think at least this test can't pass?

c-thiel commented 1 month ago

BTW how the integration tests passed when s3 is not included? I think at least this test can't pass?

Hm, good point. We are testing via cargo test --no-fail-fast --all-targets --all-features --workspace

Could it sneak its way in somehow via dev-dependencies of catalogs or datafusion?: https://github.com/apache/iceberg-rust/blob/4407634d4e7760afabd8789f585cfee0e6b19e8f/crates/catalog/rest/Cargo.toml#L49

liurenjie1024 commented 1 month ago

Good catch! Thanks @c-thiel for the finding!

Xuanwo commented 1 month ago

Maybe we can add features like storage-s3, storage-gcs at the iceberg side.

liurenjie1024 commented 1 month ago

Maybe we can add features like storage-s3, storage-gcs at the iceberg side.

+1

Xuanwo commented 1 month ago

Let me fix this.