tustvold opened this issue 7 months ago
cc @liurenjie1024
Hi @tustvold @alamb, thanks for this proposal and write-up; object_store looks great to me!
In Iceberg's design, all file IO is hidden behind the FileIO interface, and the backends (i.e. OpenDAL or object_store) are not directly exposed to users, so I think we can integrate it without any breaking changes.
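A minimal std-only sketch of this arrangement, assuming a hypothetical Backend trait and a simplified synchronous FileIO wrapper (the real iceberg-rust FileIO is async and has a different API). Because the backend sits in a private field behind a trait object, switching from OpenDAL to object_store would change how a FileIO is constructed, not the methods users call.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical backend trait: the crate doing the actual IO (OpenDAL,
// object_store, ...) would be wrapped behind something like this.
trait Backend: Send + Sync {
    fn read(&self, path: &str) -> Option<Vec<u8>>;
    fn write(&self, path: &str, data: Vec<u8>);
}

// Toy in-memory backend standing in for a real store.
#[derive(Default)]
struct MemoryBackend {
    files: Mutex<HashMap<String, Vec<u8>>>,
}

impl Backend for MemoryBackend {
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.files.lock().unwrap().get(path).cloned()
    }

    fn write(&self, path: &str, data: Vec<u8>) {
        self.files.lock().unwrap().insert(path.to_string(), data);
    }
}

// Hypothetical user-facing type. The backend lives in a private field, so
// replacing it does not change the methods users call.
struct FileIO {
    backend: Arc<dyn Backend>,
}

impl FileIO {
    fn with_backend(backend: Arc<dyn Backend>) -> Self {
        Self { backend }
    }

    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.backend.read(path)
    }

    fn write(&self, path: &str, data: &[u8]) {
        self.backend.write(path, data.to_vec());
    }
}

fn main() {
    let io = FileIO::with_backend(Arc::new(MemoryBackend::default()));
    io.write("s3://bucket/metadata/v1.metadata.json", b"{}");
    let bytes = io.read("s3://bucket/metadata/v1.metadata.json");
    println!("{:?}", bytes.map(|b| String::from_utf8(b).unwrap()));
}
```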
Currently OpenDAL works well for us, and we are focusing on implementing more features for iceberg-rust, so it may take a while for us to evaluate object_store and integrate it into this crate.
First-party integration with arrow-rs, parquet, DataFusion and polars, including sophisticated vectored and streaming IO
I'm quite interested in this since we are about to add support for file readers/writers, which will depend heavily on arrow-rs, parquet, etc., so I think object_store is quite promising.
cc @Xuanwo
Hi @tustvold, thank you for initiating this discussion! I will do my best to offer a multifaceted response, wearing different hats.
As @liurenjie1024 mentioned, iceberg-rust features its own FileIO interface to abstract IO operations. OpenDAL and object_store are merely implementation details with no current plans for external exposure.
It's fine to integrate with object_store, as that is precisely what we created FileIO for. However, it's important to note that we are in the initial stages of this project: we are currently focusing on the first release and implementing read/write capabilities.
Here are some remarks regarding the object_store feature set:
A flexible configuration system developed in partnership with, and used by both the polars and delta-rs communities
iceberg-rust is aligned with Iceberg and PyIceberg, sharing the same configuration logic; therefore, object_store's configuration system is redundant for our purposes.
Support for conditional writes, which would allow iceberg-rs to support multiple concurrent writers directly against object storage, without needing an external catalog
While the conditional put feature offers certain advantages, it may not be as crucial for our current use cases in iceberg-rust, where integration with a catalog like Hive or REST is more common.
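The conditional-write point quoted above can be illustrated with an optimistic commit loop: each writer tries to create the next metadata file with an atomic create-if-absent put, and moves past that version if another writer won it. This is a minimal std-only sketch; ConditionalStore, put_if_absent, metadata_path and commit are hypothetical names, not object_store or iceberg-rust APIs, and the in-memory store stands in for an object store that supports conditional puts.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum PutError {
    AlreadyExists,
}

// Hypothetical store offering an atomic "create only if absent" write.
#[derive(Default)]
struct ConditionalStore {
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

impl ConditionalStore {
    fn put_if_absent(&self, path: &str, data: Vec<u8>) -> Result<(), PutError> {
        // Check and insert under one lock, so the operation is atomic.
        let mut objects = self.objects.lock().unwrap();
        if objects.contains_key(path) {
            return Err(PutError::AlreadyExists);
        }
        objects.insert(path.to_string(), data);
        Ok(())
    }
}

fn metadata_path(version: u64) -> String {
    format!("metadata/v{version}.metadata.json")
}

// Optimistic commit: try to create version `base + 1`; if another writer won
// that version, move past it and retry. A real writer would re-read the
// winning metadata and rebase its changes before each retry.
fn commit(store: &ConditionalStore, mut base: u64, payload: &str, max_attempts: usize) -> Option<u64> {
    for _ in 0..max_attempts {
        let next = base + 1;
        match store.put_if_absent(&metadata_path(next), payload.as_bytes().to_vec()) {
            Ok(()) => return Some(next),
            Err(PutError::AlreadyExists) => base = next,
        }
    }
    None
}

fn main() {
    let store = ConditionalStore::default();
    // Four concurrent writers all start from version 0; each ends up with a distinct version.
    std::thread::scope(|s| {
        for writer in 0..4 {
            let store = &store;
            s.spawn(move || {
                let version = commit(store, 0, &format!("writer {writer}"), 10);
                println!("writer {writer} committed {version:?}");
            });
        }
    });
}
```

The catalog-based approach mentioned in the reply plays the same role as put_if_absent here: something has to arbitrate which writer owns the next version.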
As an iceberg-rust developer, I am eager to unlock more potential within the project.
Firstly, opendal and object_store are not competitors. (And remember, I'm also a contributor to object_store!) Rather than discussing replacements, I'd prefer to explore how we can coexist to offer our users more choices and possibilities.
I believe opendal integrates seamlessly with object_store, which is why our community created object_store_opendal, enabling users to utilize opendal as an implementation of object_store.
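The adapter idea behind object_store_opendal (implementing one crate's storage trait in terms of another's) can be sketched generically. ObjectStoreLike, OperatorLike, StoreAdapter and MapOperator are hypothetical, simplified, synchronous stand-ins; they are not the real object_store, OpenDAL or object_store_opendal types.

```rust
use std::collections::HashMap;
use std::ops::Range;

// Hypothetical stand-in for the object_store-style interface that
// downstream code (e.g. a parquet reader) is written against.
trait ObjectStoreLike {
    fn get_range(&self, path: &str, range: Range<usize>) -> Result<Vec<u8>, String>;
}

// Hypothetical stand-in for an OpenDAL-style operator with its own call shape.
trait OperatorLike {
    fn read_with_range(&self, path: &str, start: usize, end: usize) -> Result<Vec<u8>, String>;
}

// Adapter: wraps any operator and exposes it through the store interface,
// so code written against ObjectStoreLike can use it unchanged.
struct StoreAdapter<O: OperatorLike> {
    op: O,
}

impl<O: OperatorLike> ObjectStoreLike for StoreAdapter<O> {
    fn get_range(&self, path: &str, range: Range<usize>) -> Result<Vec<u8>, String> {
        // Only the call shape is translated; the operator does the IO.
        self.op.read_with_range(path, range.start, range.end)
    }
}

// Toy operator backed by a HashMap.
struct MapOperator(HashMap<String, Vec<u8>>);

impl OperatorLike for MapOperator {
    fn read_with_range(&self, path: &str, start: usize, end: usize) -> Result<Vec<u8>, String> {
        let data = self.0.get(path).ok_or_else(|| format!("not found: {path}"))?;
        data.get(start..end)
            .map(|bytes| bytes.to_vec())
            .ok_or_else(|| format!("range {start}..{end} out of bounds for {path}"))
    }
}

// Downstream code that only knows about the store interface.
fn read_magic(store: &dyn ObjectStoreLike, path: &str) -> Result<Vec<u8>, String> {
    store.get_range(path, 0..4)
}

fn main() {
    let mut files = HashMap::new();
    files.insert("data/file.parquet".to_string(), b"PAR1 ... PAR1".to_vec());
    let store = StoreAdapter { op: MapOperator(files) };
    println!("{:?}", read_magic(&store, "data/file.parquet"));
}
```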
Here are a few reasons why OpenDAL is beneficial for iceberg-rust:
… read_with() function.
… Writer without needing to understand MultipartUpload.
… object_store_opendal integration, enabling seamless connection to existing object_store-based systems.
I also found some places where OpenDAL can improve (thanks @tustvold!):
As an OpenDAL maintainer, I believe OpenDAL offers features that could benefit iceberg-rust, potentially simplifying some aspects of storage management. I will also be happy to collaborate with object_store to ensure the success of iceberg-rust.
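One of the OpenDAL points listed above is a Writer that can be used without understanding MultipartUpload. The general idea, buffering bytes and uploading them as fixed-size parts behind a plain std::io::Write, can be sketched with the standard library alone; MultipartSink, PartWriter and RecordingSink are hypothetical names, and this is not OpenDAL's actual Writer API.

```rust
use std::io::{self, Write};

// Hypothetical low-level multipart API, roughly the shape object stores expose.
trait MultipartSink {
    fn upload_part(&mut self, part_number: usize, data: &[u8]) -> io::Result<()>;
    fn complete(&mut self) -> io::Result<()>;
}

// Writer that hides the multipart mechanics: callers just use `Write`.
struct PartWriter<S: MultipartSink> {
    sink: S,
    buf: Vec<u8>,
    part_size: usize,
    next_part: usize,
}

impl<S: MultipartSink> PartWriter<S> {
    fn new(sink: S, part_size: usize) -> Self {
        assert!(part_size > 0, "part_size must be non-zero");
        Self { sink, buf: Vec::with_capacity(part_size), part_size, next_part: 1 }
    }

    fn upload_buffered(&mut self) -> io::Result<()> {
        // The buffer is only cleared after a successful upload, so a failed
        // upload does not lose data.
        self.sink.upload_part(self.next_part, &self.buf)?;
        self.buf.clear();
        self.next_part += 1;
        Ok(())
    }

    // Upload the buffered tail as the final part and complete the upload.
    fn finish(mut self) -> io::Result<S> {
        if !self.buf.is_empty() {
            self.upload_buffered()?;
        }
        self.sink.complete()?;
        Ok(self.sink)
    }
}

impl<S: MultipartSink> Write for PartWriter<S> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        if data.is_empty() {
            return Ok(0);
        }
        // Upload a full part before accepting more bytes; if it fails,
        // no bytes from `data` have been consumed.
        if self.buf.len() == self.part_size {
            self.upload_buffered()?;
        }
        let n = (self.part_size - self.buf.len()).min(data.len());
        self.buf.extend_from_slice(&data[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        // Parts are uploaded only when full (or in `finish`), because stores
        // typically enforce a minimum part size (5 MiB on S3, except the last part).
        Ok(())
    }
}

// Test double that records the parts it receives.
#[derive(Default)]
struct RecordingSink {
    parts: Vec<(usize, Vec<u8>)>,
    completed: bool,
}

impl MultipartSink for RecordingSink {
    fn upload_part(&mut self, part_number: usize, data: &[u8]) -> io::Result<()> {
        self.parts.push((part_number, data.to_vec()));
        Ok(())
    }

    fn complete(&mut self) -> io::Result<()> {
        self.completed = true;
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut writer = PartWriter::new(RecordingSink::default(), 4);
    writer.write_all(b"abcdefghij")?;
    let sink = writer.finish()?;
    println!("{} parts, completed = {}", sink.parts.len(), sink.completed);
    Ok(())
}
```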
Thank you both for the responses.
In iceberg's design, all file ios are hidden under the FileIO interface, and the backends, i.e. OpenDAL or object_store are not directly exposed to user, so I think we can integrate it without any breaking changes.
Glad to hear that efforts are being made to keep the IO primitives abstracted and pluggable 👍. I would just observe that FileIO appears to mirror filesystem APIs, and that this has historically been a pain point in systems that chose this path. For example, Spark has had a very hard time getting a performant S3 integration, with proper vectored IO only being added to OSS Spark very recently. By contrast, the object_store APIs mirror those of the actual stores and are designed to work well with the APIs in arrow-rs, avoiding all the complexities of prefetching heuristics and similar.
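To make the store-shaped versus filesystem-shaped contrast concrete, here is a minimal sketch of an interface limited to the operations object stores natively provide: whole-object put, ranged get and prefix listing. The Store trait and InMemory backend are hypothetical and only loosely modelled on this idea; they are not object_store's actual API, which is async and considerably richer.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

// Hypothetical store-shaped interface: whole-object writes, ranged reads and
// prefix listing. No seek, append or rename, since object stores do not
// provide those natively.
trait Store {
    fn put(&mut self, path: &str, data: Vec<u8>);
    fn get_range(&self, path: &str, range: Range<usize>) -> Option<Vec<u8>>;
    fn list(&self, prefix: &str) -> Vec<String>;
}

// Toy in-memory implementation.
#[derive(Default)]
struct InMemory {
    objects: BTreeMap<String, Vec<u8>>,
}

impl Store for InMemory {
    fn put(&mut self, path: &str, data: Vec<u8>) {
        // Replaces the whole object; there is no in-place update.
        self.objects.insert(path.to_string(), data);
    }

    fn get_range(&self, path: &str, range: Range<usize>) -> Option<Vec<u8>> {
        // None if the object is missing or the range is out of bounds.
        self.objects.get(path)?.get(range).map(<[u8]>::to_vec)
    }

    fn list(&self, prefix: &str) -> Vec<String> {
        // BTreeMap keys are sorted, so matching keys form one contiguous run.
        self.objects
            .range(prefix.to_string()..)
            .take_while(|(key, _)| key.starts_with(prefix))
            .map(|(key, _)| key.clone())
            .collect()
    }
}

fn main() {
    let mut store = InMemory::default();
    store.put("table/data/a.parquet", vec![0, 1, 2, 3]);
    store.put("table/metadata/v1.metadata.json", b"{}".to_vec());
    println!("{:?}", store.list("table/data/"));
    println!("{:?}", store.get_range("table/data/a.parquet", 1..3));
}
```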
discussing replacements
I entirely agree; I was more suggesting that the IO abstraction mirror object_store, as this is what both the upstream crates use and expect, and what the underlying stores provide. If people then wanted additional backend support, they could plug OpenDAL into this interface?
I'm quite interested in this since we are about to add support for file reader/writer
I'd be happy to help out with this if you're open to contributions; both myself and my employer are very interested in native iceberg support for the Rust ecosystem.
Thank you all -- this is a great conversation.
I entirely agree, I guess I was more suggesting that the IO abstraction mirror object_store as this is what both the upstream crates use and expect, and what the underlying stores provide. If people then wanted additional backend support they could plug OpenDAL into this interface?
I took a look at the FileIO interface that @liurenjie1024 and @Xuanwo pointed to. Ultimately, they seem to provide something that implements AsyncRead and AsyncWrite.
While it is true that AsyncRead and AsyncWrite's interfaces (seek, random IO, etc.) can be used in ways that would perform very poorly against remote object storage, I think if users are judicious, provide sufficient hints, and buffer the reads, the performance difference will be negligible.
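The role of buffering can be illustrated by counting requests. The sketch below wraps a hypothetical RemoteObject (every get_range call counts as one request) in a synchronous Read + Seek adapter; it uses std::io::Read and Seek as stand-ins for AsyncRead/AsyncSeek, and the ReadAhead type and its chunk parameter are assumptions for illustration only.

```rust
use std::cell::Cell;
use std::io::{self, Read, Seek, SeekFrom};

// Hypothetical remote object: each `get_range` call counts as one request.
struct RemoteObject {
    data: Vec<u8>,
    requests: Cell<usize>,
}

impl RemoteObject {
    fn get_range(&self, start: usize, end: usize) -> Vec<u8> {
        self.requests.set(self.requests.get() + 1);
        let end = end.min(self.data.len());
        let start = start.min(end);
        self.data[start..end].to_vec()
    }
}

// Read + Seek adapter that fetches at least `chunk` bytes per request and
// serves later reads from that buffer. With chunk = 1 every read call
// becomes its own request, which is the "naive" case.
struct ReadAhead<'a> {
    obj: &'a RemoteObject,
    pos: usize,
    buf: Vec<u8>,
    buf_start: usize,
    chunk: usize,
}

impl<'a> ReadAhead<'a> {
    fn new(obj: &'a RemoteObject, chunk: usize) -> Self {
        Self { obj, pos: 0, buf: Vec::new(), buf_start: 0, chunk: chunk.max(1) }
    }
}

impl Read for ReadAhead<'_> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        let buf_end = self.buf_start + self.buf.len();
        if self.pos < self.buf_start || self.pos >= buf_end {
            // Buffer miss: fetch a new chunk starting at the current position.
            let want = self.chunk.max(out.len());
            self.buf = self.obj.get_range(self.pos, self.pos.saturating_add(want));
            self.buf_start = self.pos;
        }
        let offset = self.pos - self.buf_start;
        let n = out.len().min(self.buf.len() - offset);
        out[..n].copy_from_slice(&self.buf[offset..offset + n]);
        self.pos += n;
        Ok(n)
    }
}

impl Seek for ReadAhead<'_> {
    fn seek(&mut self, target: SeekFrom) -> io::Result<u64> {
        // Assumes the object length is already known (e.g. from a HEAD request).
        let len = self.obj.data.len() as i64;
        let new_pos = match target {
            SeekFrom::Start(p) => p as i64,
            SeekFrom::End(delta) => len + delta,
            SeekFrom::Current(delta) => self.pos as i64 + delta,
        };
        if new_pos < 0 {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "seek before start"));
        }
        // Seeking issues no request; the buffer is reused if it still covers the new position.
        self.pos = new_pos as usize;
        Ok(self.pos as u64)
    }
}

fn main() -> io::Result<()> {
    let mut out = [0u8; 8];
    for chunk in [1, 64] {
        let obj = RemoteObject { data: (0u8..=255).collect(), requests: Cell::new(0) };
        let mut reader = ReadAhead::new(&obj, chunk);
        for _ in 0..16 {
            reader.read_exact(&mut out)?;
        }
        reader.seek(SeekFrom::Start(200))?;
        reader.read_exact(&mut out)?;
        println!("chunk = {chunk}: {} requests", obj.requests.get());
    }
    Ok(())
}
```

With chunk set to 1, every small read turns into its own request; with a larger chunk, sequential reads are served from the buffer. A seek outside the buffered window still costs a request, so simple read-ahead helps random access patterns far less than sequential ones.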
The "benefit" that one might get from using object_store is that its API is more opinionated and makes it very awkward to use poorly.
In my opinion, the use of OpenDAL to connect to storage systems other than object stores is pretty compelling.
Perhaps as you proceed with integrating iceberg-rust with arrow-rs/parquet/datafusion, we will learn more about how these various systems can be integrated and whether any adjustments need to be made, either in OpenDAL or object_store, or downstream in some other crates.
Thanks everyone for this very nice discussion.
I'd be happy to help out with this, if you're open to contributions, both myself and my employer are very interested in native iceberg support for the Rust ecosystem
Of course we are open to contributions from everyone; that's the key spirit of open source. Please note that this is an Apache project, and everyone is welcome to contribute.
As for the FileIO interface, it's inspired by Iceberg's Java/Python implementations. I have to admit that I don't have much experience working with object stores such as S3, and I don't know much about how they differ from file systems such as HDFS. I believe the whole Iceberg community welcomes ideas and designs as long as they are reasonable and provide performance benefits.
I think if users are judicious and provide sufficients hints, and buffer the reads the performance difference will be negligible.
For primarily sequential IO I would tend to agree: the AsyncRead abstraction will be less efficient than a streaming request, but if prefetching is configured appropriately the end-to-end latency should be similar. However, it is with "random" IO, such as occurs when reading structured file formats like parquet, that this difference becomes more stark.
Fortunately, the fix is extremely simple: add an InputFile::get_ranges method that AsyncFileReader can call. This can then call through to vectorised IO primitives where supported.
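A minimal sketch of the get_ranges idea, assuming a hypothetical RangedInput trait (synchronous and std-only; not the actual iceberg-rust InputFile or parquet AsyncFileReader signatures). The default get_ranges issues one request per range, so existing backends keep working unchanged, while a backend with vectorised IO can override it. Here the override simply fetches the covering span once, a crude stand-in for a real vectored fetch.

```rust
use std::cell::Cell;
use std::ops::Range;

// Hypothetical input-file trait. `get_range` is required; `get_ranges`
// has a default that issues one request per range, which backends with
// native vectored IO can override.
trait RangedInput {
    fn get_range(&self, range: Range<usize>) -> Vec<u8>;

    fn get_ranges(&self, ranges: &[Range<usize>]) -> Vec<Vec<u8>> {
        ranges.iter().map(|r| self.get_range(r.clone())).collect()
    }
}

// Backend without vectored IO: relies on the default `get_ranges`.
struct Simple {
    data: Vec<u8>,
    requests: Cell<usize>,
}

impl RangedInput for Simple {
    fn get_range(&self, range: Range<usize>) -> Vec<u8> {
        self.requests.set(self.requests.get() + 1);
        self.data[range].to_vec() // panics on out-of-bounds ranges, like slice indexing
    }
}

// Backend that overrides `get_ranges`: one request for the span covering all
// ranges, then local slicing. Only sensible when the ranges are close together.
struct Vectored {
    data: Vec<u8>,
    requests: Cell<usize>,
}

impl RangedInput for Vectored {
    fn get_range(&self, range: Range<usize>) -> Vec<u8> {
        self.requests.set(self.requests.get() + 1);
        self.data[range].to_vec()
    }

    fn get_ranges(&self, ranges: &[Range<usize>]) -> Vec<Vec<u8>> {
        let lo = ranges.iter().map(|r| r.start).min();
        let hi = ranges.iter().map(|r| r.end).max();
        match (lo, hi) {
            (Some(lo), Some(hi)) => {
                let span = self.get_range(lo..hi);
                ranges.iter().map(|r| span[r.start - lo..r.end - lo].to_vec()).collect()
            }
            _ => Vec::new(),
        }
    }
}

// A reader written against the trait; it picks up the override automatically.
fn read_chunks(input: &dyn RangedInput, ranges: &[Range<usize>]) -> Vec<Vec<u8>> {
    input.get_ranges(ranges)
}

fn main() {
    let ranges = [0..4, 10..12, 20..25];
    let simple = Simple { data: (0u8..32).collect(), requests: Cell::new(0) };
    let vectored = Vectored { data: (0u8..32).collect(), requests: Cell::new(0) };
    assert_eq!(read_chunks(&simple, &ranges), read_chunks(&vectored, &ranges));
    println!("simple: {} requests, vectored: {} requests", simple.requests.get(), vectored.requests.get());
}
```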
Of course we are open to contributions from everyone
In iceberg's design, all file ios are hidden under the FileIO interface
Would you be open to a PR to allow using either OpenDAL or object_store, along with corresponding feature flags, or would you prefer to not complicate matters at this time? I think this could be achieved in a fairly unobtrusive manner.
Thanks @tustvold for raising this and please don't hesitate to open an issue or PR.
For example Spark has had a very hard time getting a performant S3 integration, with proper vectored IO only being added to OSS Spark very recently (https://github.com/apache/arrow-datafusion/issues/2205#issuecomment-1100069800).
This is why the Iceberg Java implementation ships with its own vectorized parquet reader :)
It looks to me like object_store and FileIO aim to solve the same problem. Iceberg was designed from the start to work on object stores, not on filesystems. Similar to object_store, the FileIO concept is very opinionated. Since many people are still on HDFS, that is also supported, as filesystems offer stronger guarantees than object stores. If you want to learn more about the FileIO concept, this is a good primer on the concept.
It looks to me that object_store and FileIO aim to solve the same problem
That's awesome, thank you for the link. That is exactly what object_store is, an opinionated abstraction that ensures workloads are not overly reliant on filesystem-specific APIs and behaviour. Really cool that the iceberg community chose to take this approach, I agree with it wholeheartedly :+1:
FWIW, I notice that the InputFile contract is not itself vectorised, but I guess if you have a custom parquet reader you could lift the range coalescing into it.
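Range coalescing, as mentioned here, means merging byte ranges that overlap or sit within some gap threshold so that a reader issues fewer, larger requests, then slicing the originally requested ranges back out. A minimal std-only sketch; coalesce, split and the max_gap parameter are hypothetical names, and a sensible threshold depends on the store's latency and throughput.

```rust
use std::ops::Range;

// Merge byte ranges that overlap or are separated by at most `max_gap`
// bytes. Returns the merged ranges sorted by start offset.
fn coalesce(ranges: &[Range<usize>], max_gap: usize) -> Vec<Range<usize>> {
    let mut sorted = ranges.to_vec();
    sorted.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<usize>> = Vec::new();
    for r in sorted {
        if let Some(last) = merged.last_mut() {
            if r.start <= last.end.saturating_add(max_gap) {
                // Close enough: extend the previous range instead of adding a request.
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

// Slice each originally requested range back out of the fetched merged ranges.
// `fetched[i]` must hold the bytes for `merged[i]`.
fn split(original: &[Range<usize>], merged: &[Range<usize>], fetched: &[Vec<u8>]) -> Vec<Vec<u8>> {
    original
        .iter()
        .map(|r| {
            let i = merged
                .iter()
                .position(|m| m.start <= r.start && r.end <= m.end)
                .expect("every original range is covered by a merged range");
            let base = merged[i].start;
            fetched[i][r.start - base..r.end - base].to_vec()
        })
        .collect()
}

fn main() {
    let file: Vec<u8> = (0..=255).collect();
    // Hypothetical byte ranges a reader might request.
    let wanted = [0..4, 6..8, 150..152];
    let merged = coalesce(&wanted, 16);
    let fetched: Vec<Vec<u8>> = merged.iter().map(|m| file[m.clone()].to_vec()).collect();
    println!("{} ranges -> {} requests: {:?}", wanted.len(), merged.len(), merged);
    println!("{:?}", split(&wanted, &merged, &fetched));
}
```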
Would you be open to a PR to allow using either OpenDAL or object_store, along with corresponding feature flags, or would you prefer to not complicate matters at this time? I think this could be achieved in a fairly unobtrusive manner
Hi @tustvold, you're welcome to open a PR for this.
As for timing, my suggestion is to wait a little. Currently this crate has finished the REST catalog, serialization/deserialization of metadata, and basic file-based table scan planning. Next, we expect to implement two things: a parquet file writer that writes Arrow record batches, and a reader that reads parquet files into a stream of Arrow record batches. Both features depend heavily on FileIO and would provide solid, concrete use cases for our new IO interface, so that we can better understand and discuss the benefits of these changes. What do you think?
I have debated filing this ticket for a while, but largely held off as I wasn't sure how well it would be received, especially as I am acutely aware that this crate currently makes use of OpenDAL and @Xuanwo is an active contributor to both repositories. However, I feel it is important to have these discussions, and part of my role as a maintainer of object_store is to engage with others in the community and hear about how its offering could be made more compelling.
That all being said, I think object_store provides some quite compelling functionality that might be of particular interest to this project:
The major area where object_store is limited, somewhat intentionally, is the number of first-party implementations: it only supports S3-compatible stores, Google Cloud Storage, Azure Blob Storage, and in-memory and local filesystems. However, its object-safe design does allow for third-party implementations, for things like HDFS.
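Because such a trait is object-safe, a third-party crate can supply an implementation that callers use as a trait object without the core crate knowing about the concrete type. A minimal sketch, assuming a hypothetical DynStore trait and a scheme-keyed Registry; FakeHdfs only echoes the path back and stands in for a real HDFS client.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical object-safe store trait: no generic methods and no methods
// returning `Self`, so it can be used as `dyn DynStore`.
trait DynStore: Send + Sync {
    fn name(&self) -> &str;
    fn get(&self, path: &str) -> Result<Vec<u8>, String>;
}

// Registry mapping URL schemes to trait objects; it never needs to know the
// concrete types behind them.
#[derive(Default)]
struct Registry {
    stores: HashMap<String, Arc<dyn DynStore>>,
}

impl Registry {
    fn register(&mut self, scheme: &str, store: Arc<dyn DynStore>) {
        self.stores.insert(scheme.to_string(), store);
    }

    // Split "scheme://rest" and return the matching store plus the remaining path.
    fn resolve(&self, url: &str) -> Result<(Arc<dyn DynStore>, String), String> {
        let (scheme, rest) = url
            .split_once("://")
            .ok_or_else(|| format!("no scheme in {url}"))?;
        let store = self
            .stores
            .get(scheme)
            .ok_or_else(|| format!("no store registered for {scheme}"))?;
        Ok((Arc::clone(store), rest.to_string()))
    }
}

// A "third-party" implementation, e.g. what a separate HDFS crate might
// provide. This one just echoes the path back.
struct FakeHdfs;

impl DynStore for FakeHdfs {
    fn name(&self) -> &str {
        "hdfs"
    }

    fn get(&self, path: &str) -> Result<Vec<u8>, String> {
        Ok(format!("contents of {path}").into_bytes())
    }
}

fn main() {
    let mut registry = Registry::default();
    registry.register("hdfs", Arc::new(FakeHdfs));
    match registry.resolve("hdfs://namenode/warehouse/t1/metadata.json") {
        Ok((store, path)) => println!("{} -> {:?}", store.name(), String::from_utf8(store.get(&path).unwrap())),
        Err(e) => println!("error: {e}"),
    }
}
```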
I look forward to hearing your thoughts, but also fully understand if this is not a discussion you would like to engage with at this time.