kazdy closed this 1 month ago
@kazdy sounds good. Feel free to take this up and send a PR.
I'll wait until #72 gets merged. I did the first strawman impl and it requires some refactoring in the Table itself.
@xushiyan I also have some questions about this, maybe you can give me your opinion on these:
thanks
hey @kazdy
1) We keep the name `Table` within `hudi-core` to avoid a redundant prefix; everything in `hudi-core` is about Hudi. When importing it into other crates, we can give it an alias like `HudiTable`. We can also add an alias in the `hudi` crate for the external-facing API when needed. As of now, there is no strong need for this.
2) `Timeline` is responsible for data stored in timeline files under `.hoodie/`, and `FileSystemView` is responsible for the data stored under the table excluding `.hoodie/`. It's good to keep things less coupled unless there is a need for sharing; it's a stateless client performing IO anyway. Maybe you can make a case for why it should be shared?
3) Currently `Table` holds `Timeline` and `FileSystemView`. Could you elaborate on what you meant by a coherent API?
To integrate hudi-rs with AWS SDK for pandas (awswrangler), we must be able to pass boto3 session-related AWS authentication params (mostly AWS* params) directly, rather than relying only on environment variable inference.
I want to propose adding an option to handle this:
Although I want to add this for S3, it should work for other storage backends. I'm happy to contribute and add this.
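To make the intended precedence concrete, here is a minimal sketch of how explicitly passed options could be layered over values inferred from the environment. It is not the hudi-rs API: the function name `resolve_storage_options`, the `AWS_` prefix filter, and the lowercased keys are all assumptions made for illustration.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical filter: only AWS-style variables are inferred from the environment.
ENV_PREFIX = "AWS_"


def resolve_storage_options(
    explicit: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge storage options; explicitly passed values win over inferred ones."""
    env = os.environ if env is None else env

    # Start from whatever can be inferred from the environment.
    resolved = {
        key.lower(): value
        for key, value in env.items()
        if key.upper().startswith(ENV_PREFIX)
    }

    # Explicit options (e.g. credentials taken from a session object) override them.
    for key, value in (explicit or {}).items():
        resolved[key.lower()] = value

    return resolved
```

A caller that already holds credentials could pass them as the `explicit` mapping, and nothing in the merge step is specific to S3, so the same approach would apply to other storage backends.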