apache / hudi-rs

A native Rust library for Apache Hudi, with bindings into Python
https://hudi.apache.org/
Apache License 2.0

Setup integration test with minio #81

Open xushiyan opened 3 months ago

abyssnlp commented 3 months ago

I'll take this one if no one's assigned to it yet.

xushiyan commented 3 months ago

@abyssnlp before you start, can you please elaborate on the design?

abyssnlp commented 3 months ago

Sure, so at a high level:

I'll add more details today after work. Please feel free to add things I should keep in mind while I work on this.

xushiyan commented 3 months ago

@abyssnlp high-level looks good. A heads-up about testing data: since hudi-rs does not yet support a Hudi writer, we are using fixed pre-generated tables as the testing tables; see https://github.com/apache/hudi-rs/tree/main/crates/tests/data/tables. Would like to see some detailed design around provisioning the test tables through MinIO volumes.

abyssnlp commented 3 months ago

Sorry about the delay. Thanks for pointing to that. So we can mount the existing tables under here into the container before running the tests.

Something like:

.with_mount(Mount::bind_mount(canonicalize(Path::new("tests/data"))?.into_os_string().into_string().unwrap(),

However, one thing I found out about testcontainers in Rust is that it doesn't support reusing a container across multiple tests, so the integration tests would have to live in, for example, a single test function. More about it here. There are also workarounds. An alternative would be to use docker-compose to spin up MinIO before running the integration tests and tear it down afterwards.
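To make the docker-compose alternative a bit more concrete, here is a rough sketch of how a test harness could bring the services up and tear them down again. It only uses the standard library and shells out to the `docker compose` CLI; the names `compose_args`, `run_compose` and `ComposeGuard` and the compose file path are hypothetical, not something that exists in hudi-rs.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Builds the argument list for `docker compose -f <file> <action...>`.
fn compose_args(compose_file: &str, action: &[&str]) -> Vec<String> {
    let mut args = vec![
        "compose".to_string(),
        "-f".to_string(),
        compose_file.to_string(),
    ];
    args.extend(action.iter().map(|s| s.to_string()));
    args
}

/// Runs `docker compose` with the given action and returns its exit status.
fn run_compose(compose_file: &str, action: &[&str]) -> io::Result<ExitStatus> {
    Command::new("docker")
        .args(compose_args(compose_file, action))
        .status()
}

/// Hypothetical guard: starts the services on creation and stops them
/// when dropped, so teardown also runs if the owning test unwinds.
struct ComposeGuard {
    compose_file: String,
}

impl ComposeGuard {
    fn up(compose_file: &str) -> io::Result<Self> {
        let status = run_compose(compose_file, &["up", "-d"])?;
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("docker compose up failed: {status}"),
            ));
        }
        Ok(Self {
            compose_file: compose_file.to_string(),
        })
    }
}

impl Drop for ComposeGuard {
    fn drop(&mut self) {
        // Best effort: errors during teardown are ignored.
        let _ = run_compose(&self.compose_file, &["down"]);
    }
}
```

A test would hold the value returned by `ComposeGuard::up(...)` for as long as it needs MinIO. Since a guard like this is scoped to whoever owns it, running `docker compose up` and `down` from a script or CI step around `cargo test` may be simpler when many test functions share the same services.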

So this is how I'm thinking about approaching it:

I had some questions as well.

xushiyan commented 3 months ago

@abyssnlp sounds good to make use of docker-compose - it'll be convenient to evolve the tests as we probably need to add more components in future. to answer the questions

abyssnlp commented 3 months ago

Thanks for sharing your thoughts on it. Having them separate from the crates sounds good. I've started some initial work on a local branch and managed to get Minio up with the pre-generated tables.

I'm currently running into some issues trying to read the tables via hudi-rs.

I've tried both environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and the local-MinIO-specific AWS_ENDPOINT and ALLOW_HTTP) and passing the config via HudiDataSource::new_with_options as a Vec<(&str, &str)>. I can confirm the object store config works for hudi::storage::Storage, for example when reading the contents of .hoodie/hoodie.properties.
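For reference, this is roughly how the option pairs could be assembled, as a minimal sketch using only the standard library. The four keys are the ones listed above; the fallback values (MinIO's default `minioadmin` credentials and `http://localhost:9000`) and the helper names `storage_options_with` and `storage_options_from_env` are assumptions for illustration, not part of hudi-rs.

```rust
use std::env;

/// Option keys mentioned in this thread, paired with fallback values.
/// The fallbacks are assumptions for a default local MinIO setup.
const MINIO_DEFAULTS: [(&str, &str); 4] = [
    ("AWS_ACCESS_KEY_ID", "minioadmin"),
    ("AWS_SECRET_ACCESS_KEY", "minioadmin"),
    ("AWS_ENDPOINT", "http://localhost:9000"),
    ("ALLOW_HTTP", "true"),
];

/// Resolves every key through `lookup`, using the fallback when the
/// lookup returns `None`. Keeping the lookup injectable makes the
/// function testable without touching the process environment.
fn storage_options_with<F>(lookup: F) -> Vec<(String, String)>
where
    F: Fn(&str) -> Option<String>,
{
    MINIO_DEFAULTS
        .iter()
        .map(|&(key, default)| {
            let value = lookup(key).unwrap_or_else(|| default.to_string());
            (key.to_string(), value)
        })
        .collect()
}

/// Same as above, reading overrides from environment variables.
fn storage_options_from_env() -> Vec<(String, String)> {
    storage_options_with(|key| env::var(key).ok())
}
```

The resulting pairs could then be borrowed as `(&str, &str)` and handed to whichever entry point accepts options, such as the `HudiDataSource::new_with_options` call mentioned above; its exact signature is not shown in this thread, so that part is left out.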

Might be some bad configuration on my end. I'll continue working on it this week and keep posting updates here.

xushiyan commented 2 months ago

@abyssnlp any plan to put this up in a PR?

abyssnlp commented 2 months ago

@xushiyan Yes, I'll put it up in a PR soon (today or tomorrow).