Open xushiyan opened 4 months ago
@abyssnlp before you start, can you please elaborate on the design?
Sure, so at a high level:
- Use testcontainers to spin up MinIO before running the integration tests
I'll add more details today after work. Please feel free to add things I should keep in mind while I work on this.
@abyssnlp high-level looks good. A heads-up about test data: since hudi-rs does not yet support a Hudi writer, we are using fixed pre-generated tables as the testing tables; see https://github.com/apache/hudi-rs/tree/main/crates/tests/data/tables
I would like to see some detailed design around provisioning test tables through MinIO volumes.
Sorry about the delay, and thanks for pointing to that. We can mount the existing pre-generated tables into the container before running the tests.
Something like:
.with_mount(Mount::bind_mount(canonicalize(Path::new("tests/data"))?.into_os_string().into_string().unwrap(),
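The host-path part of that snippet can be sketched on its own with just the standard library. This is only an illustration: `host_mount_source` is a hypothetical helper name, not something in hudi-rs or testcontainers, and it returns an error instead of calling `unwrap()` when the path is not valid UTF-8.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: resolve a directory to an absolute host path
/// that could be passed as the source of a bind mount.
fn host_mount_source(dir: &Path) -> io::Result<String> {
    // canonicalize fails if the directory does not exist.
    let absolute = fs::canonicalize(dir)?;
    // Convert to String without panicking on non-UTF-8 paths.
    absolute.into_os_string().into_string().map_err(|raw| {
        io::Error::new(
            io::ErrorKind::InvalidData,
            format!("path is not valid UTF-8: {:?}", raw),
        )
    })
}
```

Failing early on a missing directory should give a clearer error than a container that starts with an empty mount.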
However, one thing I found out about testcontainers in Rust is that it doesn't support reusing a container across multiple tests, so the integration tests would have to live in, for example, a single test function. More about it here. There are also workarounds. An alternative would be to use docker-compose to spin up MinIO before running the integration tests and spin it down afterwards.
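One commonly used workaround for sharing a fixture between tests is a lazily initialized static. A minimal sketch of that pattern, assuming a hypothetical `SharedMinio` struct standing in for the real container handle and a made-up endpoint value:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a started MinIO container handle.
struct SharedMinio {
    endpoint: String,
}

/// Counts how many times the fixture was actually started.
static STARTS: AtomicUsize = AtomicUsize::new(0);
static MINIO: OnceLock<SharedMinio> = OnceLock::new();

/// Returns the shared fixture, starting it on first use only.
fn shared_minio() -> &'static SharedMinio {
    MINIO.get_or_init(|| {
        STARTS.fetch_add(1, Ordering::SeqCst);
        // A real setup would start the container here; the endpoint is illustrative.
        SharedMinio {
            endpoint: "http://localhost:9000".to_string(),
        }
    })
}
```

One caveat with this approach: Rust does not run destructors for statics at process exit, so a container held this way would need explicit cleanup. That is one more argument for docker-compose.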
So this is how I'm thinking about approaching it:
- ./docker-compose.yaml - for spinning up the required containers for MinIO
- crates/tests/src/common.rs - some utility code, for example to create an S3 bucket and put the pre-generated tables
- crates/tests/src/integration_test.rs - spin up MinIO using testcontainers and run the integration tests
- the integration tests are kept separate (integration_test) so they can be run separately
I had some questions as well.
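For the utility code, a rough sketch of how the pre-generated table files could be mapped to object keys before uploading them to a bucket. It uses only the standard library; `collect_objects` is a hypothetical name and the upload step itself is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: walk `root` recursively and return
/// (object key, local file) pairs, sorted by key.
/// Keys are paths relative to `root`, joined with '/'.
fn collect_objects(root: &Path) -> io::Result<Vec<(String, PathBuf)>> {
    let mut out = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else {
                let rel = path
                    .strip_prefix(root)
                    .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
                // Join components with '/' so keys look the same on every OS.
                let key = rel
                    .components()
                    .map(|c| c.as_os_str().to_string_lossy().into_owned())
                    .collect::<Vec<_>>()
                    .join("/");
                out.push((key, path));
            }
        }
    }
    out.sort();
    Ok(out)
}
```

Each returned pair could then be passed to whichever S3 client ends up doing the put.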
@abyssnlp sounds good to make use of docker-compose; it'll be convenient for evolving the tests, as we will probably need to add more components in the future. To answer the questions:
- crates/tests is a crate to provide all kinds of common Hudi test utilities, but we don't want to host actual tests in it.
Thanks for sharing your thoughts on it. Having them separate from the crates sounds good. I've started some initial work on a local branch and managed to get MinIO up with the pre-generated tables.
I'm currently running into some issues trying to read the tables via hudi-rs.
I've tried using both environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and the local-MinIO-specific AWS_ENDPOINT and ALLOW_HTTP) and providing the config via HudiDataSource::new_with_options as a Vec<(&str, &str)>.
I can confirm the object store config works for hudi::storage::Storage, for example when reading the contents of .hoodie/hoodie.properties.
Might be some bad configuration on my end. I'll continue working on it this week and keep posting updates here.
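For reference, the shape of the options I'm passing looks roughly like the sketch below. The key names are the ones listed above, but whether hudi-rs expects exactly these spellings is an assumption I haven't confirmed. `minio_options` is a hypothetical helper and the values are placeholders supplied by the caller.

```rust
/// Hypothetical helper: build key/value options for a local MinIO endpoint.
/// Key spellings are taken from the comment above and are not verified
/// against what hudi-rs actually accepts.
fn minio_options<'a>(
    endpoint: &'a str,
    access_key: &'a str,
    secret_key: &'a str,
) -> Vec<(&'a str, &'a str)> {
    vec![
        ("AWS_ACCESS_KEY_ID", access_key),
        ("AWS_SECRET_ACCESS_KEY", secret_key),
        ("AWS_ENDPOINT", endpoint),
        // Local MinIO is served over plain HTTP.
        ("ALLOW_HTTP", "true"),
    ]
}
```

If one of these keys is silently ignored, that could explain why the lower-level storage read works while the table read does not.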
@abyssnlp any plan to put this up in a PR?
@xushiyan Yes, I'll put it up in a PR soon (today or tomorrow).
I'll take this one if no one's assigned to it yet.