delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0
1.97k stars 365 forks source link

DeltaTable over S3: creation hangs up when being run from current thread scheduler #2590

Open zxqfd555-pw opened 2 weeks ago

zxqfd555-pw commented 2 weeks ago

Environment

Delta-rs version: 0.17.3

Binding: Rust

Environment:


Bug

What happened:

I have a code that does a DeltaTable creation. When running it with a current thread scheduler in Tokio, it hangs indefinitely, and supposedly deadlocks.

This issue can be fixed by changing the scheduler to a multi-thread scheduler (practically, new_multi_thread instead of new_current_thread), which works fine. However, the hangup when using the current thread scheduler is strange and takes time to locate and debug. Could you please clarify why it happens? If the library is not supposed to work with a current thread scheduler stably, perhaps, an explicit panic! can be more helpful than a silent hangup.

How to reproduce it:

Below is the minimum example to reproduce the hangup:

    // Construct connection options
    let mut options = HashMap::new();
    // Store values for AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, AWS_S3_ALLOW_UNSAFE_RENAME in `options`

    // Register handlers, so that the library understands s3:// and s3a:// path schemas
    deltalake::aws::register_handlers(None);

    // Create some schema just so that we have something to provide
    let mut struct_fields = vec![DeltaTableStructField::new(
        "data",
        DeltaTableKernelType::Primitive(DeltaTablePrimitiveType::Long),
        false,
    )];

    // Initialize table creation
    let builder = DeltaTableCreateBuilder::new()
        .with_location(table_uri)
        .with_columns(struct_fields)
        .with_storage_options(options.clone());

    // Await the result
    let dt = tokio::runtime::Builder::new_current_thread()
        .enable_all()
        .build()
        .unwrap()
        .block_on(
            async {
                builder.await
            }
        );

The code above hangs up, while if the scheduler in the last block is changed to new_multi_thread, it works normally.

More details:

I've taken the stack trace of the hanged process, and it points to this thread spawning. The line number was different by one or two, and I suspect it's because some more code was committed after the release.

Then I checked out the repo and rebuilt my example specifying the path to the package pointing specifically to the checked-out code so that I could make changes easily and test it. But with the freshest code, the issue doesn't reproduce - perhaps just a library release is needed then if something was changed in the S3 management.