delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0
1.98k stars 365 forks

Getting "Microsoft Azure Error: Operation timed out" when trying to retrieve big files #2537

Closed by erickfigueiredo 1 month ago

erickfigueiredo commented 1 month ago

Environment

Delta-rs version: 0.16.0


Bug

What happened: I encountered the error "OSError: MicrosoftAzure Generic Error: Error in request or response body: Operation timed out" when loading a Delta table containing millions of rows from Azure Data Lake Storage Gen2. The error does not occur when I retrieve a smaller amount of data.

What you expected to happen: I expected to load the data from the Delta table and convert it to a Pandas DataFrame without any errors.

How to reproduce it:

from deltalake import DeltaTable

credentials = {
  'account_name': '<account_name>',
  'client_id': '<client_id>',
  'tenant_id': '<tenant_id>',
  'client_secret': '<client_secret>'
}

# Load data from the delta table
dt = DeltaTable("abfs://<container>/<path>", storage_options=credentials)
df = dt.to_pandas()

[Screenshot 2024-05-23 155639]

ion-elgreco commented 1 month ago

You can increase the timeout by doing this: storage_options={"timeout": "100s"}
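
A minimal sketch of how that might look in the reproduction snippet above, assuming the timeout key is simply merged into the same dictionary as the credentials. The helper names with_timeout and load_table are made up for illustration and are not part of deltalake; "100s" is just the value suggested here and may need tuning for larger tables.

```python
def with_timeout(storage_options: dict, timeout: str = "100s") -> dict:
    """Return a copy of storage_options with the request timeout set.

    Hypothetical helper for illustration; not part of deltalake.
    """
    merged = dict(storage_options)  # copy so the original dict is not mutated
    merged["timeout"] = timeout     # values are passed as strings, e.g. "100s"
    return merged


# Same placeholder credentials as in the original report.
credentials = {
    "account_name": "<account_name>",
    "client_id": "<client_id>",
    "tenant_id": "<tenant_id>",
    "client_secret": "<client_secret>",
}

storage_options = with_timeout(credentials, "100s")


def load_table(uri: str, options: dict):
    """Load a Delta table into a Pandas DataFrame (hypothetical wrapper)."""
    # Imported here so the helper above can be used without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable(uri, storage_options=options).to_pandas()


# Usage, with the placeholders filled in:
# df = load_table("abfs://<container>/<path>", storage_options)
```

Keeping the timeout next to the credentials means a single dictionary is passed to DeltaTable, as in the original snippet.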

erickfigueiredo commented 1 month ago

Thank you very much, @ion-elgreco! It's working now! Is there any documentation listing all the keys that storage_options accepts? It would be very useful in my context!

ion-elgreco commented 1 month ago

@erickfigueiredo You can see most config keys here: https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html