Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0
2.18k stars, 146 forks

`read_deltalake` attempts to use S3 credentials for local files #2879

Open apostolos-geyer opened 2 weeks ago

apostolos-geyer commented 2 weeks ago

Describe the bug

When I read a local Delta Lake table, Daft logs multiple errors as it tries to retrieve S3 credentials, create a client for us-east-1, and so on. I'm not sure whether this is a bug or whether there is some configuration that tells Daft I'm working with local files and that it should not try S3; I couldn't find anything about this in the docs. The table is still read successfully, but it would be nice not to have to wait for the session-token lookup to fail and for Daft to attempt to create an S3 client.

To Reproduce

import daft
daft.read_deltalake('path/to/a/local/file')

output:

failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: timeout: error trying to connect: HTTP connect timeout occurred after 1s: HTTP connect timeout occurred after 1s: timed out (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: hyper::Error(Connect, HttpTimeoutError { kind: "HTTP connect", duration: 1s }), connection: Unknown } }) }))
failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: timeout: error trying to connect: HTTP connect timeout occurred after 1s: HTTP connect timeout occurred after 1s: timed out (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: hyper::Error(Connect, HttpTimeoutError { kind: "HTTP connect", duration: 1s }), connection: Unknown } }) }))
S3 Credentials not provided or found when making client for us-east-1! Reverting to Anonymous mode. the credential provider was not enabled

Expected behavior

The local Delta Lake table should be read without attempting to use S3 or any other network location, and without logging errors.

Screenshots

Screenshot 2024-09-21 at 2 03 36 PM

Desktop:

Screenshot 2024-09-21 at 2 07 41 PM

If you're looking for contributors, I'd be happy to try to fix this myself. I've never contributed to anything before, so I'm not sure if there are any procedures, but if I can, I'll give it a shot.

jaychia commented 2 weeks ago

Thanks @apostolos-geyer, good catch!

This definitely seems like a bug. Would LOVE to take a contribution ❤️

Here are some quick tips:

We will probably want to run the credential detection (`S3Config.from_env()`) only if the Delta path provided is an S3 path.
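A minimal sketch of that guard, for illustration only: it is not Daft's actual code. The names `S3_SCHEMES`, `is_s3_path`, and `maybe_load_s3_config` are hypothetical, and treating `s3a` and `s3n` as S3 schemes is an assumption. The `loader` argument stands in for a call such as `S3Config.from_env()`, so the example runs without Daft installed.

```python
from typing import Callable, Optional, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")

# Assumption: these are the URL schemes that should count as "S3".
S3_SCHEMES = {"s3", "s3a", "s3n"}


def is_s3_path(path: str) -> bool:
    """Return True only when the path carries an S3-style URL scheme."""
    # Local paths such as "path/to/table" parse with an empty scheme.
    return urlparse(path).scheme.lower() in S3_SCHEMES


def maybe_load_s3_config(path: str, loader: Callable[[], T]) -> Optional[T]:
    """Run the (possibly slow) credential loader only for S3 paths.

    `loader` is a hypothetical stand-in for something like
    S3Config.from_env(); for local paths it is never called, so no
    network lookup is attempted.
    """
    if is_s3_path(path):
        return loader()
    return None
```

With a guard like this, `maybe_load_s3_config("path/to/a/local/file", loader)` returns `None` without invoking `loader`, which should avoid the IMDS timeouts shown in the report, while `s3://` paths still go through credential detection.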

jaychia commented 2 weeks ago

Feel free to shoot us any questions about contributing :)