Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0
2.35k stars, 166 forks

Extend `Expression.url.upload()` to support row-specific URLs using an input column instead of a single `prefix` #3320

Open NellyWhads opened 6 days ago

NellyWhads commented 6 days ago

Is your feature request related to a problem?

Currently, users can only upload to a single prefix (a local directory or an S3 location), so every row ends up under the same destination.

The feature request is to extend the upload functionality to accept a column expression, so that each row can specify its own target URL.

Describe the solution you'd like

I would like to be able to run this example:

df = df.with_column("uploaded_url", df["foo"].url.upload(df["target_urls"]))

This is expected to produce a column named "uploaded_url" that contains the path of each successfully uploaded item, or null where the upload failed. This would keep the API consistent with the download() method, which lets users either raise errors eagerly or ignore them and report a null value from the expression.
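To make the requested semantics concrete, here is a minimal plain-Python sketch of the per-row behaviour, assuming local file paths as targets. It is not Daft code: `upload_rows` and its `on_error` parameter are hypothetical names chosen for illustration, and the real expression would presumably also handle S3 and other remote targets.

```python
import os
import tempfile
from typing import List, Optional


def upload_rows(
    payloads: List[Optional[bytes]],
    target_paths: List[Optional[str]],
    on_error: str = "raise",
) -> List[Optional[str]]:
    """Write each payload to its own row-specific local path.

    Hypothetical helper, not part of Daft. Returns the written path for
    each row, or None for rows that failed when on_error == "null".
    """
    if on_error not in ("raise", "null"):
        raise ValueError("on_error must be 'raise' or 'null'")
    if len(payloads) != len(target_paths):
        raise ValueError("payloads and target_paths must have the same length")

    results: List[Optional[str]] = []
    for data, path in zip(payloads, target_paths):
        try:
            if data is None or path is None:
                raise ValueError("missing payload or target path")
            # Create the parent directory so each row can target a new location.
            parent = os.path.dirname(path)
            if parent:
                os.makedirs(parent, exist_ok=True)
            with open(path, "wb") as f:
                f.write(data)
            results.append(path)
        except (OSError, ValueError):
            if on_error == "raise":
                raise  # eager failure on the first bad row
            results.append(None)  # report the failure as a null for this row
    return results


# Small demonstration: the second row has no target, so it becomes None.
with tempfile.TemporaryDirectory() as tmp:
    targets = [os.path.join(tmp, "a", "one.bin"), None]
    print(upload_rows([b"x", b"y"], targets, on_error="null"))
```

The point of the sketch is only the contract: one target per row, and a choice between failing eagerly and returning null for the rows that could not be written.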

Describe alternatives you've considered

I currently maintain a custom StatefulUDF that does the same thing. However, this small API change would remove the need for that workaround.

Additional Context

Slack Thread: https://dist-data.slack.com/archives/C041NA2RBFD/p1731900336643709

Would you like to implement a fix?

No

desmondcheongzx commented 4 days ago

FYI aiming to find some time to add this extension towards the end of the week