Open kevinzwang opened 4 months ago
I want to pass this option, but I don't know how to do it:
storage_options={"allow_unsafe_rename": "true"}
@djouallah Looks like allow_unsafe_rename is an option that is used by delta-rs rather than object_store. A workaround should be to set:
export AWS_S3_ALLOW_UNSAFE_RENAME=true
source: https://delta-io.github.io/delta-rs/usage/writing/writing-to-s3-with-locking-provider/
Yes, but how do I do it in Daft? That was my question.
This isn't a Daft-specific configuration! It's actually from delta-rs, and isn't actually an object_store configuration either. You can just set the environment variable like so in your program, which will correctly configure delta-rs:
export AWS_S3_ALLOW_UNSAFE_RENAME=true
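If you're running in a Python program rather than a shell, the same thing can be done with `os.environ` (a minimal sketch; the variable name comes from the delta-rs docs linked above):

```python
import os

# delta-rs reads this environment variable when it sets up its S3
# backend, so set it before triggering any Delta Lake writes.
os.environ["AWS_S3_ALLOW_UNSAFE_RENAME"] = "true"
```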
no luck in a notebook :(
OSError: Generic LocalFileSystem error: Unable to copy file from /synfs/lakehouse/default/Tables/T10/daft/_delta_log/_commit_c475e751-6256-4777-8fa7-fc8f1704d785.json.tmp to /synfs/lakehouse/default/Tables/T10/daft/_delta_log/00000000000000000000.json: Function not implemented (os error 38)
@jaychia @kevinzwang Let's expose an option to allow allow_unsafe_rename. I dug through the delta-rs code and it looks like they overload allow_unsafe_rename to do both AWS_S3_ALLOW_UNSAFE_RENAME for S3 and an allow path for other filesystems.
@jaychia I think this is the codepath that is getting hit when allow_unsafe_rename is set and the object store is mounted locally.
LOL, it seems like they reused the key allow_unsafe_rename for both the S3 and mount filesystems.
Yeah, we can definitely add this. First, @djouallah, could you check whether setting export MOUNT_ALLOW_UNSAFE_RENAME=true fixes the error you saw?
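Since export isn't available in a notebook, the equivalent there would presumably be setting it from Python before the write (a sketch; the variable name is the one mentioned above):

```python
import os

# Must be set before delta-rs initializes its mount filesystem
# backend, i.e. before the first write to the mounted table path.
os.environ["MOUNT_ALLOW_UNSAFE_RENAME"] = "true"
```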
It is working, and it is freaking fast!!! Interesting.
Question: how do I do partition by, and is there a way to control the file size? It seems Daft generates really small files, around 15 MB.
edit: it works fine in delta-rs 0.17.4 but not 0.18.2
@djouallah we do not yet have the ability to do partitioned writes, but we are working on it! As for file sizes, maybe we can expose a config parameter for that, I'll take a look.
edit : it works fine in delta_rs 0.17.4 but not 0.18.2
Do you see a specific error with 0.18.2, or does it just have the same behavior as when MOUNT_ALLOW_UNSAFE_RENAME is not set?
Several other libraries pass around a storage options dictionary that is then used by the object_store Rust crate to authenticate and do reads and writes. To allow users to more easily move to Daft, we could provide functionality for them to use their storage options in Daft. There are two ways to do this:

1. A function storage_options_to_io_config(options: dict[str, str]) -> IOConfig which does this conversion. One thing to figure out is that we would need to know which cloud provider they are using, since storage option keys between cloud providers are not disjoint.
2. Accept storage_options wherever users can pass io_config. In this case we can usually infer the cloud provider, so it would probably be a cleaner API, but that would make it harder for users to take advantage of authentication flows that we have but object_store doesn't.

Another thing to consider is whether we want to use the mappings in the object_store crate, which would require dipping into the Rust layer, or to copy the mappings into our own code.
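Option 1 could look something like the sketch below. The input key names are object_store-style S3 options; the output field names are placeholders rather than Daft's actual IOConfig API, and the explicit provider argument reflects the point above that keys are not disjoint across providers:

```python
# Hypothetical mapping from object_store-style S3 storage options to
# the kwargs an IOConfig-like object might accept. Only S3 is sketched;
# other providers would need their own maps.
S3_KEY_MAP = {
    "aws_access_key_id": "key_id",
    "aws_secret_access_key": "access_key",
    "aws_session_token": "session_token",
    "aws_region": "region_name",
    "aws_endpoint": "endpoint_url",
}

def storage_options_to_io_config(options: dict[str, str], provider: str) -> dict[str, str]:
    """Translate a storage_options dict into IOConfig-style kwargs (sketch)."""
    if provider != "s3":
        raise NotImplementedError(f"provider {provider!r} not sketched here")
    unknown = set(options) - set(S3_KEY_MAP)
    if unknown:
        # Failing loudly avoids silently dropping credentials.
        raise ValueError(f"unrecognized storage options: {sorted(unknown)}")
    return {S3_KEY_MAP[key]: value for key, value in options.items()}
```

Erroring on unrecognized keys, rather than ignoring them, seems safer for an authentication path, since a typo'd credential key would otherwise fail much later with a confusing access error.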