apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0
2.52k stars 749 forks source link

Cleanup Empty Directories in LocalFilesystem #5976

Closed thinkharderdev closed 3 months ago

thinkharderdev commented 3 months ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently the way to delete a "directory" through the ObjectStore interface is to call list with the directory path as prefix and then pipe that to delete_stream.

This is the correct way to do this for object storage since "directories" don't really exist and the path is just a prefix to individual objects.

But for actual filesystems directories are actual inodes themselves in the filesystem which should be cleaned up.

Describe the solution you'd like

Add a new method to ObjectStore: async fn delete_prefix(&self, prefix: Option<&Path>) -> Result<()>

This default implementation for this method can just be calling list(prefix) and then piping that to delete_stream but in the case of LocalFilesystem (and potentially other implementations in the future) will be specialized to also delete directories

Describe alternatives you've considered

This could be handled inside LocalFilesystem during delete_stream but could become complicated. The implementation would have to:

  1. Track prefixes during the delete operation
  2. After delete determine if any of the prefixes are now empty
  3. Delete them

Additional context

tustvold commented 3 months ago

My personal preference would be an option for empty directories to be automatically cleaned up, as this is closer to the semantics of object stores, and should be a relatively straightforward case of calling std::fs::remove_dir on each parent until it errors.

Alternatively we could allow LocalFilesystem::delete to delete directories if that is what the path points to, I'm a little more lukewarm on this as it sort of leaks out of the abstraction.

I am not a fan of a delete_prefix method that has subtely divergent behaviour from standard deletion, I suspect users will find this surprising

Edit: I've edited the issue title to reflect the desired use-case, not the proposed solution

fsdvh commented 3 months ago

I think we can do it automatically by changing delete code for local fs a bit:

async fn delete(&self, location: &Path) -> Result<()> {
        let config = Arc::clone(&self.config);
        let path = self.path_to_filesystem(location)?;
        maybe_spawn_blocking(move || {
            if let Err(e) = std::fs::remove_file(&path) {
                Err(match e.kind() {
                    ErrorKind::NotFound => Error::NotFound { path, source: e }.into(),
                    _ => Error::UnableToDeleteFile { path, source: e }.into(),
                })
            } else {
                let root = config.root.to_file_path().unwrap();
                let mut parent = path.parent();

                while let Some(loc) = parent {
                    if loc != root && std::fs::remove_dir(loc).is_ok() {
                        parent = loc.parent();
                    } else {
                        break;
                    }
                }

                Ok(())
            }
        })
        .await
    }
tustvold commented 3 months ago

We should definitely make this opt-in to avoid surprises, but ^ sounds good to me

thinkharderdev commented 3 months ago

I'm perfectly fine with doing this internal to LocalFilesystem to minimize changes but to make the argument for the new method:

  1. It is a useful shorthand for an operation that should be reasonably common even with non-filesystem implementations.
  2. It can be formulated in terms of the default implementation of existing methods so won't break anything
  3. Handling it internal to LocalFilesystem requires doing a bunch of unnecessary filesystem operations (and their associated syscalls) since we have to try and delete the parent directory after deleting each file.
tustvold commented 3 months ago

Perhaps we could quantify the performance impact of the additional syscalls? I don't disagree that the operation is relatively common, but I also think consistency is very important, if only some methods cleanup directories it makes for a confused UX IMO.

Ultimately LocalFilesystem is not aiming to be the fastest filesystem implementation, for that you want to be using OS specific abstractions like io_uring or direct IO, its goal is to provide good enough performance whilst mirroring object store behaviour

fsdvh commented 3 months ago

I think it's good to have such a short-cut for list + deleted as additionally to handling empty dirs it can be more efficient in some implementation

fsdvh commented 3 months ago

Also, we can keep both approaches together without too much harm to the UX

thinkharderdev commented 3 months ago

if only some methods cleanup directories it makes for a confused UX IMO

IMO the internalized implementation is more confusing UX-wise. I would not necessarily expect deleting a file to also delete its parent directory which is what would happen. And while deleting the empty directories is the concrete issue we are facing, I think the ability to to recursively delete objects under a path makes sense as discrete operation on its own terms.

tustvold commented 3 months ago

Given you clearly feel more strongly on the matter than I do, let's just add a delete_all method with a default implementation using list and delete_stream.

fsdvh commented 3 months ago

Added suggested changes in a #5975

crepererum commented 3 months ago

IMHO, we're talking about two different things here:

  1. prefix deletion: Should we offere an API for ALL implementations (with fallback) that allows quick prefix-based deletion. I think the question is mostly if any cloud stored actually support this. For the local store, this is mostly irrelevant (see next point as well) because under the hood a prefix-deletion is the same as "list prefix"+"delete" (at least under Unix you cannot just delete a directory, you have to empty it first).
  2. directory management for file-based object store: As others pointed out, directories are only an auxiliary data structure for file-based object stores. They have zero meaning for the key-value semantic of the object store. In theory, we could encode the file-based keys in a way that we don't need directories at all, but most people would find this confusing and some operations like "list prefix" would be considerably slower. That said, I think the object store should by default (maybe via opt-out) clean up the auxiliary structures it creates. Sure that's expensive, but to implement an FS-based store correctly, you need a bunch of syscalls anyways (e.g. to prevent the creation of partially written files) and tokio isn't the most efficient way to do file-based IO. So if you use this store and you have that many files that the per-file overhead is too large, I think you lost this game even prior to the implementation of the clean-up logic.
alamb commented 3 months ago

@fsdvh has a nice PR here to add optional "clean empty directories for local filesystem": https://github.com/apache/arrow-rs/pull/5978

I think this reflects the consensus of discussion on this ticket. Please take a look when you have time.