drivendataorg / cloudpathlib

Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.
https://cloudpathlib.drivendata.org
MIT License
472 stars 59 forks source link

Allow ignore_errors flag for CloudPath().rmtree() #270

Open MichaelTurekCGI opened 2 years ago

MichaelTurekCGI commented 2 years ago

Similar to the shutil.rmtree() method, allow the ignore_errors flag to be set for CloudPath instances.

pjbull commented 2 years ago

We'd accept a PR to match the shutil signature. Thanks!

(Also, curious what errors you are hitting since we're pretty diligent about existence checks throughout).

aaossa commented 2 years ago

I guess that any error that can be raised by self.client._remove (maybe CloudPathException?) wil appear here. Python does catch any Exception in some parts and OSError on others (according to the action that is being performed), and applies the onerror callback (another optional argument for shutil.rmtree). Relevant Python docs here.

Again, we could take inspiration from the shutil source: https://github.com/python/cpython/blob/main/Lib/shutil.py#L690-L759

-    def rmtree(self):
+    def rmtree(self, ignore_errors: bool = False, onerror=None):
        """Delete an entire directory tree."""
+       if ignore_errors:
+           def onerror(*args):
+               pass
+       elif onerror is None:
+           def onerror(*args):
+               raise
        if self.is_file():
            raise CloudPathNotADirectoryError(
                f"Path {self} is a file; call unlink instead of rmtree."
            )
-       self.client._remove(self)
+       try:
+             self.client._remove(self)
+       except Exception:
+             onerror(self.__class__.rmtree, self, sys.exc_info())
+             return

According to the Python docs:

If onerror is provided, it must be a callable that accepts three parameters: function, path, and excinfo. The first parameter, function, is the function which raised the exception; it depends on the platform and implementation. The second parameter, path, will be the path name passed to function. The third parameter, excinfo, will be the exception information returned by sys.exc_info().

My only concern here is which method and object pass to onerror. The first parameter will receive the second argument (a path), so I figured that the best way would be to pass the class method as the first argument and the instance itself to that class method, since Class.method(instance) is equivalent to instance.method() (when instance is an instance of Class).

aaossa commented 2 years ago

Of course, to test this we would need an example that actually raises the exception, but there is already a check to catch files. I'm not sure how to test this since there are many existence checks already present 🤔

EDIT: Maybe create a folder with a file in the folder, remove both manually (rmdir and unlink) and then use rmtree on the now non-existent path? EDIT 2: Does not work as a test for this case