fsspec / s3fs

S3 Filesystem
http://s3fs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Writing to a cross-environment bucket requires CreateBucket permission #616

Open Mahdi-Hosseinali opened 2 years ago

Mahdi-Hosseinali commented 2 years ago

Similar to #451, writing to a cross-account bucket triggers a CreateBucket call, which fails with an Access Denied error when the caller doesn't have the CreateBucket permission. Because least privilege is best practice, this will happen in production for most cross-account S3 access. Here's the stack trace:

>>> import pandas as pd
>>> import boto3
>>> import s3fs
>>> pd.__version__
'1.3.5'
>>> boto3.__version__
'1.21.21'
>>> s3fs.__version__
'2022.3.0'
>>> df = pd.DataFrame(range(10))
>>> df.to_csv('s3://d1g1t-dataloader-us/test1.txt', storage_options={"s3_additional_kwargs": {"ACL": "bucket-owner-full-control"}})
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 282, in _call_s3
    out = await method(**additional_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/aiobotocore/client.py", line 228, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateBucket operation: Access Denied

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 3466, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/formats/format.py", line 1105, in to_csv
    csv_formatter.save()
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/formats/csvs.py", line 237, in save
    with get_handle(
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/common.py", line 609, in get_handle
    ioargs = _get_filepath_or_buffer(
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/common.py", line 369, in _get_filepath_or_buffer
    file_obj = fsspec.open(
  File "/usr/local/lib/python3.8/dist-packages/fsspec/core.py", line 462, in open
    return open_files(
  File "/usr/local/lib/python3.8/dist-packages/fsspec/core.py", line 305, in open_files
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/usr/local/lib/python3.8/dist-packages/fsspec/core.py", line 305, in <listcomp>
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 85, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 65, in sync
    raise return_result
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 767, in _makedirs
    await self._mkdir(path, create_parents=True)
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 752, in _mkdir
    await self._call_s3("create_bucket", **params)
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 302, in _call_s3
    raise err
PermissionError: Access Denied
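For context, a least-privilege cross-account setup like the one described typically grants the writing account `s3:PutObject` (and `s3:PutObjectAcl`, since an ACL is being set) on the target bucket, and nothing else. A sketch of such a bucket policy on the destination account (account ID and bucket name are placeholders); note that `s3:CreateBucket` is an account-level action that a bucket policy cannot grant at all, so s3fs's implicit `create_bucket` call is guaranteed to fail with AccessDenied here:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountPut",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
      "Action": ["s3:PutObject", "s3:PutObjectAcl"],
      "Resource": "arn:aws:s3:::example-target-bucket/*"
    }
  ]
}
```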
martindurant commented 2 years ago

Sorry, what does "cross-environment" mean?

Mahdi-Hosseinali commented 2 years ago

Sorry for the bad terminology; I mean cross-account permission. The destination account has a role that allows the other account to put objects to S3 but not to create buckets. I've updated the issue.

martindurant commented 2 years ago

In your example, then, does the target bucket already exist?

Mahdi-Hosseinali commented 2 years ago

Yes, the target bucket already exists.

nathalier commented 1 year ago

Having the same issue.

martindurant commented 1 year ago

OK, so s3fs's makedirs certainly should not fail when the bucket already exists. The problem is that there is no foolproof way to know whether a bucket exists, because checking is yet another permission you might not have. However, we can know a bucket exists if it already appears in the instance's dircache or in a previously cached bucket listing.

So we could certainly check those two cached locations, and we could also ignore errors in makedirs when it is called as a precursor to a write operation.
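The cache check is the idea the patch later in this thread implements: treat a bucket as existing if any cached listing path falls under it. A minimal standalone sketch of that test (`bucket_in_cache` is a hypothetical helper; s3fs keys its dircache by paths such as `"bucket"` or `"bucket/prefix"`):

```python
def bucket_in_cache(bucket, dircache):
    """Return True if any cached listing path lives under `bucket`.

    The bucket name is the first "/"-separated component of each
    cached path, so a prior ls() anywhere inside the bucket proves
    the bucket exists without needing HeadBucket permission.
    """
    return any(name.split("/", 1)[0] == bucket for name in dircache)


# Example: a cache populated by earlier listings on two buckets.
cache = {"d1g1t-dataloader-us/some/prefix": [], "other-bucket": []}
print(bucket_in_cache("d1g1t-dataloader-us", cache))  # True
print(bucket_in_cache("never-listed", cache))         # False
```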

corbt commented 4 months ago

Just want to gently bump this issue—it's what's preventing us from adopting s3fs across our app.

martindurant commented 4 months ago

Would you care to try with this change?

--- a/s3fs/core.py
+++ b/s3fs/core.py
@@ -900,7 +900,7 @@ class S3FileSystem(AsyncFileSystem):
         if not path:
             raise ValueError
         bucket, key, _ = self.split_path(path)
-        if await self._exists(bucket):
+        if any(name.split("/", 1)[0] == bucket for name in self.dircache) or await self._exists(bucket):
             if not key:
                 # requested to create bucket, but bucket already exist
                 raise FileExistsError
@@ -935,7 +935,9 @@ class S3FileSystem(AsyncFileSystem):
     async def _makedirs(self, path, exist_ok=False):
         try:
             await self._mkdir(path, create_parents=True)
-        except FileExistsError:
+        except (FileExistsError, PermissionError):
+            # PermissionError here allows for cases where user can only see some
+            # of the bucket in question
             if exist_ok:
                 pass
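The second hunk makes `_makedirs` tolerate `PermissionError` in addition to `FileExistsError`. Reduced to a synchronous sketch (with a hypothetical `mkdir` callable standing in for the S3 call), the control flow is:

```python
def makedirs(path, mkdir, exist_ok=False):
    # Mirrors the patched s3fs._makedirs: a PermissionError raised by
    # the implicit create_bucket is swallowed when exist_ok=True, on
    # the assumption that the bucket exists but the caller can only
    # see part of it.
    try:
        mkdir(path)
    except (FileExistsError, PermissionError):
        if not exist_ok:
            raise


def deny(path):
    # Stand-in for create_bucket failing with AccessDenied.
    raise PermissionError("Access Denied")


makedirs("my-bucket/key", deny, exist_ok=True)  # tolerated, no error
try:
    makedirs("my-bucket/key", deny, exist_ok=False)
except PermissionError:
    print("raised")  # still surfaces when exist_ok=False
```

This matches pandas' write path, since `fsspec.core.open_files` calls `fs.makedirs(parent, exist_ok=True)` before opening the file for writing.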
martindurant commented 3 months ago

Ping on the thread: I posted a small change above that might alleviate the situation. I am not easily able to test it myself, so I would appreciate it if the other participants could give it a go.