fsspec / s3fs

S3 Filesystem
http://s3fs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Intermittent 'PermissionError: Access Denied' when trying to read S3 file from AWS Lambda #218

Closed lavinia-k closed 4 years ago

lavinia-k commented 5 years ago

I'm running a Python 3.7 script in AWS Lambda, which runs queries against AWS Athena and tries to download the CSV results file that Athena stores on S3 once the query execution has completed.

Any ideas why I'd be running into the error below intermittently?

s3_query_result_path = f's3://{bucket}/{results_key}'

[ERROR] PermissionError: Access Denied
Traceback (most recent call last):
  File "/var/task/lambdas/process_file/lambda_function.py", line 91, in lambda_handler
    df = pd.read_csv(s3_query_result_path)
  File "/var/task/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/var/task/pandas/io/parsers.py", line 440, in _read
    filepath_or_buffer, encoding, compression
  File "/var/task/pandas/io/common.py", line 207, in get_filepath_or_buffer
    filepath_or_buffer, encoding=encoding, compression=compression, mode=mode
  File "/var/task/pandas/io/s3.py", line 36, in get_filepath_or_buffer
    filepath_or_buffer = fs.open(_strip_schema(filepath_or_buffer), mode)
  File "/var/task/fsspec/spec.py", line 669, in open
    autocommit=ac, **kwargs)
  File "/var/task/s3fs/core.py", line 303, in _open
    autocommit=autocommit)
  File "/var/task/s3fs/core.py", line 920, in __init__
    cache_type=cache_type)
  File "/var/task/fsspec/spec.py", line 864, in __init__
    self.details = fs.info(path)
  File "/var/task/s3fs/core.py", line 479, in info
    return super().info(path)
  File "/var/task/fsspec/spec.py", line 477, in info
    out = self.ls(self._parent(path), detail=True, **kwargs)
  File "/var/task/s3fs/core.py", line 497, in ls
    files = self._ls(path, refresh=refresh)
  File "/var/task/s3fs/core.py", line 430, in _ls
    return self._lsdir(path, refresh)
  File "/var/task/s3fs/core.py", line 336, in _lsdir
    raise translate_boto_error(e)

As you can see above, I'm using the pandas library, which then uses s3fs under the hood.

The Lambda works about 80% of the time, and I can't figure out anything unique about the times it fails.
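A blunt workaround for an intermittent failure like this is to retry the read with backoff. This is only a sketch (the helper name, attempt count, and delays are illustrative, not from the thread):

```python
import time

def read_with_retry(read_fn, attempts=5, base_delay=0.5):
    """Call read_fn(), retrying on PermissionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return read_fn()
        except PermissionError:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * 2 ** attempt)

# usage, e.g.:
# df = read_with_retry(lambda: pd.read_csv(s3_query_result_path))
```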

Feel free to let me know if I should be posting this question in pandas or elsewhere instead - thanks for your help!

TomAugspurger commented 4 years ago

This was most likely fixed with https://github.com/intake/filesystem_spec/pull/181, which is included in fsspec 0.6.0. Let us know if anyone can reproduce with the latest versions of fsspec and s3fs.
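To confirm which versions are actually deployed, something like the helper below can check (a sketch; `importlib.metadata` needs Python 3.8+, while on 3.7 the `importlib_metadata` backport offers the same API):

```python
from importlib import metadata

def installed_version(pkg):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# e.g. confirm the fix window (fsspec >= 0.6.0):
# print("fsspec", installed_version("fsspec"))
# print("s3fs", installed_version("s3fs"))
```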

milesgranger commented 4 years ago

This is happening to me. I too use .read/to_parquet and have: fsspec==0.6.2 s3fs==0.4.0

Step Functions running exactly the same data/process will occasionally fail with permission denied, but nothing changes between runs.

martindurant commented 4 years ago

I appreciate it's hard to debug on Lambda, but can you tell from logs if this is running out of retries, or if it's a type of error that s3fs doesn't realise should be retried? Note that setting S3FS_LOGGING_LEVEL=DEBUG will bring you much more information.
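The same debug output can also be switched on from inside the handler via the standard logging module (a sketch; the logger name "s3fs" is what s3fs uses internally, but treat it as an assumption for your installed version):

```python
import logging

# Equivalent in effect to exporting S3FS_LOGGING_LEVEL=DEBUG:
# raise the s3fs logger to DEBUG and send it to stderr
# (which Lambda forwards to CloudWatch Logs).
logger = logging.getLogger("s3fs")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
```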

milesgranger commented 4 years ago

This was the output, but I have since switched to using boto3 directly for reading. If you'd like, I can revert the changes and run it again with S3FS_LOGGING_LEVEL=DEBUG if what's provided below isn't detailed enough:

{
  "error": "PermissionError",
  "cause": {
    "errorMessage": "Access Denied",
    "errorType": "PermissionError",
    "stackTrace": [
      "  File \"/var/task/leroydw/processing/common.py\", line 42, in _wrapper\n    return fn(*args, **kwargs)\n",
      "  File \"/var/task/leroydw/processing/common.py\", line 42, in _wrapper\n    return fn(*args, **kwargs)\n",
      "  File \"/var/task/leroydw/processing/bgoinnova/process.py\", line 22, in process_file\n    df = pd.read_parquet(f's3://{event[\"File\"]}', engine=\"pyarrow\")\n",
      "  File \"/tmp/sls-py-req/pandas/io/parquet.py\", line 310, in read_parquet\n    return impl.read(path, columns=columns, **kwargs)\n",
      "  File \"/tmp/sls-py-req/pandas/io/parquet.py\", line 121, in read\n    path, _, _, should_close = get_filepath_or_buffer(path)\n",
      "  File \"/tmp/sls-py-req/pandas/io/common.py\", line 185, in get_filepath_or_buffer\n    filepath_or_buffer, encoding=encoding, compression=compression, mode=mode\n",
      "  File \"/tmp/sls-py-req/pandas/io/s3.py\", line 48, in get_filepath_or_buffer\n    file, _fs = get_file_and_filesystem(filepath_or_buffer, mode=mode)\n",
      "  File \"/tmp/sls-py-req/pandas/io/s3.py\", line 38, in get_file_and_filesystem\n    file = fs.open(_strip_schema(filepath_or_buffer), mode)\n",
      "  File \"/tmp/sls-py-req/fsspec/spec.py\", line 724, in open\n    **kwargs\n",
      "  File \"/tmp/sls-py-req/s3fs/core.py\", line 315, in _open\n    autocommit=autocommit, requester_pays=requester_pays)\n",
      "  File \"/tmp/sls-py-req/s3fs/core.py\", line 957, in __init__\n    cache_type=cache_type)\n",
      "  File \"/tmp/sls-py-req/fsspec/spec.py\", line 956, in __init__\n    self.details = fs.info(path)\n",
      "  File \"/tmp/sls-py-req/s3fs/core.py\", line 486, in info\n    return super().info(path)\n",
      "  File \"/tmp/sls-py-req/fsspec/spec.py\", line 497, in info\n    out = self.ls(self._parent(path), detail=True, **kwargs)\n",
      "  File \"/tmp/sls-py-req/s3fs/core.py\", line 529, in ls\n    files = self._ls(path, refresh=refresh)\n",
      "  File \"/tmp/sls-py-req/s3fs/core.py\", line 426, in _ls\n    return self._lsdir(path, refresh)\n",
      "  File \"/tmp/sls-py-req/s3fs/core.py\", line 349, in _lsdir\n    raise translate_boto_error(e)\n"
    ]
  }
}
martindurant commented 4 years ago

I'm afraid that stack trace is not particularly useful. boto alone had no problems? It may be worth trying with s3fs master, where the parent dir should not be listed for this case.

milesgranger commented 4 years ago

Didn't think it would be, sorry. Using boto has run 6 times in a row without incident. Typically, the best with s3fs was ~4 times before this failure would happen. I'll try master and let you know.
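The "boto3 directly" workaround mentioned here might look roughly like the sketch below (function names are mine; it bypasses the s3fs/fsspec listing code path entirely by fetching the raw bytes and handing them to pandas):

```python
import io
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/path/to/file' into (bucket, key)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")

def read_parquet_via_boto3(uri):
    # Fetch the object with plain boto3 and parse the bytes with pandas,
    # so no directory listing (and no ListBucket permission) is needed.
    import boto3
    import pandas as pd
    bucket, key = split_s3_uri(uri)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return pd.read_parquet(io.BytesIO(body))
```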

milesgranger commented 4 years ago

I haven't been able to reproduce it using master. I'm no expert in the code base, but I don't really see any changes since 0.4.0 that would affect this, maybe it's a Heisenberg... :man_shrugging: Anyway, thanks for the help, I'll report back if something similar arises. :+1:

martindurant commented 4 years ago

Heisenberg it is

GurRonenExplorium commented 4 years ago

Hey, I reproduced it in my use case with these dependencies:

boto3==1.12.28
botocore==1.15.28
dask[dataframe]==2.12.0
docutils==0.15.2
fastparquet==0.3.3
fsspec==0.6.3
jmespath==0.9.5
llvmlite==0.31.0
locket==0.2.0
numba==0.48.0
numpy==1.18.2
pandas==1.0.3
partd==1.1.0
python-dateutil==2.8.1
pytz==2019.3
s3fs==0.4.0
s3transfer==0.3.3
six==1.14.0
thrift==0.13.0
toolz==0.10.0
urllib3==1.25.8 ; python_version != '3.4'

Error-wise I just get:

[ERROR] PermissionError: Access Denied
GurRonenExplorium commented 4 years ago

Usually I have a large burst of small files with many concurrent Lambdas, so the same Lambda context is probably being reused all the time. Let me know if there are any other logs I can provide to help fix this.

GurRonenExplorium commented 4 years ago

I downgraded to s3fs 0.2.0 and can confirm it works well.

Dependencies that definitely work:

boto3==1.12.31
botocore==1.15.31
cloudpickle==1.3.0
dask[dataframe]==2.1.0
docutils==0.15.2
fastparquet==0.3.3
jmespath==0.9.5
llvmlite==0.31.0
locket==0.2.0
numba==0.48.0
numpy==1.18.2
pandas==0.25.3
partd==1.1.0
python-dateutil==2.8.1
pytz==2019.3
s3fs==0.2.0
s3transfer==0.3.3
six==1.14.0
thrift==0.13.0
toolz==0.10.0
urllib3==1.25.8 ; python_version != '3.4'
martindurant commented 4 years ago

@GurRonenExplorium , it would be very helpful for us if you could do additional logging/delving, to find out if any of the calls were different during the reads which failed.

ericman93 commented 4 years ago

I have the same issue, but on EC2 and not Lambda. In Jupyter, I tried to read a file created after the notebook was started (the file was created on another server) and it failed, but after I restarted the kernel it worked.

I guess it is related to this s3fs cache https://github.com/dask/dask/issues/5134

Upgrading s3fs to 0.4.2 solved the issue.

adminy commented 1 year ago

This is still happening on s3fs==2023.1.0. Here are some logs:

[ERROR] PermissionError: Access Denied
Traceback (most recent call last):
  File "/var/task/lambda_handler.py", line 113, in magic_function
    results = pd.read_parquet(s3_valid_path)
  File "/var/lang/lib/python3.7/site-packages/pandas/io/parquet.py", line 317, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "/var/lang/lib/python3.7/site-packages/pandas/io/parquet.py", line 142, in read
    path, columns=columns, filesystem=fs, **kwargs
  File "/var/lang/lib/python3.7/site-packages/pyarrow/parquet/core.py", line 2952, in read_table
    thrift_container_size_limit=thrift_container_size_limit,
  File "/var/lang/lib/python3.7/site-packages/pyarrow/parquet/core.py", line 2465, in __init__
    finfo = filesystem.get_file_info(path_or_paths)
  File "pyarrow/_fs.pyx", line 571, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/_fs.pyx", line 1490, in pyarrow._fs._cb_get_file_info
  File "/var/lang/lib/python3.7/site-packages/pyarrow/fs.py", line 332, in get_file_info
    info = self.fs.info(path)
  File "/var/lang/lib/python3.7/site-packages/fsspec/asyn.py", line 114, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/var/lang/lib/python3.7/site-packages/fsspec/asyn.py", line 99, in sync
    raise return_result
  File "/var/lang/lib/python3.7/site-packages/fsspec/asyn.py", line 54, in _runner
    result[0] = await coro
  File "/var/lang/lib/python3.7/site-packages/s3fs/core.py", line 1271, in _info
    **self.req_kw,
  File "/var/lang/lib/python3.7/site-packages/s3fs/core.py", line 340, in _call_s3
    method, kwargs=additional_kwargs, retries=self.retries
  File "/var/lang/lib/python3.7/site-packages/s3fs/core.py", line 139, in _error_wrapper
    raise err
martindurant commented 1 year ago

@adminy , we are now onto 2023.5.0 - do you mind trying again? You might also try activating the region cache (cache_regions=True) or setting the default region via environment variable or the client_kwargs.
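Those suggestions can be forwarded to s3fs through pandas' storage_options (requires pandas >= 1.2). A sketch only: the region value is a placeholder, and raising retries above the s3fs default of 5 is my assumption, not part of the suggestion:

```python
# Options pandas forwards to s3fs.S3FileSystem:
storage_options = {
    "cache_regions": True,                          # activate the region cache
    "retries": 10,                                  # s3fs default is 5
    "client_kwargs": {"region_name": "eu-west-1"},  # placeholder region
}
# df = pd.read_parquet(s3_valid_path, storage_options=storage_options)
```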

adminy commented 1 year ago

I can't update. There is a strict requirement to use Python 3.7 at the moment.

There is a Region environment variable set in the Lambda.