phofl closed this issue 7 months ago
When I run this from my machine, it fails with:
FAILED tests/tpch/test_polars.py::test_query_1 - polars.exceptions.ComputeError: Generic S3 error: Client error with status 403 Forbidden: No Body
@ritchie46
We are a bit confused; the following doesn't work for us:
import polars as pl
import boto3

session = boto3.session.Session()
credentials = session.get_credentials()
pl.scan_parquet(
    "s3://coiled-runtime-ci/tpc-h/snappy/scale-1000/lineitem/*.parquet",
    storage_options={
        "aws_access_key_id": credentials.access_key,
        "aws_secret_access_key": credentials.secret_key,
        "aws_region": "us-east-2",
    },
)
Traceback (most recent call last):
  File "/Users/patrick/Library/Application Support/JetBrains/PyCharm2023.3/scratches/dask_expr_scratch.py", line 186, in <module>
    pl.scan_parquet(
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/polars/utils/deprecation.py", line 136, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/polars/utils/deprecation.py", line 136, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/polars/io/parquet/functions.py", line 311, in scan_parquet
    return pl.LazyFrame._scan_parquet(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 464, in _scan_parquet
    self._ldf = PyLazyFrame.new_from_parquet(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: Generic S3 error: Error performing list request: Client error with status 403 Forbidden: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>***</AWSAccessKeyId><RequestId>NZEXF3ABC6078JG2</RequestId><HostId>***</HostId></Error>
Two problems:
1. Am I doing anything wrong (going by the scan_parquet docstring), e.g. missing a variable in storage_options or something similar?
2. The Polars error message prints the access key and secret (I replaced them with * here), which is not great from a security perspective.
Hmm... No, it isn't. I'll see whether this can be fixed upstream in object_store (which is what we use for S3 access).
That seems strange; it must be the credentials, though. I can access private S3 buckets.
These are the config keys we support: https://docs.rs/object_store/0.9.0/object_store/aws/enum.AmazonS3ConfigKey.html
Could you also set POLARS_VERBOSE=1? That might show a bit more.
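To follow this suggestion from inside a script or test rather than the shell, here is a minimal sketch of a hypothetical `polars_verbose` context manager that sets POLARS_VERBOSE=1 temporarily and restores the previous value afterwards. It assumes Polars reads the variable when the query runs rather than only at import time; exporting POLARS_VERBOSE=1 in the shell before starting Python avoids relying on that assumption.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def polars_verbose() -> Iterator[None]:
    """Temporarily set POLARS_VERBOSE=1 (hypothetical helper, not a Polars API)."""
    previous = os.environ.get("POLARS_VERBOSE")
    os.environ["POLARS_VERBOSE"] = "1"
    try:
        yield
    finally:
        # Restore whatever was there before, including "unset".
        if previous is None:
            os.environ.pop("POLARS_VERBOSE", None)
        else:
            os.environ["POLARS_VERBOSE"] = previous


# Usage (not executed here; needs polars, boto3 and S3 access):
#   with polars_verbose():
#       pl.scan_parquet(
#           "s3://coiled-runtime-ci/tpc-h/snappy/scale-1000/lineitem/*.parquet",
#           storage_options=...,
#       )
```

Restoring the old value in `finally` keeps the setting from leaking into later tests even if the scan raises, as it does in the traceback above.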
@phofl, do you need to pass the AWS session token as well? (If you're using your standard Coiled employee AWS credentials, I think it's likely you do.)
Yep, adding the session token solved this problem, thx!
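A minimal sketch of the fix described above: build storage_options from the boto3 credentials and include the session token whenever one is present. `s3_storage_options` is a hypothetical helper, not part of Polars or boto3. The first three key names are the ones used earlier in this thread; `aws_session_token` is assumed to be accepted because it is one of the object_store AmazonS3ConfigKey aliases listed in the config-key documentation linked above.

```python
from typing import Dict, Optional


def s3_storage_options(
    access_key: str,
    secret_key: str,
    session_token: Optional[str],
    region: str,
) -> Dict[str, str]:
    """Build a storage_options dict for pl.scan_parquet on S3 (hypothetical helper)."""
    options = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "aws_region": region,
    }
    # Temporary credentials (SSO, assumed roles, STS) are only valid together
    # with their session token; without it S3 answers 403 InvalidAccessKeyId.
    if session_token:
        options["aws_session_token"] = session_token
    return options


# Usage (not executed here; needs boto3, polars and S3 access):
#   import boto3
#   import polars as pl
#
#   creds = boto3.session.Session().get_credentials().get_frozen_credentials()
#   lf = pl.scan_parquet(
#       "s3://coiled-runtime-ci/tpc-h/snappy/scale-1000/lineitem/*.parquet",
#       storage_options=s3_storage_options(
#           creds.access_key, creds.secret_key, creds.token, "us-east-2"
#       ),
#   )
```

Calling `get_frozen_credentials()` takes a single consistent snapshot of refreshable credentials, so the key, secret, and token cannot come from different refreshes.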
Sorry for the noise, @ritchie46. I can now access the files, so it seems to work.
That leaves only the secret issue, but that should be covered by the issue you've opened. Thx for that!
Yeah, looking into that. I believe the client id isn't really secret.
Yikes...