Open anjali-chadha opened 8 months ago
Are you using AWS, or do you set endpoint_override? "Couldn't resolve host name" probably means you're having issues with DNS resolution of the S3 server(s) hostnames, which is quite unexpected with AWS...
@pitrou Yes, we are using AWS, and not explicitly overriding endpoint_override
you're having issues with DNS resolution of the S3 server(s) hostnames, which is quite unexpected with AWS...
We've only encountered this problem once across our numerous runs, so it's not a frequent occurrence. Currently, we deal with it by increasing the number of S3 retry attempts from the default of 3 to a higher value.
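import pyarrow as pa
from pyarrow.fs import AwsStandardS3RetryStrategy, S3FileSystem

# Rebuild the S3 filesystem with more retry attempts than the default.
if isinstance(fs, S3FileSystem):
    fs = pa.fs.S3FileSystem(
        region=fs.region, retry_strategy=AwsStandardS3RetryStrategy(max_attempts=6)
    )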
However, we're uncertain if this is the most effective approach.
Do you have any recommendations on how we can better handle this on the client side?
Sorry, I don't have any recommendation. If increasing max_attempts works, then it seems ok to me.
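For anyone who wants an extra safety net on top of max_attempts, here is a minimal sketch of an application-level retry with exponential backoff and jitter. It is not part of PyArrow: the helper name call_with_retries, its parameters, and the "NETWORK_CONNECTION" substring check are all assumptions made for illustration, based only on the error text quoted in this issue.

```python
import random
import time


def call_with_retries(func, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      is_transient=None, sleep=time.sleep):
    """Call func(), retrying transient OSErrors with exponential backoff.

    Hypothetical helper, not a PyArrow API.
    """
    if is_transient is None:
        # Assumption: treat errors whose message mentions NETWORK_CONNECTION
        # (as in the reported traceback) as transient; re-raise everything else.
        def is_transient(exc):
            return "NETWORK_CONNECTION" in str(exc)

    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except OSError as exc:
            # Give up on the last attempt or on non-transient errors.
            if attempt == max_attempts or not is_transient(exc):
                raise
            # Exponential backoff capped at max_delay, plus random jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

It could be used around any single filesystem call, for example call_with_retries(lambda: fs.get_file_info(path)), where fs and path are your own filesystem object and key. Matching on the message text is fragile, so you may prefer to pass your own is_transient predicate.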
I'm facing this issue when doing a scan_iceberg operation in Polars. It only happens with certain objects:
When reading information for key '
Describe the bug, including details regarding any error messages, version, and platform.
Hi there!
We are using the PyArrow library to read files from an S3 bucket, and we're encountering an intermittent error:
OSError: When reading information for key '<REDACTED>' in bucket '<REDACTED>': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 6, Couldn't resolve host name
Please note that this error doesn't occur consistently, and the S3 bucket path is valid.
The reference code we're using is as follows:
Error Details:
Could you please provide any suggestions on how to handle such intermittent network connectivity errors while reading from S3?
Component(s)
Parquet, Python