apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

Make s3.request_timeout configurable #1568

Closed metadaddy closed 1 week ago

metadaddy commented 2 weeks ago

Similarly to #218, we see occasional timeout errors when writing data to S3-compatible object storage:

When uploading part for key 'drivestats/data/date_month=2014-08/00000-0-9c7baab5-af18-4558-ae10-1678aa90b6a5.parquet' in bucket 'drivestats-iceberg': AWS Error NETWORK_CONNECTION during UploadPart operation: curlCode: 28, Timeout was reached

[I don't believe the issue is specific to the fact that I'm using Backblaze B2 rather than Amazon S3 - I saw references to similar error messages with the latter as I was researching this issue.]

The issue happens when the underlying PUT operation takes longer than the request timeout, which is set to a default of 3 seconds in the AWS C++ SDK used by Arrow via PyArrow.

The changes in this PR allow configuration of the s3.request-timeout property when working directly or indirectly with pyiceberg.io.pyarrow.PyArrowFileIO, just as #218 allowed configuration of s3.connect-timeout. Both values are given in seconds.
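
As a rough illustration of the idea (not the code in this PR), the timeout properties can be picked out of a properties dict and turned into keyword arguments for the filesystem. The helper name s3_timeout_kwargs is hypothetical, and the property keys are the ones used in the catalog example in this description. Properties that are not set are left out, so the SDK defaults still apply.

```python
from typing import Any, Dict

# Property keys as they appear in the catalog example in this PR description.
S3_CONNECT_TIMEOUT = "s3.connect-timeout"
S3_REQUEST_TIMEOUT = "s3.request-timeout"


def s3_timeout_kwargs(properties: Dict[str, Any]) -> Dict[str, float]:
    """Hypothetical helper: collect timeout settings as float seconds.

    Only properties that are present are forwarded, so anything left
    unset keeps the default of the underlying SDK.
    """
    kwargs: Dict[str, float] = {}
    if (connect_timeout := properties.get(S3_CONNECT_TIMEOUT)) is not None:
        kwargs["connect_timeout"] = float(connect_timeout)
    if (request_timeout := properties.get(S3_REQUEST_TIMEOUT)) is not None:
        kwargs["request_timeout"] = float(request_timeout)
    return kwargs


props = {
    "s3.endpoint": "http://127.0.0.1:9000",
    "s3.request-timeout": "5.0",  # string values are accepted and converted
    "s3.connect-timeout": 20.0,
}
timeout_kwargs = s3_timeout_kwargs(props)
print(timeout_kwargs)  # {'connect_timeout': 20.0, 'request_timeout': 5.0}
```

The resulting dict could then be passed on as pyarrow.fs.S3FileSystem(**timeout_kwargs), assuming a PyArrow version whose S3FileSystem accepts connect_timeout and request_timeout.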

For example, when creating a catalog:

from pyiceberg.catalog import load_catalog

# Both timeouts are given in seconds.
catalog = load_catalog(
    "docs",
    **{
        "uri": "http://127.0.0.1:8181",
        "s3.endpoint": "http://127.0.0.1:9000",
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
        "s3.access-key-id": "admin",
        "s3.secret-access-key": "password",
        "s3.request-timeout": 5.0,
        "s3.connect-timeout": 20.0,
    }
)

metadaddy commented 1 week ago

Hi @Fokko - I implemented and pushed your suggested correction. Thanks!

kevinjqliu commented 1 week ago

Looks like there's a lint issue. Can you run make lint locally? @metadaddy

metadaddy commented 1 week ago

@kevinjqliu Ah - it wanted imports in alphabetical order - I'd just inserted S3_REQUEST_TIMEOUT immediately after S3_CONNECT_TIMEOUT. All fixed now!

Fokko commented 1 week ago

Thanks for working on this @metadaddy, and thanks @kevinjqliu for the review 🙌