Hi @Fokko - I implemented and pushed your suggested correction. Thanks!
Looks like there's a lint issue, can you run `make lint` locally? @metadaddy
@kevinjqliu Ah - it wanted imports in alphabetical order - I'd just inserted `S3_REQUEST_TIMEOUT` immediately after `S3_CONNECT_TIMEOUT`. All fixed now!
Thanks for working on this @metadaddy, and thanks @kevinjqliu for the review 🙌
Similarly to #218, we see occasional timeout errors when writing data to S3-compatible object storage:
[I don't believe the issue is specific to the fact that I'm using Backblaze B2 rather than Amazon S3 - I saw references to similar error messages with the latter as I was researching this issue.]
The issue happens when the underlying `PUT` operation takes longer than the request timeout, which defaults to 3 seconds in the AWS C++ SDK used by Arrow via PyArrow.

The changes in this PR allow configuration of `s3.request_timeout` when working directly or indirectly with `pyiceberg.io.pyarrow.PyArrowFileIO`, just as #218 allowed configuration of `s3.connect_timeout`.
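For context, a rough sketch (not this PR's actual diff) of where these timeouts surface underneath PyIceberg: PyArrow's `S3FileSystem` exposes the AWS C++ SDK timeouts as constructor arguments, and `PyArrowFileIO` builds a filesystem from the catalog properties. The 30-second values below are placeholders:

```python
# Sketch only: PyArrow surfaces the AWS C++ SDK timeouts as S3FileSystem
# constructor arguments; the catalog properties map onto these.
from pyarrow import fs

s3 = fs.S3FileSystem(
    connect_timeout=30.0,  # seconds; configurable since #218
    request_timeout=30.0,  # seconds; configurable with this PR
)
```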
For example, when creating a catalog:
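A minimal sketch of such a configuration, assuming PyIceberg's `load_catalog` entry point and its hyphenated property-key convention (e.g. `s3.connect-timeout`); the catalog URI, endpoint, credentials, and timeout values are placeholders:

```python
# Sketch: property keys follow PyIceberg's hyphenated convention
# (s3.connect-timeout / s3.request-timeout); all values are placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "default",
    **{
        "uri": "http://127.0.0.1:8181",  # placeholder REST catalog endpoint
        "s3.endpoint": "https://s3.us-west-004.backblazeb2.com",  # placeholder S3-compatible endpoint
        "s3.access-key-id": "<access-key-id>",
        "s3.secret-access-key": "<secret-access-key>",
        "s3.connect-timeout": 30.0,  # seconds; configurable since #218
        "s3.request-timeout": 30.0,  # seconds; configurable with this PR
    },
)
```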