jroakes opened this issue 7 years ago
mrjob is somewhat embarrassingly still on boto 2. We actually have our own solution for making our boto connections super-robust against timeouts (see https://github.com/Yelp/mrjob/blob/master/mrjob/fs/s3.py#L49).
When we do switch to boto 3/botocore (see #1304), I'll keep the 70-second recommendation in mind, though whatever we do will probably also try again (and again, and again...) after that timeout. :)
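For illustration only, here is a minimal sketch of the "try again (and again, and again...)" idea: retry a call that times out, with capped exponential backoff and random jitter. It assumes the flaky call raises Python 3's built-in `TimeoutError` (on Python 3.10+, `socket.timeout` is an alias of it). botocore/urllib3 raise their own classes, such as the `ReadTimeoutError` quoted in this issue, which you would pass via `retry_on`. The helper name `call_with_retries` and its parameters are hypothetical; this is not the code at the mrjob link above.

```python
import random
import time


def call_with_retries(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(TimeoutError,), sleep=time.sleep):
    """Call func(); on a retryable exception, back off and try again.

    Hypothetical helper: the delay ceiling grows as base_delay * 2**attempt,
    capped at max_delay, and the actual sleep is drawn uniformly from
    [0, ceiling] ("full jitter") so many clients don't retry in lockstep.
    Exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last timeout
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))


# Usage: a stand-in call that times out twice, then succeeds.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("Read timed out.")
    return "ok"


# Pass a no-op sleep so the example runs instantly.
print(call_with_retries(flaky, sleep=lambda s: None))  # ok
```

Injecting `sleep` keeps the backoff logic testable without real waiting; a longer per-attempt timeout (such as the 70 seconds mentioned above) and a retry loop like this are complementary rather than alternatives.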
Here is a relevant error:
```
  File "/usr/local/lib/python2.7/site-packages/botocore/response.py", line 74, in read
    chunk = self._raw_stream.read(amt)
  File "/usr/local/lib/python2.7/site-packages/botocore/vendored/requests/packages/urllib3/response.py", line 267, in read
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
botocore.vendored.requests.packages.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='aws-publicdatasets.s3.amazonaws.com', port=443): Read timed out.
```
Thanks!
Is this still an issue? mrjob is currently on botocore 1.6.0+, and it seems to be fairly robust about dealing with transient errors.
This may be solvable in another way, but I was wondering whether it would make sense to add connect and read timeout options to mrjob.conf, since botocore's default is 60 seconds and AWS recommends 70.
See: https://github.com/boto/botocore/pull/634
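To make the proposal concrete, here is a sketch of how such options might be read from mrjob.conf runner options and turned into keyword arguments for botocore's `botocore.config.Config`, which accepts `connect_timeout` and `read_timeout` in seconds. The option names `s3_connect_timeout` and `s3_read_timeout` and the helper `botocore_timeout_kwargs` are hypothetical, not existing mrjob settings; the 70-second default simply follows the AWS recommendation mentioned in this issue.

```python
# Hypothetical default, following the 70-second AWS recommendation.
DEFAULT_TIMEOUT_SECS = 70

# Hypothetical mrjob.conf option name -> botocore Config keyword argument.
TIMEOUT_OPTS = (
    ("s3_connect_timeout", "connect_timeout"),
    ("s3_read_timeout", "read_timeout"),
)


def botocore_timeout_kwargs(opts):
    """Map hypothetical runner options to botocore Config kwargs.

    Missing options fall back to DEFAULT_TIMEOUT_SECS; values are
    coerced to float and must be positive.
    """
    kwargs = {}
    for opt_name, config_key in TIMEOUT_OPTS:
        value = opts.get(opt_name)
        if value is None:
            value = DEFAULT_TIMEOUT_SECS
        value = float(value)
        if value <= 0:
            raise ValueError("%s must be positive, got %r" % (opt_name, value))
        kwargs[config_key] = value
    return kwargs


# e.g. options parsed from an mrjob.conf such as:
#   runners:
#     emr:
#       s3_read_timeout: 120
opts = {"s3_read_timeout": 120}
print(botocore_timeout_kwargs(opts))
# {'connect_timeout': 70.0, 'read_timeout': 120.0}
```

With botocore installed, the resulting dict could then be passed as `Config(**kwargs)` through the `config` argument when creating the S3 client.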
I am fairly new to mrjob, so please excuse my ignorance if this is a non-issue or if another workflow already handles it. I ran into this while investigating intermittent connection problems with S3.