Yelp / mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services
http://packages.python.org/mrjob/
Other

BotoCore Timeouts #1467

Open jroakes opened 7 years ago

jroakes commented 7 years ago

This might be solvable another way, but I was wondering whether it would make sense to add connect and read timeout parameters to mrjob.conf, since the botocore default is 60 seconds and AWS recommends 70.

See: https://github.com/boto/botocore/pull/634

I'm fairly new to mrjob, so please excuse my ignorance if this is a non-issue or if another workflow already handles it. I found this while researching some intermittent connection problems with S3.
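
A minimal sketch of how such options might be wired up, assuming hypothetical `s3_connect_timeout` and `s3_read_timeout` keys in mrjob.conf (they are illustrative, not documented mrjob options). The helper turns them into the `connect_timeout` and `read_timeout` keyword arguments that botocore's `Config` accepts, falling back to the 70 seconds mentioned above. The helper name `s3_timeout_kwargs` is also hypothetical.

```python
# Sketch only: "s3_connect_timeout" and "s3_read_timeout" are hypothetical
# mrjob.conf option names, not documented mrjob options.
DEFAULT_TIMEOUT_SECS = 70  # the AWS-recommended value cited in this issue

# (hypothetical mrjob.conf option name, botocore Config keyword argument)
_TIMEOUT_OPTS = (
    ("s3_connect_timeout", "connect_timeout"),
    ("s3_read_timeout", "read_timeout"),
)


def s3_timeout_kwargs(opts):
    """Translate timeout options into botocore Config keyword arguments.

    Options that are missing or None fall back to DEFAULT_TIMEOUT_SECS;
    non-positive values are rejected.
    """
    kwargs = {}
    for opt_name, config_name in _TIMEOUT_OPTS:
        value = opts.get(opt_name)
        if value is None:
            value = DEFAULT_TIMEOUT_SECS
        if value <= 0:
            raise ValueError("%s must be positive, got %r" % (opt_name, value))
        kwargs[config_name] = value
    return kwargs


# With botocore installed, the kwargs would be handed to a client like this:
#   import botocore.session
#   from botocore.config import Config
#   session = botocore.session.get_session()
#   s3 = session.create_client("s3", config=Config(**s3_timeout_kwargs(opts)))
```

Keeping the translation in one small function means the defaults and validation live in a single place, separate from client construction.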

coyotemarin commented 7 years ago

mrjob is, somewhat embarrassingly, still on boto 2. We actually have our own solution for making our boto connections super-robust against timeouts (see https://github.com/Yelp/mrjob/blob/master/mrjob/fs/s3.py#L49).

When we do switch to boto 3/botocore (see #1304), I'll keep the 70-second recommendation in mind, though whatever we do will probably also try again (and again, and again...) after that timeout. :)
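
The "try again (and again, and again...)" idea can be sketched as a generic retry loop with exponential backoff. This is not mrjob's code (its actual approach lives in mrjob/fs/s3.py and may differ); `call_with_retries` and its parameters are hypothetical. It assumes Python 3, retries only on the built-in `TimeoutError` by default, and takes an injectable `sleep` so the backoff can be tested without actually waiting.

```python
import time


def call_with_retries(func, retry_on=(TimeoutError,), max_tries=5,
                      initial_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call func() and return its result, retrying on transient errors.

    On an exception listed in retry_on, wait and try again. The wait starts
    at initial_delay seconds and is multiplied by backoff after each
    failure. Once max_tries attempts have failed, the last exception is
    re-raised instead of being swallowed.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_tries:
                raise  # out of attempts: surface the original error
            sleep(delay)
            delay *= backoff  # exponential backoff before the next try
```

Re-raising after `max_tries` keeps a persistent outage from looping forever, while the growing delay gives the remote service time to recover from a transient problem.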

jroakes commented 7 years ago

Here is a relevant error:

  File "/usr/local/lib/python2.7/site-packages/botocore/response.py", line 74, in read
    chunk = self._raw_stream.read(amt)
  File "/usr/local/lib/python2.7/site-packages/botocore/vendored/requests/packages/urllib3/response.py", line 267, in read
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
botocore.vendored.requests.packages.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='aws-publicdatasets.s3.amazonaws.com', port=443): Read timed out.

coyotemarin commented 7 years ago

Thanks!

coyotemarin commented 6 years ago

Is this still an issue? mrjob is currently on botocore 1.6.0+, and it seems to be fairly robust about dealing with transient errors.