GoogleCloudPlatform / gsutil

A command line tool for interacting with cloud storage services.
Apache License 2.0

gsutil runs into socket timeout with -m options #1739

Open MichaelJThomas-2016 opened 10 months ago

MichaelJThomas-2016 commented 10 months ago

Hi,

I am trying to rsync a bucket from GCS to AWS S3 with gsutil.

I am using Cloud Composer to schedule a bash script that runs:

    set -e
    export AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}"
    export AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}"
    # This install step is in itself an issue with the Python runtime on GKE
    sudo apt-get update -y && sudo apt-get install google-cloud-cli -y
    # Sync one day's partition from GCS to S3 with parallel transfers (-m)
    gsutil -o "GSUtil:max_upload_compression_buffer_size=8G" -m rsync -r \
      "gs://MY-BUCKET/MY_PREFIX/year={{execution_date.year}}/month={{execution_date.strftime('%m')}}/day={{execution_date.strftime('%d')}}" \
      "s3://MY-BUCKET/MY_PREFIX/year={{execution_date.year}}/month={{execution_date.strftime('%m')}}/day={{execution_date.strftime('%d')}}"

If I remove the -m option, the Composer task fails (an issue I should ask them about), but a few files do upload. If I leave -m in, I get:

[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - Traceback (most recent call last):
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/threading.py", line 980, in _bootstrap_inner
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     self.run()
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/threading.py", line 917, in run
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     self._target(*self._args, **self._kwargs)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/daisy_chain_wrapper.py", line 189, in PerformDownload
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     self.gsutil_api.GetObjectMedia(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 352, in GetObjectMedia
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return self._GetApi(provider).GetObjectMedia(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1244, in GetObjectMedia
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return self._PerformDownload(bucket_name,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1383, in _PerformDownload
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     apitools_download.GetRange(additional_headers=additional_headers,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 485, in GetRange
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     response = self.__GetChunk(progress, end_byte,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 418, in __GetChunk
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return http_wrapper.MakeRequest(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/http_wrapper.py", line 359, in MakeRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     retry_func(ExceptionRetryArgs(http, http_request, e, retry,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/retry_util.py", line 84, in RetriesInDataTransferHandler
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     http_wrapper.RethrowExceptionHandler(retry_args)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/http_wrapper.py", line 348, in MakeRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return _MakeRequestNoRetry(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/http_wrapper.py", line 397, in _MakeRequestNoRetry
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     info, content = http.request(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 544, in NewRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return request_orig(uri, method=method, body=body,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/oauth2client/oauth2client/transport.py", line 173, in new_request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     resp, content = request(orig_request_method, uri, method, body,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/oauth2client/oauth2client/transport.py", line 280, in request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return http_callable(uri, method=method, body=body, headers=headers,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/httplib2/python3/httplib2/__init__.py", line 1701, in request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     (response, content) = self._request(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 452, in OverrideRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     (response, content) = self._conn_request(conn, request_uri, method, body,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 685, in _conn_request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     new_data = http_stream.read(TRANSFER_BUFFER_SIZE)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 403, in read
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     data = orig_read_func(amt)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/http/client.py", line 463, in read
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     n = self.readinto(b)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/http/client.py", line 507, in readinto
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     n = self.fp.readinto(b)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/socket.py", line 704, in readinto
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return self._sock.recv_into(b)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/ssl.py", line 1242, in recv_into
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return self.read(nbytes, buffer)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/ssl.py", line 1100, in read
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return self._sslobj.read(len, buffer)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - socket.timeout: The read operation timed out
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - The read operation timed out

I'm not sure whether the problem is on the AWS end, but any help would be appreciated.
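
For reference, the traceback shows the timeout raised from `PerformDownload` in `daisy_chain_wrapper.py` by way of `gcs_json_api.py`, so the stalled read appears to be on the GCS download half of the copy, although that alone does not rule out the S3 side. One mitigation that could be tried is wrapping the sync in a retry loop, since rsync only copies what is still missing on a rerun. The sketch below is an assumption, not a confirmed fix: the `retry` helper is hypothetical, and the attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gsutil): rerun a command until it
# succeeds or the attempt limit is reached, sleeping between attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  # The command runs as a loop condition, so a failure here does not
  # trigger an early exit even when the caller uses `set -e`.
  until "$@"; do
    if (( n >= attempts )); then
      echo "giving up after ${n} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${n} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$(( n + 1 ))
  done
}
```

With that helper defined, the sync could be invoked as `retry 3 30 gsutil -o "Boto:http_socket_timeout=120" -o "GSUtil:parallel_process_count=2" -o "GSUtil:parallel_thread_count=4" -m rsync -r SRC DST`, where `SRC` and `DST` stand for the two URLs above. The `http_socket_timeout`, `parallel_process_count`, and `parallel_thread_count` settings are boto configuration options; the values shown are guesses, and whether a longer timeout or lower parallelism helps here is untested.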