GoogleCloudPlatform / gsutil

A command line tool for interacting with cloud storage services.
Apache License 2.0

'ForkAwareLocal' object has no attribute 'connection' for multithreaded cp #1100

Open djc opened 4 years ago

djc commented 4 years ago

When running gsutil -m cp -r gs://example/ ./ on a fairly large folder on macOS with the system Python 3.7 (to prevent the issues from #961), I see many instances of the following error:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 788, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/command.py", line 2348, in run
    cls = copy.copy(class_map[caller_id])
  File "<string>", line 2, in __getitem__
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 792, in _callmethod
    self._connect()
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 779, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 492, in Client
    c = SocketClient(address)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused

As a result, the command seems to hang after starting the first batch of downloads.

peinan commented 4 years ago

I have exactly the same problem on OSX 10.15.6, gcloud {308.0.0, 297.0.1}, gsutil {4.53, 4.51}, Python 3.7.8.

waltaskew commented 3 years ago

Same in OSX 10.15.7, gcloud 303.0.0, gsutil 4.52, python 3.6.5

mobuchowski commented 3 years ago

Same in gcloud 317.0.0, python 3.7.7

rrauber commented 3 years ago

Thanks for reporting this! I was able to reproduce this bug as well and found that disabling multiprocessing helped. You can do this by setting parallel_process_count=1 in the GSUtil section of your boto config file, or by adding the following flag to your command: -o "GSUtil:parallel_process_count=1". Though this disables multiprocessing, multithreading should still be enabled, so you'll still be able to parallelize your transfers.

I'm guessing this issue is related to more general issues with multiprocessing on macOS that PR #1107 left us vulnerable to (https://github.com/GoogleCloudPlatform/gsutil/pull/1107#issuecomment-698555319). If you're still having this issue after disabling multiprocessing, please let us know!
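
For anyone applying the workaround above repeatedly, here is a minimal sketch of a shell wrapper that always passes the override. The function name gsutil_single_process and the GSUTIL_BIN variable are hypothetical, introduced only for illustration (GSUTIL_BIN lets you dry-run the wrapper with echo); the -m and -o options and the parallel_process_count setting are the ones named in the comment above.

```shell
#!/bin/sh
# Sketch of a wrapper for the workaround: keep -m (threads) but force a
# single process via the top-level -o override.
#
# Equivalent boto config setting, per the comment above:
#   [GSUtil]
#   parallel_process_count = 1
#
# GSUTIL_BIN is a hypothetical override for dry runs, not a gsutil feature.
gsutil_single_process() {
  "${GSUTIL_BIN:-gsutil}" -m -o "GSUtil:parallel_process_count=1" "$@"
}

# Example, using the placeholder bucket from the original report:
#   gsutil_single_process cp -r gs://example/ ./
```

Setting GSUTIL_BIN=echo before calling the function prints the arguments that would be passed instead of starting a transfer.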

alamothe commented 3 years ago

Thank you for providing a workaround, it works for me.

This tool is a joke though. Not a single version since 297 works out of the box without some kind of patching.

dweekly commented 3 years ago

The -o "GSUtil:parallel_process_count=1" workaround works for me on Big Sur (macOS 11.2) but it's frustrating to me that this continues to persist as an issue on the Mac platform.

danielyaa5 commented 3 years ago

Doesn't work for me. I get CommandException: Destination URL must name a directory, bucket, or bucket subdirectory for the multiple source form of the cp command. Please fix your joke product.

gcarr1020 commented 2 years ago

I ran into the same "CommandException: Destination URL must name a directory, bucket, or bucket subdirectory for the multiple source form of the cp command." issue on OSX 12.0 Beta (M1 Mac). I fixed it by removing the -m flag, so the command was gsutil cp -r gs://<path/to/bucket_or_sub_bucket> . (the trailing dot is the destination directory). Unfortunately, this removes parallelism and significantly affects performance.

nrempel commented 1 year ago

If you're here, I recommend the new gcloud storage cp command: https://cloud.google.com/sdk/gcloud/reference/storage/cp