GoogleCloudPlatform / gsutil

A command line tool for interacting with cloud storage services.
Apache License 2.0

gsutil breaks after updating to SDK 298 on OS X #1052

Closed codefrau closed 3 years ago

codefrau commented 4 years ago

SDK 297 was fine

+ gcloud version
Google Cloud SDK 298.0.0
beta 2020.06.19
bq 2.0.58
core 2020.06.19
gsutil 4.51
kubectl 2020.05.01

+ gsutil -m rsync -r -c -x '^\.|.*\.js\.map$' . gs://croquet.io/

WARNING: You have requested checksumming but your crcmod installation isn't
using the module's C extension, so checksumming will run very slowly. For help
installing the extension, please see "gsutil help crcmod".

Building synchronization state...
Starting synchronization...
module 'sys' has no attribute 'maxint'
CommandException: 1 files/objects could not be copied/removed.
+ echo 'Fixing metadata...'
Fixing metadata...
+ gsutil -m -q setmeta -h Content-Type:text/html -h 'Cache-Control:public, max-age=60' 'gs://croquet.io/**.html'
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/managers.py", line 749, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/google-cloud-sdk/platform/gsutil/gslib/command.py", line 2348, in run
    cls = copy.copy(class_map[caller_id])
  File "<string>", line 2, in __getitem__
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/managers.py", line 753, in _callmethod
    self._connect()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/managers.py", line 740, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused

This is on macOS Catalina 10.15.5:

$ gsutil version -l
gsutil version: 4.51
checksum: a4c57d9b2479f11efe1b0ffb6470c0c5 (OK)
boto version: 2.49.0
python version: 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 03:03:55) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
OS: Darwin 19.5.0
multiprocessing available: True
using cloud sdk: True
pass cloud sdk credentials to gsutil: True
config path(s): /Users/vanessa/.boto
gsutil path: /usr/local/google-cloud-sdk/bin/gsutil
compiled crcmod: False
installed via package manager: False
editable install: False

The same command works fine again after reverting to 297 that I had installed previously.

dilipped commented 4 years ago

I tried the rsync command you have mentioned above and it's working for me. Are you able to reproduce the issue?

adambar commented 4 years ago

I also have an issue with gcloud 298's gsutil on OS X. My error occurs when I run a cp operation and works fine again after downgrading to 297.

I've anonymized my output but it looks like:

/Users/secretuser/google-cloud-sdk/bin/gsutil -q cp -n testfile gs://bucket/hidden/testfile

File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
    gsutil.RunMain()
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gsutil.py", line 123, in RunMain
    sys.exit(gslib.__main__.main())
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 429, in main
    return _RunNamedCommandAndHandleExceptions(
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 767, in _RunNamedCommandAndHandleExceptions
    _HandleUnknownFailure(e)
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 625, in _RunNamedCommandAndHandleExceptions
    return command_runner.RunNamedCommand(command_name,
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 411, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 1205, in RunCommand
    self.Apply(_CopyFuncWrapper,
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1485, in Apply
    caller_id = self._SetUpPerCallerState()
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1360, in _SetUpPerCallerState
    class_map[caller_id] = cls
  File "<string>", line 2, in __setitem__
  File "/Users/secretuser/.pyenv/versions/3.8.3/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/managers.py", line 850, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/secretuser/.pyenv/versions/3.8.3/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/managers.py", line 243, in serve_client
    request = recv()
  File "/Users/secretuser/.pyenv/versions/3.8.3/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 30, in <module>
    from gslib.command import Command
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/command.py", line 50, in <module>
    from gslib.cloud_api_delegator import CloudApiDelegator
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 26, in <module>
    from gslib.cs_api_map import ApiMapConstants
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/cs_api_map.py", line 23, in <module>
    from gslib.gcs_json_api import GcsJsonApi
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 72, in <module>
    from gslib.third_party.storage_apitools import storage_v1_client as apitools_client
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/third_party/storage_apitools/storage_v1_client.py", line 26, in <module>
    class StorageV1(base_api.BaseApiClient):
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/third_party/storage_apitools/storage_v1_client.py", line 38, in StorageV1
    _USER_AGENT += gslib.USER_AGENT
AttributeError: module 'gslib' has no attribute 'USER_AGENT'

codefrau commented 4 years ago

@dilipped yes I can reproduce. As soon as I update, it breaks:


$ gsutil -m rsync -r -c -x '^\.|.*\.js\.map$' . gs://croquet.io/
Building synchronization state...
Starting synchronization...

$ gcloud version
Google Cloud SDK 297.0.0
beta 2019.05.17
bq 2.0.58
core 2020.06.12
gsutil 4.51
kubectl 2020.05.01
Updates are available for some Cloud SDK components.  To install them,
please run:
  $ gcloud components update

$ sudo gcloud components update 

Your current Cloud SDK version is: 297.0.0
You will be upgraded to version: 298.0.0

┌─────────────────────────────────────────────────────────────────────────────┐
│                      These components will be updated.                      │
├─────────────────────────────────────────────────────┬────────────┬──────────┤
│                         Name                        │  Version   │   Size   │
├─────────────────────────────────────────────────────┼────────────┼──────────┤
│ BigQuery Command Line Tool (Platform Specific)      │     2.0.58 │  < 1 MiB │
│ Cloud SDK Core Libraries                            │ 2020.06.19 │ 15.0 MiB │
│ Cloud SDK Core Libraries (Platform Specific)        │ 2020.06.19 │  < 1 MiB │
│ Cloud Storage Command Line Tool (Platform Specific) │       4.51 │  < 1 MiB │
│ gcloud cli dependencies                             │ 2020.06.19 │  3.4 MiB │
└─────────────────────────────────────────────────────┴────────────┴──────────┘

...

Update done!

To revert your SDK to the previously installed version, you may run:
  $ gcloud components update --version 297.0.0

$ gsutil -m rsync -r -c -x '^\.|.*\.js\.map$' . gs://croquet.io/

WARNING: You have requested checksumming but your crcmod installation isn't
using the module's C extension, so checksumming will run very slowly. For help
installing the extension, please see "gsutil help crcmod".

Building synchronization state...
Starting synchronization...
module 'sys' has no attribute 'maxint'
CommandException: 1 files/objects could not be copied/removed.
$ 

chollinger93 commented 4 years ago

This AttributeError: module 'gslib' has no attribute 'USER_AGENT' only happens when using the -m flag. It's caused by a missing attribute (USER_AGENT in gslib/__init__.py).

I was able to fix it by manually merging this commit: https://github.com/GoogleCloudPlatform/gsutil/commit/f8f00d01e8fb10d1d31cb15c4050536d1e900401

Which simply adds the USER_AGENT variable back. Not sure why that is not in the official binary/archive.

Another issue is macOS and Python 3.8 specific, which of course, is a wonderful combination (I'm not bitter, you are!): https://bugs.python.org/issue33725 and https://github.com/GoogleCloudPlatform/gsutil/issues/961 give some hints. This can be resolved by upgrading Python or by just glueing it together and hoping for the best: https://github.com/python/cpython/pull/13603/commits/bc366964d2dabcf14427604a2322fa6644023132
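For context on the linked Python issue: as far as I know, Python 3.8 changed the default multiprocessing start method on macOS from fork to spawn, and spawn has to pickle whatever it hands to the child process. A minimal sketch for checking what a given interpreter reports (the helper name describe_start_method is made up for illustration and is not part of gsutil):

```python
import multiprocessing
import platform
import sys


def describe_start_method():
    """Summarize this interpreter's multiprocessing configuration."""
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "system": platform.system(),  # "Darwin" on macOS
        # Calling get_start_method() also fixes the default for this process.
        "start_method": multiprocessing.get_start_method(),
        "available": multiprocessing.get_all_start_methods(),
    }


if __name__ == "__main__":
    for key, value in describe_start_method().items():
        print(key, "=", value)
```

Run it with the same interpreter gsutil uses; otherwise the result says nothing about gsutil's behavior.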

Since I still got TypeError: cannot pickle '_io.TextIOWrapper', this one is really funny.

It hits the multiprocessing library reduction.py->dump() method, where it passes both a gsutil.cp process and a dict that starts with {'log_to_stderr': False, 'authkey'.... Apparently gsutil tries to start a dict as a process somehow.

I hence "fixed" this by adding:

def dump(obj, file, protocol=None):
    '''Replacement for pickle.dump() using ForkingPickler.'''
    # Hack: silently skip dicts instead of pickling them.
    if isinstance(obj, dict):
        return
    ForkingPickler(file, protocol).dump(obj)

in multiprocessing.reduction.dump(), which is more a joke than a fix. But it does tell me that somehow, this funky dict is generated somewhere. I'll just downgrade, but maybe one of the Google folks can look at that. Looks like a dict of what I assume are environment variables somehow makes its way into the process pool.
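To illustrate the failure mode outside gsutil: the sketch below shows that a dict holding an open text stream cannot be pickled. The "stream" key is a hypothetical stand-in, since the real contents of the dict are unknown beyond the keys quoted above, and the exact wording of the error differs between Python versions.

```python
import os
import pickle


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # open() in text mode returns an _io.TextIOWrapper, the same type as sys.stderr.
    with open(os.devnull, "w") as stream:
        # Hypothetical stand-in for the dict seen in reduction.dump().
        print(try_pickle({"log_to_stderr": False, "stream": stream}))
    print(try_pickle({"log_to_stderr": False}))
```

The first call reports an error and the second returns None, which suggests the problem is one unpicklable value inside the dict rather than the dict itself.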

dilipped commented 4 years ago

@otter-in-a-suit Thanks for the information!

Regarding AttributeError: module 'gslib' has no attribute 'USER_AGENT' this is a known bug that we fixed after gsutil v4.51 was released in https://github.com/GoogleCloudPlatform/gsutil/commit/f8f00d01e8fb10d1d31cb15c4050536d1e900401 . The fix has been merged and it will be made available in the gcloud sdk binary in the next gsutil release.

@codefrau For the module 'sys' has no attribute 'maxint' error, it is getting raised from the crcmod-osx module. This looks like a bug in https://github.com/gsutil-mirrors/crcmod-osx where it is calling sys.maxint and maxint doesn't exist in python3. As a quick fix, you can try installing the crcmod by following the steps here https://cloud.google.com/storage/docs/gsutil/addlhelp/CRC32CandInstallingcrcmod#macos
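For anyone wondering why the error appears at all: sys.maxint existed in Python 2 and was removed in Python 3, where sys.maxsize is the closest equivalent. A minimal compatibility sketch follows; it is not the actual crcmod-osx patch, and the name MAX_INT is made up for illustration:

```python
import sys

# Python 2 exposes sys.maxint; Python 3 only has sys.maxsize.
# getattr() with a default works on both without raising AttributeError.
MAX_INT = getattr(sys, "maxint", sys.maxsize)


def has_maxint():
    """Report whether this interpreter still provides sys.maxint."""
    return hasattr(sys, "maxint")


if __name__ == "__main__":
    print("sys.maxint available:", has_maxint())
    print("MAX_INT =", MAX_INT)
```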

dweekly commented 4 years ago

Still broken in Cloud SDK 300.0.0 on macOS 10.15.5 (Python 3.8.3).

sheurich commented 4 years ago

And 301.0.0 macOS 10.16 beta (Python 3.8.3).

dilipped commented 4 years ago

Unfortunately, the fix for AttributeError: module 'gslib' has no attribute 'USER_AGENT' was not rolled out in the 301.0.0 release. It will be part of 302.0.0. Sorry for the delay.

hartbeatnt commented 4 years ago

any estimate on when 302.0.0 will be released?

dilipped commented 4 years ago

21st July, if nothing blocks the release

hartbeatnt commented 4 years ago

ok, thanks. For anyone else running into the AttributeError: module 'gslib' has no attribute 'USER_AGENT' issue, rolling back to a previous version will fix the problem until 302.0.0 is released: gcloud components update --version 297.0.1

You might not need to go back that far but I can verify that 291.0.1 works (at least for me)

ttwd80 commented 4 years ago

303.0.0 works for me. We can close this.

codefrau commented 4 years ago

Nope. In 303.0.0 I still get module 'sys' has no attribute 'maxint'. And the AttributeError: 'ForkAwareLocal' object has no attribute 'connection' is still there, too. Both were in my very first report.

ttwd80 commented 4 years ago

@codefrau what's the python version? Mine is Python 3.8.3.

dilipped commented 4 years ago

I just wanted to point out that 303.0.0 only fixes the AttributeError: module 'gslib' has no attribute 'USER_AGENT' issue. The other two issues have not been fixed yet. For the maxint issue, the workaround would be to install the crcmod library directly instead of relying on the one shipped with gsutil for macOS. Instructions can be found here https://cloud.google.com/storage/docs/gsutil/addlhelp/CRC32CandInstallingcrcmod.

LokeshNanda commented 4 years ago

In 303.0.0 now getting TypeError: cannot pickle '_io.TextIOWrapper' object

giovannibonetti commented 4 years ago

In 303.0.0 now getting TypeError: cannot pickle '_io.TextIOWrapper' object

Please see https://github.com/GoogleCloudPlatform/gsutil/issues/961#issuecomment-663565856. It solved the problem for me.

Amzd commented 4 years ago

I have the same maxint error but cannot install crcmod with the instructions at https://cloud.google.com/storage/docs/gsutil/addlhelp/CRC32CandInstallingcrcmod.

Could not find a version that satisfies the requirement crcmod (from versions: )

Update:

Fixed the above error by installing pip2 instead of pip3.

But even with crcmod installed the way that the instructions say it still gives the same maxint error.

The MacOS description also states:

If for some reason the pre-compiled version is not being detected, please let the Google Cloud Storage team know

So hereby.

dilipped commented 4 years ago

@Amzd which python version are you using? You can check that by doing gsutil ver -l. Make sure you are installing crcmod for the correct python version. If you have multiple Python binaries available on your system, it is possible that gcloud is running on one python version but the crcmod is getting installed for a different python version.

You can check your python path by running gcloud info. Then you can run <your python path> -m pip install crcmod to install crcmod for that particular python version.

Amzd commented 4 years ago

Ah okay, gsutil is using python3 but when I try to install crcmod with python3 I get the error:

Could not find a version that satisfies the requirement crcmod (from versions: )
No matching distribution found for crcmod

codefrau commented 4 years ago

Tried 314.0 today, still broken, I guess #1107 is not deployed yet?

tmc commented 4 years ago

This is still an issue

dilipped commented 4 years ago

#1107 is not deployed yet. We are working on the release and it should be out by next week or the week after.

The PR does not address the crcmod issue. For the crcmod-related error, installing the library directly should resolve the issue - https://cloud.google.com/storage/docs/gsutil/addlhelp/CRC32CandInstallingcrcmod

maccman commented 4 years ago

Can you update us when this is fixed? It's still an issue for me (after updating gcloud components).

alvis commented 3 years ago

I can confirm that manually installing crcmod is a valid workaround.

The only tricky thing is that you have to identify which python gsutil is using, and hence the corresponding pip. If you have configured the CLOUDSDK_PYTHON environment variable, the path is easy to identify. If not, check the python version via gsutil version -l. 😉
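A rough sketch of that lookup, assuming CLOUDSDK_PYTHON (when set) holds the interpreter path. The fallback to the current interpreter is my own assumption, not something gcloud is known to do, and the function name is made up. The script only prints the command so you can check the interpreter before running it.

```python
import os
import sys


def crcmod_install_command(python_path=None):
    """Build the `<python> -m pip install crcmod` command as an argument list."""
    # Preference order (an assumption): explicit argument, then
    # CLOUDSDK_PYTHON, then whichever interpreter runs this script.
    interpreter = python_path or os.environ.get("CLOUDSDK_PYTHON") or sys.executable
    return [interpreter, "-m", "pip", "install", "crcmod"]


if __name__ == "__main__":
    print(" ".join(crcmod_install_command()))
```

Using `-m pip` with an explicit interpreter avoids the pip/pip3 mix-up, because the package lands in that interpreter's site-packages.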

clintron commented 3 years ago

Building on @alvis 's comment, you'll need to use Python 3.7 for now. It sounds like the issue has been fixed, but I don't know if the patch has made it into the current release yet: https://bugs.python.org/issue33725

max-sixty commented 3 years ago

I think this is solved now, could the maintainers confirm?

codefrau commented 3 years ago

The version in SDK 323.0.0 appears to work fine. I don't remember if I had to build crcmod or not, but it works for me.

The only annoyance is this warning which is printed on each invocation:

If you experience problems with multiprocessing on MacOS, they might be related to https://bugs.python.org/issue33725. You can disable multiprocessing by editing your .boto config or by adding the following flag to your command: -o "GSUtil:parallel_process_count=1". Note that multithreading is still available even if you disable multiprocessing.
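For the .boto route, here is a small sketch that sets the same option named in the warning. It assumes the file is plain INI text that Python's configparser can read, and the function name is made up. Note that configparser drops comments when it writes the file back, so editing by hand is safer if you want to keep them.

```python
import configparser
import io


def disable_multiprocessing(boto_text):
    """Return boto config text with [GSUtil] parallel_process_count set to 1."""
    # RawConfigParser avoids interpolation problems with '%' in values.
    parser = configparser.RawConfigParser()
    parser.read_string(boto_text)
    if not parser.has_section("GSUtil"):
        parser.add_section("GSUtil")
    parser.set("GSUtil", "parallel_process_count", "1")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(disable_multiprocessing("[GSUtil]\nparallel_process_count = 4\n"))
```

The function works on text rather than on the file itself, so you can inspect the result before overwriting anything.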

I edited my .boto to silence it. I assume it will also go away with a newer Python version (I'm using the system default on Catalina, 3.6.5).

So I as the original reporter of this issue consider it fixed (yay!) but I'll leave it to the maintainers to decide if it's okay to close.

martindufort commented 3 years ago

Getting this error with this Cloud SDK version:

Google Cloud SDK 325.0.0
beta 2021.01.22
bq 2.0.64
cloud-datastore-emulator 2.1.0
core 2021.01.22
gcloud 
gsutil 4.58

when trying to synchronize.

Building synchronization state...
Starting synchronization...
module 'sys' has no attribute 'maxint'

: python --version
Python 3.7.1

dilipped commented 3 years ago

I will close this based on https://github.com/GoogleCloudPlatform/gsutil/issues/1123.

@martindufort The maxint seems to be an issue because of crcmod which is a separate issue and is not related to the multiprocessing issue discussed here. Please install crcmod directly to fix it. You can refer to https://github.com/GoogleCloudPlatform/gsutil/issues/1123 to learn more about the crcmod issue. The compiled crcmod library shipped with gsutil is broken for Python3 and hence we recommend installing crcmod directly. Feel free to file a separate issue if that does not work for you.

Thanks!

confiq commented 3 years ago

@martindufort, because I had the same problem... you'll need to update crcmod as stated before. On my machine™️, upgrading globally helped: pip3 install -U crcmod. Might help the next soul that arrives here from Google...

Ali-Parandeh commented 2 years ago

I still have issues with this. Does anyone know how to fix it? Gsutil keeps hanging for me when I use the -m flag.

berk94 commented 1 year ago

I fixed the module 'sys' has no attribute 'maxint' error with the following steps:

1) Run gcloud info
2) Note down the Python Location as <python_location>
3) Run <python_location> -m pip install crcmod

I'm using Homebrew Python, which is currently at v 3.10.7, but when I ran gcloud info, I saw that the Python version was 3.9.14 (different than brew's current Python version). Directly running pip3 install -U crcmod did not work as it was installing crcmod for Python3.10, which isn't the Python used by gsutil. Hope this helps others who experience the same problem!
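To double-check the result, something like the sketch below can be run with <python_location>. It relies on crcmod.crcmod._usingExtension, a private flag that I believe crcmod sets when its C extension loads; treat that as an assumption. The function name and return strings are made up for illustration.

```python
import importlib


def crcmod_extension_status():
    """Return 'missing', 'broken', 'pure-python', or 'c-extension'."""
    try:
        module = importlib.import_module("crcmod.crcmod")
    except ImportError:
        return "missing"
    except Exception:
        # For example, a Python 2 era build that fails while importing.
        return "broken"
    # _usingExtension is private to crcmod (assumed); default to False if absent.
    if getattr(module, "_usingExtension", False):
        return "c-extension"
    return "pure-python"


if __name__ == "__main__":
    print(crcmod_extension_status())
```

If this prints c-extension for the interpreter that gcloud info reports, I'd expect gsutil version -l to show compiled crcmod: True as well.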