GoogleCloudPlatform / gsutil

A command line tool for interacting with cloud storage services.
Apache License 2.0

gsutil crashes when listing bucket contents when virtual addressing style fails #1001

Open pappasilenus opened 4 years ago

pappasilenus commented 4 years ago

What happened?:

I configured gsutil to work with an on-premises S3 data source using a boto config (see below). The on-premises data source uses path-style addressing, not virtual-hosted-style addressing. The configuration detailed below should let gsutil access GCS buckets through the gs:// prefix with an OAuth 2 user account, and on-premises S3 buckets (such as those served by minio front-ending a data fabric) through the s3:// prefix. Instead, gsutil fails when the data store does not support virtual-hosted-style addressing, and supporting it is rare and difficult to configure for on-premises S3 data stores.
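For readers unfamiliar with the two addressing styles, here is a minimal sketch of how the request URL differs between them. It is an illustration only, not gsutil or boto code: the function names are hypothetical, and the host, port, and bucket are the example values used elsewhere in this report.

```python
def virtual_hosted_url(host, port, bucket, key="", secure=False):
    """Virtual-hosted style: the bucket name becomes part of the hostname."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{bucket}.{host}:{port}/{key}"


def path_style_url(host, port, bucket, key="", secure=False):
    """Path style: the bucket name is the first segment of the URL path."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}/{bucket}/{key}"


# Example values from this report (hypothetical minio endpoint).
print(virtual_hosted_url("minio", 9000, "master.train"))
print(path_style_url("minio", 9000, "master.train"))
```

With virtual-hosted style the client must resolve a hostname such as master.train.minio, which an on-premises DNS setup typically does not know about. That would be consistent with a socket.gaierror at connection time.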

I can list buckets using this command:

gsutil ls  s3://

However, any attempt to list the contents of a bucket with the -r flag crashes gsutil, as shown below:

$  gsutil ls -r s3://master.train/

Traceback (most recent call last):
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
    gsutil.RunMain()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gsutil.py", line 124, in RunMain
    sys.exit(gslib.__main__.main())
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 433, in main
    user_project=user_project)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 628, in _RunNamedCommandAndHandleExceptions
    user_project=user_project)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 411, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/commands/ls.py", line 641, in RunCommand
    listing_helper.ExpandUrlAndPrint(storage_url))
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/utils/ls_helper.py", line 369, in ExpandUrlAndPrint
    print_initial_newline=False)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/utils/ls_helper.py", line 443, in _RecurseExpandUrlAndPrint
    bucket_listing_fields=self.bucket_listing_fields):
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 498, in IterAll
    expand_top_level_buckets=expand_top_level_buckets):
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 174, in __iter__
    fields=bucket_listing_fields):
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 434, in ListObjects
    for key in objects_iter:
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/s3/bucketlistresultset.py", line 34, in bucket_lister
    encoding_type=encoding_type)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/s3/bucket.py", line 477, in get_all_keys
    '', headers, **params)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/s3/bucket.py", line 403, in _get_all
    query_args=query_args)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/s3/connection.py", line 684, in make_request
    retry_handler=retry_handler
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/connection.py", line 1074, in make_request
    retry_handler=retry_handler)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/connection.py", line 1033, in _mexe
    raise ex
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

What you expected to happen?: I expected it to list the contents of the bucket.

A properly configured aws-cli or mc works correctly: each falls back to path-style addressing when virtual-hosted-style addressing fails.
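The fallback I have in mind can be sketched roughly as follows. This is not how aws-cli, mc, or boto actually implement it; choose_endpoint and its resolve parameter are hypothetical names, and the only assumption is that a failed DNS lookup of the bucket-prefixed hostname is the signal to switch to path-style addressing.

```python
import socket


def choose_endpoint(host, bucket, port=9000, resolve=socket.getaddrinfo):
    """Return (hostname, path_prefix), preferring virtual-hosted style.

    If the bucket-prefixed hostname cannot be resolved, fall back to
    path style, where the bucket is part of the URL path instead.
    """
    virtual_host = f"{bucket}.{host}"
    try:
        resolve(virtual_host, port)
        return virtual_host, "/"
    except socket.gaierror:
        return host, f"/{bucket}/"


def always_fails(name, port):
    """Stub resolver that mimics the DNS failure in the traceback above."""
    raise socket.gaierror(8, "nodename nor servname provided, or not known")


# With the stub, the sketch falls back to path style without touching the network.
print(choose_endpoint("minio", "master.train", resolve=always_fails))
```

Passing the resolver as a parameter keeps the sketch testable offline; a real client would use the default socket.getaddrinfo.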

How to reproduce it (as minimally and precisely as possible)?:

The first two steps are documented in the gcloud docs.

  1. Install gsutil standalone or disable gcloud credential-passing to gsutil
    gcloud config set pass_credentials_to_gsutil false
  2. Set up gsutil configuration
    gsutil config
  3. Step 2 creates a ~/.boto file. In that file, around line 42, there are settings for adding HMAC credentials for s3:// URLs. Add the authentication tokens for your on-premises S3 data store and configure s3_host and s3_port. Here are example settings for a typical minio setup:
    # To add HMAC aws credentials for "s3://" URIs, edit and uncomment the
    # following two lines:
    aws_access_key_id = minio
    aws_secret_access_key = minio123
    # The ability to specify an alternate storage host and port
    # is primarily for cloud storage service developers.
    # Setting a non-default gs_host only works if prefer_api=xml.
    s3_host = minio
    s3_port = 9000
    # In some cases, (e.g. VPC requests) the "host" HTTP header should
    # be different than the host used in the request URL.
    #s3_host_header = <alternate storage host header>
  4. If you have not added SSL certificates, you'll need to add the following in the [Boto] section:
    is_secure = False
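To double-check which S3 endpoint the settings from steps 3 and 4 describe, a small script along these lines can help. It is only a sketch: it parses an inline sample rather than the real ~/.boto file, the section names ([Credentials] and [Boto]) and the fallback defaults are my assumptions, and s3_endpoint is a hypothetical helper, not part of gsutil or boto.

```python
import configparser

# Inline sample mirroring the settings from steps 3 and 4.
# Section names are assumptions about where gsutil keeps these keys.
SAMPLE = """
[Credentials]
aws_access_key_id = minio
aws_secret_access_key = minio123
s3_host = minio
s3_port = 9000

[Boto]
is_secure = False
"""


def s3_endpoint(config_text):
    """Return the scheme://host:port that the given settings describe."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Fallback values below are assumptions for illustration only.
    host = parser.get("Credentials", "s3_host", fallback="s3.amazonaws.com")
    secure = parser.getboolean("Boto", "is_secure", fallback=True)
    port = parser.getint("Credentials", "s3_port", fallback=443 if secure else 80)
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}"


print(s3_endpoint(SAMPLE))
```

To check a real file, read its contents and pass the text to s3_endpoint instead of SAMPLE.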

If the configuration is correct, running

gsutil ls gs://

should list your buckets in GCS.

Likewise, running

gsutil ls s3://

should list the buckets on your minio server.

The crash occurs when attempting to list the contents of any of those buckets.

Anything else we need to know?:

Running the command with the -DD flag will clearly indicate it is trying to access a path-addressed bucket.

Environment?:

dilipped commented 4 years ago

Can you please post the output of gsutil version -l? Also, do you get the same error when you omit the s3_host and s3_port options from your boto config? What's the region of your bucket s3://master.train/?