What happened?:
I configured gsutil to work with an on-premises S3 data source using a boto config (~/.boto, see below). The on-premises data source uses path-style addressing, not virtual-hosted addressing. The configuration detailed below should allow gsutil to access GCS buckets via the gs:// prefix with an OAuth 2 user account, and on-premises S3 buckets (such as those provided by minio front-ending a data fabric) via the s3:// prefix. Instead, gsutil fails whenever virtual-hosted addressing is not supported, and virtual-hosted addressing is a rare and difficult configuration for on-premises S3 data stores.
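For context, the two addressing styles differ only in where the bucket name goes. A minimal sketch (the host, port, and bucket values are taken from the minio example further down, not from gsutil's code):

```python
# Illustrative only: the two S3 addressing styles.
# With an on-prem endpoint like "minio:9000", only the path style can work,
# because "master.train.minio" is not a resolvable DNS name.

def virtual_hosted_url(host, port, bucket, key=""):
    # Bucket becomes part of the hostname: http://<bucket>.<host>:<port>/<key>
    return f"http://{bucket}.{host}:{port}/{key}"

def path_style_url(host, port, bucket, key=""):
    # Bucket stays in the path: http://<host>:<port>/<bucket>/<key>
    return f"http://{host}:{port}/{bucket}/{key}"

print(virtual_hosted_url("minio", 9000, "master.train"))
print(path_style_url("minio", 9000, "master.train"))
```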
I can list buckets using this command:
gsutil ls s3://
However, any attempt to list the contents of a bucket using the -r flag results in a gsutil crash, shown below:
$ gsutil ls -r s3://master.train/
Traceback (most recent call last):
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
gsutil.RunMain()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gsutil.py", line 124, in RunMain
sys.exit(gslib.__main__.main())
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 433, in main
user_project=user_project)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 628, in _RunNamedCommandAndHandleExceptions
user_project=user_project)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 411, in RunNamedCommand
return_code = command_inst.RunCommand()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/commands/ls.py", line 641, in RunCommand
listing_helper.ExpandUrlAndPrint(storage_url))
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/utils/ls_helper.py", line 369, in ExpandUrlAndPrint
print_initial_newline=False)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/utils/ls_helper.py", line 443, in _RecurseExpandUrlAndPrint
bucket_listing_fields=self.bucket_listing_fields):
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 498, in IterAll
expand_top_level_buckets=expand_top_level_buckets):
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 174, in __iter__
fields=bucket_listing_fields):
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 434, in ListObjects
for key in objects_iter:
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/s3/bucketlistresultset.py", line 34, in bucket_lister
encoding_type=encoding_type)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/s3/bucket.py", line 477, in get_all_keys
'', headers, **params)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/s3/bucket.py", line 403, in _get_all
query_args=query_args)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/s3/connection.py", line 684, in make_request
retry_handler=retry_handler
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/connection.py", line 1074, in make_request
retry_handler=retry_handler)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/connection.py", line 1033, in _mexe
raise ex
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
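The final socket.gaierror is a DNS resolution failure, which is consistent with boto prepending the bucket name to s3_host (virtual-hosted style) before connecting. A hypothetical reconstruction of that failing step (not gsutil's actual code path):

```python
import socket

# Hypothetical reconstruction: with virtual-hosted addressing, the bucket
# name is prepended to s3_host to form the hostname to connect to.
bucket, s3_host = "master.train", "minio"
hostname = f"{bucket}.{s3_host}"
print("virtual-hosted request targets:", hostname)

try:
    socket.getaddrinfo(hostname, 9000)
    print("resolved (a wildcard DNS setup could mask the bug)")
except socket.gaierror as err:
    # On macOS this surfaces as Errno 8:
    # "nodename nor servname provided, or not known"
    print("DNS lookup failed:", err)
```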
What you expected to happen?:
I expected it to list the contents of the bucket.
A properly configured aws-cli or mc works correctly, falling back to path-style addressing when virtual-hosted addressing fails.
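For comparison, this is the setting that pins aws-cli (and other botocore-based clients) to path-style addressing, in ~/.aws/config; shown only to illustrate the behavior gsutil currently lacks:

```ini
[default]
s3 =
    addressing_style = path
```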
How to reproduce it (as minimally and precisely as possible)?:
Install gsutil standalone, or disable gcloud credential passing to gsutil:
gcloud config set pass_credentials_to_gsutil false
Then run the gsutil configuration flow:
gsutil config
This will create a ~/.boto file. In that file, around line 42, there are commented-out settings for adding HMAC credentials for s3:// URIs. Add the credentials for your on-premises S3 data store and configure s3_host and s3_port. Here are example settings for a typical minio setup:
# To add HMAC aws credentials for "s3://" URIs, edit and uncomment the
# following two lines:
aws_access_key_id = minio
aws_secret_access_key = minio123
# The ability to specify an alternate storage host and port
# is primarily for cloud storage service developers.
# Setting a non-default gs_host only works if prefer_api=xml.
s3_host = minio
s3_port = 9000
# In some cases, (e.g. VPC requests) the "host" HTTP header should
# be different than the host used in the request URL.
#s3_host_header = <alternate storage host header>
If you have not added SSL certificates, you'll also need the following in the [Boto] section:
is_secure = False
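Pulling the pieces together, the relevant parts of ~/.boto look like this (section placement follows the comments in the generated template; treat this as a sketch and keep the rest of the generated file intact):

```ini
[Credentials]
aws_access_key_id = minio
aws_secret_access_key = minio123
s3_host = minio
s3_port = 9000

[Boto]
is_secure = False
```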
If the configuration worked properly, running
gsutil ls gs://
should show a list of your buckets in GCS.
Doing a
gsutil ls s3://
will show a list of the buckets on your minio server.
The crash occurs when attempting to list the contents of any of those buckets.
Anything else we need to know?:
Running the command with the -DD flag clearly shows gsutil attempting virtual-hosted addressing against a path-addressed bucket.
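One thing that may be worth testing: stand-alone boto documents a calling_format option that forces path-style (ordinary) requests. I have not verified that gsutil's vendored boto honors it, but ~/.boto would be the natural place to try:

```ini
[s3]
calling_format = boto.s3.connection.OrdinaryCallingFormat
```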
Can you please post the output of gsutil version -l?
Also, are you getting the same error when not using s3_host and s3_port options in your boto cfg?
What's the region for your bucket s3://master.train/?