aws / aws-cli

Universal Command Line Interface for Amazon Web Services

Absolute DNS support for endpoints #9023

Open fyannk opened 2 years ago

fyannk commented 2 years ago

Describe the bug

There is a problem when using an absolute FQDN in DNS, as specified in RFC 3986, section 3.2.2:

The rightmost domain label of a fully qualified domain name in DNS may be followed by a single "." and should be if it is necessary to distinguish between the complete domain name and some local domain.

Seen when using aws-cli to contact S3-compatible storage.

This was seen when trying to use endpoint "http://minio.test.local.:9000" instead of "http://minio.test.local:9000"

Expected Behavior

Can contact endpoints with absolute FQDN.

Example of a list-bucket for S3 with endpoint "http://minio.test.local.:9000" :

GET / HTTP/1.1
Host: minio.test.local:9000
Accept-Encoding: identity
User-Agent: aws-cli/2.8.0 Python/3.9.11 Linux/5.10.57 docker/x86_64.amzn.2 prompt/off command/s3api.list-buckets
X-Amz-Date: 20221005T091409Z
X-Amz-Content-SHA256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Authorization: AWS4-HMAC-SHA256 Credential=IvUHFXFrif7yISPa/20221005/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=a41be7a1dfb7144ca4b44d3258f07e2cdf24a2f2938b70c00b92bc55a05fc88a

HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 362
Content-Security-Policy: block-all-mixed-content
Content-Type: application/xml
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Request-Id: 171B216F7F100704
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
Date: Wed, 05 Oct 2022 09:14:10 GMT

<?xml version="1.0" encoding="UTF-8"?>
<ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>02d6176db174dc93cb1b899f7c6078f08654445fe8cf1b6ce98d8855f66bdbf4</ID><DisplayName>minio</DisplayName></Owner><Buckets><Bucket><Name>test</Name><CreationDate>2022-10-05T09:01:56.560Z</CreationDate></Bucket></Buckets></ListAllMyBucketsResult>

Current Behavior

Server is rejecting the call (example of a list-bucket for S3 with endpoint "http://minio.test.local.:9000") :

GET / HTTP/1.1
Host: minio.test.local:9000
Accept-Encoding: identity
User-Agent: aws-cli/2.8.0 Python/3.9.11 Linux/5.10.57 docker/x86_64.amzn.2 prompt/off command/s3api.list-buckets
X-Amz-Date: 20221005T094243Z
X-Amz-Content-SHA256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Authorization: AWS4-HMAC-SHA256 Credential=IvUHFXFrif7yISPa/20221005/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=6d5dc75fee10b281c243ab6b9fd94a4ebb8a15dcfe6d913bebf67f50a05b0688

HTTP/1.1 403 Forbidden
Accept-Ranges: bytes
Content-Length: 334
Content-Security-Policy: block-all-mixed-content
Content-Type: application/xml
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Request-Id: 171B22FE73EC0E22
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
Date: Wed, 05 Oct 2022 09:42:43 GMT

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message><Resource>/</Resource><RequestId>171B22FE73EC0E22</RequestId><HostId>2e5ec549-1cf7-4398-b581-c1e5a3aefb44</HostId></Error>

Reproduction Steps

Use any absolute FQDN for endpoint

Example:

Not working:

aws --debug s3api list-buckets --endpoint-url http://minio.test.local.:9000

Working:

aws --debug s3api list-buckets --endpoint-url http://minio.test.local:9000

Possible Solution

For my use case (dunno if there are others), I believe the problem comes from the function _host_from_url in botocore/auth.py

The signature is computed with the absolute FQDN, but the underlying HTTP layer follows the RFCs and removes the trailing dot, so the signature no longer matches the information that is actually sent.
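To illustrate why a single extra dot is enough to produce SignatureDoesNotMatch, here is a minimal sketch. It is not botocore's signing code: toy_signature is a hypothetical helper that only HMACs a simplified header block, assuming (as the SignedHeaders value in the requests above indicates) that the host header is part of the signed data. The key is made up.

```python
import hashlib
import hmac


def toy_signature(secret_key: str, host: str, amz_date: str) -> str:
    """HMAC-SHA256 over a simplified canonical header block (illustration only)."""
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    return hmac.new(
        secret_key.encode("utf-8"),
        canonical_headers.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


# Hypothetical values, chosen only for the demonstration.
key = "example-secret"
date = "20221005T094243Z"

# The client signs with the host taken from the endpoint URL (trailing dot kept).
signed_by_client = toy_signature(key, "minio.test.local.:9000", date)
# The server recomputes from the Host header it received (trailing dot removed).
recomputed_by_server = toy_signature(key, "minio.test.local:9000", date)

print(signed_by_client == recomputed_by_server)  # False
```

Any difference between the signed host and the transmitted Host header changes the digest, which is why the fix has to make both sides agree on one form.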

If we rewrite it like this:

from urllib.parse import urlsplit  # standard library; assumed here so the snippet runs on its own

def _host_from_url(url):
    # Given URL, derive value for host header. Ensure that value:
    # 1) is lowercase
    # 2) excludes port, if it was the default port
    # 3) excludes userinfo
    # 4) excludes the trailing dot of an absolute FQDN (proposed change)
    url_parts = urlsplit(url)
    host = url_parts.hostname  # urlsplit's hostname is always lowercase
    default_ports = {
        'http': 80,
        'https': 443
    }
    # Strip the trailing dot so the signed host matches the Host header sent
    if host.endswith('.'):
        host = host[:-1]
    if url_parts.port is not None:
        if url_parts.port != default_ports.get(url_parts.scheme):
            host = '%s:%d' % (host, url_parts.port)
    return host

My test call succeeds.

Additional Information/Context

Imho this is especially important in Kubernetes-like environments.

To access resources outside of a cluster through DNS, a best practice is to use an absolute FQDN to avoid many unnecessary DNS requests. Even if DNS is a really fast protocol, from a performance point of view the best call is the one you don't make.

Example of a basic resolv.conf file of a Pod inside a test cluster:

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local test.cluster.internal openstacklocal novalocal
options ndots:5

This means that, to reach for example "https://testbucket.s3.us-west-1.amazonaws.com/" with curl, the underlying operating system will first try to resolve the name with each of the six search domains appended, in order, and only then the name itself.

Obviously only the last ones of the 14 calls will succeed.

As said, DNS is fast and (mainly) runs over UDP, but still, on my test platform where I'm alone, I spent 62 ms just on that, and on heavily loaded, shared k8s clusters we see some contention on the DNS subsystems.

Using an absolute FQDN would mean that we'd do only 2 DNS queries, both for the name itself.
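The difference between the two cases can be sketched as follows. This is a rough model of a glibc-style stub resolver, not its actual implementation: candidate_names is a hypothetical helper, and the count assumes one A and one AAAA query per candidate name. The list is the worst-case order; a real resolver stops at the first name that resolves.

```python
def candidate_names(name: str, search: list[str], ndots: int) -> list[str]:
    """Return the names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        # Absolute name: the search list is skipped entirely.
        return [name]
    with_search = [f"{name}.{domain}." for domain in search]
    as_is = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [as_is] + with_search
    # Too few dots: walk the search list first, the name itself last.
    return with_search + [as_is]


# Search list and ndots value taken from the resolv.conf above.
search = (
    "default.svc.cluster.local svc.cluster.local cluster.local "
    "test.cluster.internal openstacklocal novalocal"
).split()
host = "testbucket.s3.us-west-1.amazonaws.com"

relative_lookup = candidate_names(host, search, ndots=5)
absolute_lookup = candidate_names(host + ".", search, ndots=5)

# Assuming one A and one AAAA query per candidate name.
print(len(relative_lookup) * 2)  # 14
print(len(absolute_lookup) * 2)  # 2
```

The name has only four dots, which is below ndots:5, so every search domain is tried before the real name unless the trailing dot is present.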

CLI version used

2.8.0

Environment details (OS name and version, etc.)

Fedora Core 35

tim-finnigan commented 2 years ago

Hi @fyannk thanks for reaching out. Could you tell us more about your use case and how this would help? Is it just a slight performance increase or something beyond that? It seems like there is some debate about whether the trailing dot on domain names should be supported (for example in this curl issue: https://github.com/curl/curl/issues/8290).

Edit: I'm going to convert this to a feature request for FQDN support.

github-actions[bot] commented 2 years ago

Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

tim-finnigan commented 2 years ago

Since endpoint usage affects other SDKs, I'm going to transfer this feature request to our cross-SDK repository. This related issue was opened recently in one of our other repositories: https://github.com/boto/boto3/issues/3492. It describes a different use case but I think addresses the same underlying ask as here.

I can't guarantee if or when this feature request would be implemented, but while it's pending further review we recommend that others interested in this add a 👍 to the original post here and leave a comment if they have any additional feedback to share. Thanks!

senbax-admin commented 2 years ago

I do not consider this a feature request, I consider it a bug. We are providing valid endpoint URLs but boto3 does something wrong when calculating the signature. This issue produces a lot of unneeded DNS traffic for many of our applications using SQS and S3.

fyannk commented 2 years ago

agreed with @senbax

imho this is a bug, as there is a mismatch between boto3 (which encodes and signs the headers) and the sublayers that make the actual call.

boto3 isn't removing the trailing dot, the sublayers are.

So boto3 must either remove the trailing dot before encoding, or the libraries must be modified to call the remote with the trailing dot (but that would cause many issues, mainly with vhosts, and is not really compliant with RFCs / browsers / wget / curl / ...)

benedikt-bartscher commented 2 years ago

One workaround (at least for SQS) is to use a VPC Interface Endpoint, which allows access to SQS via another DNS name (for example vpce-FFFFFF.sqs.eu-central-1.vpce.amazonaws.com) which has 5 dots and thus isn't run through the local search domains first with ndots:5.

But it would be better if users could provide endpoint_urls with absolute DNS names, or if there were a feature (toggle) which automatically adds a dot at the end of the domain on every request.
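The VPC endpoint workaround above relies on the resolver's ndots rule. A minimal sketch of that rule, assuming glibc-style behavior (tried_as_is_first is a hypothetical helper, and the second hostname is only an example of a shorter endpoint name):

```python
def tried_as_is_first(name: str, ndots: int) -> bool:
    """True if a glibc-style resolver would query `name` before the search list."""
    return name.endswith(".") or name.count(".") >= ndots


print(tried_as_is_first("vpce-FFFFFF.sqs.eu-central-1.vpce.amazonaws.com", 5))  # True
print(tried_as_is_first("sqs.eu-central-1.amazonaws.com", 5))  # False
print(tried_as_is_first("sqs.eu-central-1.amazonaws.com.", 5))  # True
```

A trailing dot gives the same result for any name, which is why absolute endpoint URLs would be the more general fix.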