aws / aws-sdk-go-v2

AWS SDK for the Go programming language.
https://aws.github.io/aws-sdk-go-v2/docs/
Apache License 2.0

Setting endpoint and bucket does not work if they are different domains #2883

Open deitch opened 4 days ago

deitch commented 4 days ago

Describe the bug

Create a proxy or a local S3-compatible server and run it at localhost:9000. Then perform operations against the bucket named bucket1.mydomain.com. You would expect the endpoint not to be part of the Host name. However, the Host header in every request includes both the bucket name and the endpoint.

For example, a PutObject of myfile to bucket bucket1.mydomain.com with endpoint localhost:9000 should send:

    PUT /myfile?x-id=PutObject HTTP/1.1
    Host: bucket1.mydomain.com

Instead, it actually sends:

    PUT /myfile?x-id=PutObject HTTP/1.1
    Host: bucket1.mydomain.com.localhost:9000

Expected Behavior

The endpoint should not be part of the Host header, because it only indicates where to find the fully named bucket.

Current Behavior

The endpoint is included in the Host header; see the bug description.

Reproduction Steps

    package main

    import (
        "context"
        "fmt"
        "log"
        "os"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/credentials"
        "github.com/aws/aws-sdk-go-v2/feature/s3/manager"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func upload(ctx context.Context, region, source string) error {
        var (
            opts   []func(*config.LoadOptions) error // global client options
            s3opts []func(*s3.Options)               // s3 client options
        )
        s3opts = append(s3opts,
            // I tried each of the following options; both gave more or less the same result.
            // (staticResolver was my own EndpointResolverV2 implementation, not shown.)
            // s3.WithEndpointResolverV2(&staticResolver{endpoint: "localhost:9000"}),
            func(o *s3.Options) {
                // BaseEndpoint is a *string and must include the URL scheme.
                o.BaseEndpoint = aws.String("http://localhost:9000")
            },
        )
        opts = append(opts,
            config.WithClientLogMode(aws.LogRequestWithBody|aws.LogResponse),
            config.WithRegion(region),
            config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
                "myaccesskey",
                "mysecretkey",
                "",
            )),
        )
        cfg, err := config.LoadDefaultConfig(ctx, opts...)
        if err != nil {
            return fmt.Errorf("failed to load AWS config: %w", err)
        }

        // Create a new S3 service client and an upload manager on top of it.
        client := s3.NewFromConfig(cfg, s3opts...)
        uploader := manager.NewUploader(client)

        // Open the local file whose contents will be uploaded.
        f, err := os.Open(source)
        if err != nil {
            return fmt.Errorf("failed to open input file %q: %w", source, err)
        }
        defer f.Close()

        // Write the contents of the file to the S3 object.
        _, err = uploader.Upload(ctx, &s3.PutObjectInput{
            Bucket: aws.String("bucket1.mydomain.com"),
            Key:    aws.String("myfile"),
            Body:   f,
        })
        return err
    }

    func main() {
        if err := upload(context.TODO(), "us-east-1", "source file"); err != nil {
            log.Fatal(err)
        }
    }

Possible Solution

No response

Additional Information/Context

I did try various combinations of BaseEndpoint and EndpointResolverV2 as described in this doc, to no avail.

I suspect there is some combination of which I am not aware, in which case feel free to call this a "docs error report" as opposed to a bug report.

AWS Go SDK V2 Module Versions Used

        github.com/aws/aws-sdk-go-v2 v1.32.3
        github.com/aws/aws-sdk-go-v2/config v1.28.1
        github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.17.35
        github.com/aws/aws-sdk-go-v2/service/s3 v1.66.2
        github.com/aws/aws-sdk-go-v2/credentials v1.17.42
        github.com/aws/aws-sdk-go v1.44.256 // indirect
        github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.6.6 // indirect
        github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.18 // indirect
        github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.22 // indirect
        github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.22 // indirect
        github.com/aws/aws-sdk-go-v2/internal/ini v1.8.1 // indirect
        github.com/aws/aws-sdk-go-v2/internal/v4a v1.3.22 // indirect
        github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.0 // indirect
        github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.4.3 // indirect
        github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.3 // indirect
        github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.18.3 // indirect
        github.com/aws/aws-sdk-go-v2/service/sso v1.24.3 // indirect
        github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.3 // indirect
        github.com/aws/aws-sdk-go-v2/service/sts v1.32.3 // indirect
        github.com/aws/smithy-go v1.22.0

Compiler and Version used

go version go1.23.0

Operating System and version

linux/amd64

RanVaknin commented 4 days ago

Hi @deitch ,

The host of a virtual-hosted-style bucket request has the form <bucket>.<endpoint>.

For example, a bucket "foo" and us-east-1 will result in host: foo.s3.us-east-1.amazonaws.com

In your case, the bucket name is bucket1.mydomain.com and the base endpoint is localhost:9000, so the host is bucket1.mydomain.com.localhost:9000. That is the correct and expected result.

Thanks, Ran~

deitch commented 3 days ago

Hi @RanVaknin thanks for jumping in so quickly; I do rather appreciate it.

> For example, a bucket "foo" and us-east-1 will result in host: foo.s3.us-east-1.amazonaws.com

I had always assumed the endpoint is distinct from the hostname being served, in the same way that SNI separates the name a TLS client asks for from the address it connects to. "Endpoint" = "go to this IP or FQDN to access the service", while "virtual-hosted-style bucket" = "this is the Host field I will put in the headers". The two could very well be distinct.

What is the correct way to ask the sdk, "I want you to request the bucket FQDN bucket1.mydomain.com (i.e. that is the Host header), but establish the connection to localhost:9000"? They don't have to be tied together.

If this is not supported today but is something we would want, I am game to open a PR for it, given some direction as to where the change belongs. I would guess it would be an option that tells the SDK not to append the endpoint to the bucket FQDN?

RanVaknin commented 1 day ago

Hi @deitch ,

> I had always assumed the endpoint is distinct from the hostname being served.

I think the confusion may come from the word "endpoint", which is used here to mean an SDK-specific thing. The endpoint is where the request is sent. In the context of S3, the bucket name is either prepended to the endpoint for virtual-hosted-style buckets (<bucket>.<endpoint>) or appended to it as a path segment for path-style buckets (<endpoint>/<bucket>).
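To make the two addressing styles concrete, here is a simplified sketch of how a request URL could be assembled from an endpoint, a bucket, and a key. The helper `s3RequestURL` is hypothetical and only mirrors the `<bucket>.<endpoint>` and `<endpoint>/<bucket>` rules described above; the SDK's real endpoint resolution evaluates many more rules (region, FIPS, dual-stack, bucket-name validation, and so on).

```go
package main

import (
	"fmt"
	"net/url"
)

// s3RequestURL is a hypothetical, simplified illustration of S3 addressing.
// It is NOT the SDK's implementation.
func s3RequestURL(endpoint, bucket, key string, pathStyle bool) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	if pathStyle {
		// Path style: the bucket becomes the first path segment.
		u.Path = "/" + bucket + "/" + key
	} else {
		// Virtual-hosted style: the bucket is prepended to the endpoint host.
		u.Host = bucket + "." + u.Host
		u.Path = "/" + key
	}
	return u.String(), nil
}

func main() {
	for _, pathStyle := range []bool{false, true} {
		s, err := s3RequestURL("http://localhost:9000", "bucket1.mydomain.com", "myfile", pathStyle)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}
```

With the endpoint and bucket from this issue, the virtual-hosted form yields `http://bucket1.mydomain.com.localhost:9000/myfile`, which matches the reported Host, and the path-style form yields `http://localhost:9000/bucket1.mydomain.com/myfile`.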

> What is the correct way to ask the sdk, "I want you to request the bucket FQDN bucket1.mydomain.com (i.e. that is the Host header), but establish the connection to localhost:9000"? They don't have to be tied together.

They are tied together. The SDK does not have a built-in DNS resolver that would know that mydomain.com actually points to 127.0.0.1 (localhost). If you need to route traffic from your custom domain to localhost, that has to happen outside the SDK.

If this is the desired outcome:

    PUT /myfile?x-id=PutObject HTTP/1.1
    Host: bucket1.mydomain.com

then the bucket name should be bucket1 and the BaseEndpoint should be mydomain.com. That produces the S3 virtual-hosted-style scheme of <bucket>.<endpoint>.

Then, to route mydomain.com to localhost:9000, you can edit your system's hosts file to map mydomain.com to 127.0.0.1 and run a reverse proxy that forwards that traffic to port 9000 on localhost.
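Here is a sketch of the reverse-proxy step using only the Go standard library. One caveat: with virtual-hosted-style addressing the client resolves the full name `bucket1.mydomain.com`, so the hosts file would likely need an entry for that name too, because hosts files do not support wildcards. `newForwardingProxy` and `hostSeenByBackend` are hypothetical names. `hostSeenByBackend` uses `httptest` servers as stand-ins for the proxy listener and the local S3-compatible server, so the behavior can be checked in-process. `httputil.NewSingleHostReverseProxy` does not rewrite the Host header, so the backend still receives `bucket1.mydomain.com`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newForwardingProxy returns a reverse proxy that forwards every request to
// target while keeping the incoming Host header unchanged.
func newForwardingProxy(target string) (http.Handler, error) {
	u, err := url.Parse(target)
	if err != nil {
		return nil, err
	}
	return httputil.NewSingleHostReverseProxy(u), nil
}

// hostSeenByBackend sends a request with the given Host header through the
// proxy and returns the Host value the backend received.
func hostSeenByBackend(host string) (string, error) {
	// Stand-in for a local S3-compatible server: it echoes the Host header.
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Host)
	}))
	defer backend.Close()

	proxy, err := newForwardingProxy(backend.URL)
	if err != nil {
		return "", err
	}
	front := httptest.NewServer(proxy)
	defer front.Close()

	req, err := http.NewRequest(http.MethodGet, front.URL+"/myfile", nil)
	if err != nil {
		return "", err
	}
	req.Host = host // the Host a virtual-hosted-style client would send
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	got, err := hostSeenByBackend("bucket1.mydomain.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // bucket1.mydomain.com
}
```

In a real setup you would serve the handler from `newForwardingProxy("http://localhost:9000")` with `http.ListenAndServe(":80", ...)`. Binding port 80 usually requires elevated privileges.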

I might be missing the point here, since this use case is new to me. If you are using an S3 clone that is running locally, then routing it to localhost:9000 should be enough to test with the SDK. If your custom domain (mydomain.com) is "live" and fronting an actual S3 bucket, then routing mydomain.com should be the correct approach. I'm not sure what the goal is of sending requests to mydomain.com while actually routing traffic to localhost.

Thanks, Ran~

deitch commented 15 hours ago

> If you are using an S3 clone that is running locally, then routing it to localhost:9000 should be enough to test with the SDK. If your custom domain (mydomain.com) is "live" and fronting an actual S3 bucket, then routing mydomain.com should be the correct approach. I'm not sure what the goal is of sending requests to mydomain.com while actually routing traffic to localhost.

These are both cases I am working with: a local clone (primarily for testing, but not always) and a transparent proxy.

> I think the confusion may come from the word "endpoint", which is used here to mean an SDK-specific thing.

This, I believe, is the heart of it. I think you are saying, from the SDK's perspective, "endpoint" means two things:

  1. Routing: The FQDN that the SDK will use to find the server to which to connect (Layer 3/4)
  2. Hostname: The value to place in the Host header, i.e. <bucketname>.<endpoint> (Layer 7)

I understand why the endpoint might mean both, but I can also see why we might want the two to be optionally separable.

There is a direct analogy in pkg net/http. On the one hand, if I call http.Get("http://example.com/"), it uses example.com both as the FQDN to resolve at Layer 3 and as the value of the Host header. However, if I want to split the two (which is common), I can use an http.Client whose Transport is an *http.Transport, which has a Dial field:

    Dial func(network, addr string) (net.Conn, error)

Dial handles the step from "here is an address" to "here is a net.Conn". The higher-level http.Client then uses that connection for the HTTP exchange and sends whatever headers it wants.

Thinking about it further, another position would also make sense to me. That position would be: this may be a valid use case, but it should be handled at the http.Client level, like any other connection-routing case, and "S3 endpoint" does not mean "the endpoint that controls where the connection goes". In that case, all that would be needed is clear direction or docs on how to do it.

Does this explanation help?

RanVaknin commented 4 hours ago

Hi @deitch,

Thanks for the additional info.

> There is a direct analogy in pkg net/http.

That is because, by default, the Go SDK's HTTP client is built on the Go standard library's http.Client. The SDK only builds the request and then hands it to the standard library to perform the actual HTTP exchange.

You can override the SDK's HTTP client with one that uses your own custom Transport, including your own implementation of Dial, if that is what you are after.

Let me know if this is the piece of info you are after.

Thanks, Ran~

deitch commented 3 hours ago

Ah, that's it. So would the following be correct?

-----BEGIN-----
The aws-sdk-go-v2 endpoint parameter defines the endpoint used for accessing the bucket. This endpoint is used both to resolve the server hostname and port and as the Host header of the HTTP request. If you use virtual-hosted-style buckets, the Host header will have the bucket name prepended to the endpoint.

If you wish to override the low-level connection, for example to change the timeout or to connect to a different server and port, you can do so by changing the http.Client used. The Host header will continue to be the endpoint (for path-style) or bucket.endpoint (for virtual-hosted style), but the network connection will be established by the http.Client that you provide.
-----END-----