Open deitch opened 4 days ago
Hi @deitch ,
The virtual-hosted bucket host anatomy is <bucket>.<endpoint>.
For example, a bucket "foo" and us-east-1 will result in host: foo.s3.us-east-1.amazonaws.com
In your case, the bucket name is bucket1.mydomain.com and the base endpoint is localhost:9000, which will result in the host being bucket1.mydomain.com.localhost:9000. That is the correct and expected result.
Thanks, Ran~
Hi @RanVaknin thanks for jumping in so quickly; I do rather appreciate it.
For example, a bucket "foo" and us-east-1 will result in host: foo.s3.us-east-1.amazonaws.com
I had always assumed endpoint is distinct from the hostname being served. The same way you can use SNI on certs, etc. "Endpoint" = "go to this IP or FQDN to access the service", while "virtual-path bucket" = "this is the Host field I will put in the headers". They could very well be distinct.
What is the correct way to ask the sdk, "I want you to request the bucket FQDN bucket1.mydomain.com (i.e. that is the Host header), but establish the connection to localhost:9000"? They don't have to be tied together.
If this is something we do not support but would want to, I am game for opening a PR for it, if I can have some proper direction as to where. I would guess an option that says not to append the endpoint to the bucket FQDN?
Hi @deitch ,
I had always assumed endpoint is distinct from the hostname being served.
I think this might be a confusion based on the meaning of the word endpoint that is being used here to mean SDK specific thing. Endpoint is where the request is being sent to. In the context of S3, the bucket name is either prepended to the endpoint for virtual hosted buckets (<bucket>.<endpoint>) or appended to the endpoint as a path for path style buckets (<endpoint>/<bucket>).
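To make the two addressing schemes concrete, here is a minimal sketch that builds both URL forms from the values in this thread. The helper names virtualHostedURL and pathStyleURL are hypothetical and only illustrate the naming scheme; this is not the SDK's actual endpoint resolution code.

```go
package main

import "fmt"

// virtualHostedURL builds <scheme>://<bucket>.<endpoint>/<key>.
// Hypothetical helper for illustration; not part of the SDK.
func virtualHostedURL(scheme, endpoint, bucket, key string) string {
	return fmt.Sprintf("%s://%s.%s/%s", scheme, bucket, endpoint, key)
}

// pathStyleURL builds <scheme>://<endpoint>/<bucket>/<key>.
// Hypothetical helper for illustration; not part of the SDK.
func pathStyleURL(scheme, endpoint, bucket, key string) string {
	return fmt.Sprintf("%s://%s/%s/%s", scheme, endpoint, bucket, key)
}

func main() {
	// Values taken from the issue: bucket "bucket1.mydomain.com",
	// base endpoint "localhost:9000", object key "myfile".
	fmt.Println(virtualHostedURL("http", "localhost:9000", "bucket1.mydomain.com", "myfile"))
	fmt.Println(pathStyleURL("http", "localhost:9000", "bucket1.mydomain.com", "myfile"))
}
```

With virtual-hosted addressing the bucket name ends up in the host part, which is why the Host header becomes bucket1.mydomain.com.localhost:9000; with path-style addressing the host stays localhost:9000 and the bucket moves into the path.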
What is the correct way to ask the sdk, "I want you to request the bucket FQDN bucket1.mydomain.com (i.e. that is the Host header), but establish the connection to localhost:9000"? They don't have to be tied together.
They are tied together. The SDK does not have a built-in DNS resolver to know that mydomain.com actually points to 127.0.0.1 (localhost).
If you need to route traffic from your custom domain to localhost then it needs to happen from outside the context of the SDK.
If this is the desired outcome:
PUT /myfile?x-id=PutObject HTTP/1.1
Host: bucket1.mydomain.com
then the bucket name is bucket1 and the BaseEndpoint is mydomain.com. That would achieve the S3 virtual-hosted bucket scheme of <bucket>.<endpoint>.
Then, to route mydomain.com to localhost:9000, you can edit your system's hosts file to point mydomain.com at 127.0.0.1 and use a reverse proxy to forward that traffic to port 9000 on localhost.
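As a rough sketch of the reverse proxy half of that setup, the standard library's httputil.NewSingleHostReverseProxy can forward requests to a local backend while leaving the inbound Host header untouched. The helper names newLocalProxy and hostSeenByBackend are hypothetical, and the httptest servers are stand-ins for the real proxy listener and the S3-compatible server on localhost:9000; the hosts-file entry itself is assumed to be configured separately.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newLocalProxy returns a handler that forwards every request to backend.
// NewSingleHostReverseProxy rewrites the target URL but does not rewrite the
// Host header, so the backend still sees e.g. "bucket1.mydomain.com".
func newLocalProxy(backend string) (http.Handler, error) {
	target, err := url.Parse(backend)
	if err != nil {
		return nil, err
	}
	return httputil.NewSingleHostReverseProxy(target), nil
}

// hostSeenByBackend sends one request carrying the given Host header through
// the proxy and returns the Host value the backend received.
func hostSeenByBackend(host string) (string, error) {
	// Stand-in for the local S3-compatible server.
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Host)
	}))
	defer backend.Close()

	proxy, err := newLocalProxy(backend.URL)
	if err != nil {
		return "", err
	}
	// Stand-in for the proxy listening where mydomain.com resolves.
	front := httptest.NewServer(proxy)
	defer front.Close()

	req, err := http.NewRequest(http.MethodPut, front.URL+"/myfile", nil)
	if err != nil {
		return "", err
	}
	req.Host = host // the Host header a client addressing the custom domain would send
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	seen, err := hostSeenByBackend("bucket1.mydomain.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("backend saw Host:", seen)
}
```

In a real deployment the proxy would listen on the port the client connects to and target http://127.0.0.1:9000 instead of a test server.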
I might be missing the point here since this use case is new to me. If you are using an S3 clone that is running locally, then routing it to localhost:9000 should be enough to test with the SDK. If your custom domain(mydomain.com) is "live" and fronting an actual S3 bucket then routing mydomain.com should be the correct approach. I'm not sure what is the goal of sending requests to mydomain.com but actually routing traffic to localhost.
Thanks, Ran~
If you are using an S3 clone that is running locally, then routing it to localhost:9000 should be enough to test with the SDK. If your custom domain(mydomain.com) is "live" and fronting an actual S3 bucket then routing mydomain.com should be the correct approach. I'm not sure what is the goal of sending requests to mydomain.com but actually routing traffic to localhost.
These are both cases I am working with: a local clone (primarily for testing, but not always) and a transparent proxy.
I think this might be a confusion based on the meaning of the word endpoint that is being used here to mean SDK specific thing
This, I believe, is the heart of it. I think you are saying, from the SDK's perspective, "endpoint" means two things:
The Host header, i.e. <bucketname>.<endpoint> (Layer 7)
I can get why the endpoint might mean both, but also why we might want them to be optionally separable.
There is a direct analogy in pkg net/http. On the one hand, if I do http.Get("http://example.com/"), it will use example.com as both the FQDN to resolve for Layer 3 and the value to place in the Host header. However, if I want to split the two (which is common), I can use an http.Client and set its Transport property to an http.Transport, which has the Dial property:
Dial func(network, addr string) (net.Conn, error)
This handles the resolution of "here is an FQDN" to "here is a net.Conn", which the higher-level http.Client can use to create the HTTP connection, sending whatever headers it wants.
As I think about this, if your position is that this may be a valid use case, but should be handled at the http.Client level, like any other case, and that "S3 endpoint" does not mean "control connection endpoint", that would make sense, too. All that would be needed is some clear direction/docs as to how to do that.
Does this explanation help?
Hi @deitch,
Thanks for the additional info.
There is a direct analogy in pkg net/http.
That is because the Go SDK's http client is the Golang standard library http client. The SDK only builds the request and then hands it to the standard library to handle the actual http request.
You can override the SDK's http client to use your desired custom Transport layer with your own implementation of Dial if that is what you are after.
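A minimal sketch of such a client, using only the standard library, might look like the following. The helpers newPinnedClient and hostSeenByServer are hypothetical names, and the httptest server stands in for the service on localhost:9000. The sketch uses DialContext, the context-aware successor to Transport.Dial.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// newPinnedClient returns an http.Client that opens every TCP connection to
// dialAddr, regardless of the host named in the request URL. The request URL
// still determines the Host header that is sent.
func newPinnedClient(dialAddr string) *http.Client {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	return &http.Client{
		Transport: &http.Transport{
			// The addr argument derived from the URL is deliberately ignored.
			DialContext: func(ctx context.Context, network, _ string) (net.Conn, error) {
				return dialer.DialContext(ctx, network, dialAddr)
			},
		},
	}
}

// hostSeenByServer requests http://<host>/myfile through a pinned client and
// returns the Host header that the local stand-in server received.
func hostSeenByServer(host string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Host)
	}))
	defer srv.Close()

	client := newPinnedClient(srv.Listener.Addr().String())
	resp, err := client.Get("http://" + host + "/myfile")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	seen, err := hostSeenByServer("bucket1.mydomain.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("server saw Host:", seen)
}
```

Assuming the SDK's usual client options, a client like this could then be supplied through the HTTPClient field of s3.Options (or config.WithHTTPClient when loading the config), with the bucket set to bucket1 and BaseEndpoint set to http://mydomain.com, so that the Host header is bucket1.mydomain.com while the connection goes to localhost:9000.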
Let me know if this is the piece of info you are after.
Thanks, Ran~
Ah, that's it. So would the following be correct?
-----BEGIN-----
The aws-sdk-v2 endpoint parameter defines the endpoint used for accessing the bucket. This endpoint is used both for resolving the server hostname and port, as well as the Host header in the http connection. If you use virtual-host-style buckets, then the Host header will have the bucket name prepended to the endpoint.
If you wish to override the low-level connection, for example to change the timeout or connect to a different server and port, you can do so by changing the http.Client used. The Host header will continue to be the endpoint (for path-style) or bucket.endpoint (for virtual-host style), but the network connection will be constructed via the http.Client that you provide.
-----END-----
Describe the bug
Create a proxy or local S3-compatible server. Run it at localhost:8080. You would expect that the endpoint is not part of the name. Then try to do activities against the bucket named bucket1. The Host header in the request always includes the bucket name and the endpoint.
For example, PutObject for myfile against bucket bucket1.mydomain.com with endpoint localhost:9000 should have headers:
Yet it actually has
Regression Issue
Expected Behavior
The endpoint would not be part of the Host, since that is a pointer as to where to find the fully named bucket.
Current Behavior
Includes it in the host. See the bug description
Reproduction Steps
Possible Solution
No response
Additional Information/Context
I did try various combinations of BaseEndpoint and EndpointResolverV2 as described in this doc, to no avail. I suspect there is some combination of which I am not aware, in which case feel free to call this a "docs error report" as opposed to a bug report.
AWS Go SDK V2 Module Versions Used
Compiler and Version used
go version go1.23.0
Operating System and version
linux/amd64