Closed: lbakerchef closed this issue 3 years ago.
These are some of my notes from a brief investigation of the "Host header problem" section of this issue. I've obviously looked into this much less than you have, so take this with a grain of salt.
There are 3 places where we need to consider what Host header is being used:
In my reading of the code, it is possible we have some inconsistencies in all 3 steps, however, since you have observed some oddness on the client-side, lets start there:
Forensic analysis of the diagnostic output revealed that when hitting a presigned URL endpoint, knife cookbook upload was (sometimes?) including a host header that was different than the host header which was used during presigned URL creation (sigv2 apparently did not use a host header to generate presigned URLs; sigv4 does). Specifically, it was (sometimes?) adding a port number to the end of the host string on the host header (e.g. "Host: whatever.com:port"). The reason this is occurring is currently unknown.
which implies that the client is adding the port unexpectedly. There are a few questions we may be concerned with here:
a. What does chef-client currently do?
b. What did chef-client do at any point in the past?
c. What can we expect from 3rd-party clients?
Let's start with (a)
Based on my reading of the code, chef-client will currently add the port to the Host header along the code paths we care about. The header gets added here:
https://github.com/chef/chef/blob/master/lib/chef/http/authenticator.rb#L110
The authenticator middleware is used in the Chef::ServerAPI http client:
https://github.com/chef/chef/blob/master/lib/chef/server_api.rb#L47
The ServerAPI client is used in both the cookbook upload and cookbook download code:
https://github.com/chef/chef/blob/master/lib/chef/cookbook_uploader.rb#L43 https://github.com/chef/chef/blob/master/lib/chef/cookbook/synchronizer.rb#L327
This likely explains why you are seeing knife/chef-client add the port to the header.
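To make the observed behavior concrete, here is a simplified sketch (not the actual authenticator.rb code) of the effect of that code path: the port is always included in the Host header, even when it is the scheme's default.

```ruby
require "uri"

# Simplified sketch of what the client-side code path above appears to do:
# always include the port in the Host header, even for default ports.
def chef_client_host_header(url)
  uri = URI.parse(url)
  "#{uri.host}:#{uri.port}"
end

chef_client_host_header("https://api.example.com/organizations/foo")
# => "api.example.com:443"
```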
I haven't yet gone through the version history to see how consistent that behavior has been, so (b) remains open.
A further wrinkle here is what happens when we think about the behavior of other HTTP clients (including other HTTP client objects you can instantiate inside the chef code base).
HTTP/1.1 requires a Host header (note: HTTP/1.0 does not, so this new signing method might break HTTP/1.0 clients or non-conformant 1.1 clients).
However, the client is not required to always include the port in the header; specifically, it may omit the port when it matches the default port of the scheme in use.
RFC 2616 says:
Host = "Host" ":" host [ ":" port ] ; Section 3.2.2
A "host" without any trailing port information implies the default port for the service requested (e.g., "80" for an HTTP URL).
Thus, eliding the port is fine for port 80 when using HTTP and port 443 when using HTTPS. Further, based on https://url.spec.whatwg.org/#url-parsing, many URL parsers also elide the port when it matches the default, even when it is explicitly included in the URL.
As a result, many HTTP clients only append the port to the host header when a non-standard port is used. For example:
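To illustrate the pattern, here is a sketch of the typical logic (not taken from any particular client): the port appears in the Host header only when it differs from the scheme's default.

```ruby
require "uri"

# Sketch of the host-header logic many HTTP clients implement: include the
# port only when it differs from the scheme's default port.
def typical_host_header(url)
  uri = URI.parse(url)
  uri.port == uri.default_port ? uri.host : "#{uri.host}:#{uri.port}"
end

typical_host_header("https://example.com/")      # => "example.com"
typical_host_header("https://example.com:8443/") # => "example.com:8443"
```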
I think this puts us in a rather tough position. Namely,
If we generate signatures using a Host header without the port, then the current users of Chef::ServerAPI will likely break.
If we generate signatures using a Host header with the port, then other HTTP clients will likely break.
You noted that:
UPDATE: I have tried it the 'other way' i.e. forced the creation of presigned URLs with host headers that have port numbers, but then we get pedant test failures for the instances when a chef client hits a presigned URLs and doesn't include a port number on the host header
I think it would be good to look carefully at those pedant failures and confirm (1) we definitely are always pre-signing with the port and (2) where the failures are happening inside chef/pedant.
Given my current analysis, I don't think we can avoid breaking at least some API clients here, so we'll have to think carefully about which breakage has the smallest scope.
I also looked into the server-side of this equation but haven't included those notes here since I wasn't convinced I was looking at the right branches.
Specifically, it was (sometimes?) adding a port number to the end of the host string on the host header (e.g. "Host: whatever.com:port").
which implies that the client is adding the port unexpectedly.
Thanks. I changed 'adding a port number' to 'including a port number' for clarification purposes. I don't want to imply that I know the client is taking a www.host.com and appending a port to make it www.host.com:port although that could be the case.
You noted that:
UPDATE: I have tried it the 'other way' i.e. forced the creation of presigned URLs with host headers that have port numbers, but then we get pedant test failures for the instances when a chef client hits a presigned URLs and doesn't include a port number on the host header
I think it would be good to look carefully at those pedant failures and confirm (1) we definitely are always pre-signing with the port and (2) where the failures are happening inside chef/pedant.
Pursuant to your comment here, I updated the Host Header Problem section with an additional study on this, and associated data and output.
Closing this since we have evaluated and are currently using the Erlcloud library.
See issue: https://github.com/chef/chef-server/issues/1783
Evaluation of erlcloud: https://github.com/erlcloud/erlcloud
As part of the upcoming AWS sigv4 update work to Chef Server, we should evaluate the suitability of Erlang libraries in order to lower costs of development and future maintenance. Evaluation criteria should include, but not necessarily be limited to: AWS v4 signing support, URL presigning support (if needed - see below), license compatibility, support for STS assumerole and instance profiles, ability to mock/modify time of the request, support for Erlang 22, availability of documentation, whether the library is actively maintained, ease of use, and performance and resource usage.
Question: Do we need presigning of URLs? It appears that we do. Here is a function in `chef_s3.erl` that suggests presigning is necessary:

Answer: After conferring with Mark Anderson, the definitive answer is YES, we do need this capability, and it is a deal-killer if we cannot get it somehow. For one thing, the chef client API expects and uses a presigned URL. `bookshelf` also apparently uses presigned URLs (possibly in interaction with the chef client). One option is to check any forks of `erlcloud` to find out if this capability has been added. A second option could be to work with `erlcloud` (motobob or kkuzmin) to get a PR merged (possibly #586) which addresses the issue. Yet another option could be to implement the sigv2/sigv4 paths in `erchef` and leave `bookshelf` alone (but how does that get us away from sigv2?). See more about presigning below.

License: https://github.com/erlcloud/erlcloud/blob/master/COPYRIGHT Excerpt from legal dept: "The license is compliant with Chef licenses, but the terms of the grant require that the text of the file stay with the library. If you can do that, feel free to use."
STS assumerole and instance profiles: Supported.
Erlang versions: 21 is officially supported. Appears to compile and run under 22 in limited testing (see below). Further investigation is required.
Documentation: Little.
AWS v4 signing: Appears to do v4 signing transparently "under the hood" without need for overt construction of signing keys and other documents by the developer. sigv2/sigv4 usage was identified using the following methodology (see "Output of erlcloud Erlang session" below):

1) Enable S3 Server Access Logging on a bucket. Server Access Logging was enabled on the bucket.

2) Choose an access log to examine. LOG-2020-02-06-21-19-31-A6698A7E9DDAE4ED was chosen for examination.

3) Look for the SignatureVersion element in the server access log, as per https://aws.amazon.com/blogs/aws/amazon-s3-update-sigv2-deprecation-period-extended-modified/ and https://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html . Forensic analysis concluded that SignatureVersion element SigV4 was identified in the access log; SignatureVersion element SigV2 was not.

4) Identify the bucket region, as per https://aws.amazon.com/blogs/aws/amazon-s3-update-sigv2-deprecation-period-extended-modified/ . The aforementioned document indicates that the us-east-2 region makes exclusive use of SigV4. The bucket used for testing was confirmed to be located in the us-east-2 region.
URL presigning: No. `mini_s3` does sigv2 presigning, also referred to as a "query string request", for enabling direct 3rd-party access to S3 data without proxying the request (the other way to do this is via an HTTP request with the signature embedded in the headers). `erlcloud` apparently does not do this, although there are some issues and PRs which have been filed. Some options are: use another library, write our own sigv4 presigning, don't do sigv4 presigning and proxy requests instead, or continue to use sigv2 presigning, possibly as a stop-gap measure or delaying tactic until erlcloud adds the feature.

Issues and PRs regarding URL presigning: erlcloud/erlcloud#562 erlcloud/erlcloud#586 erlcloud/erlcloud#560 erlcloud/erlcloud#342
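Should we end up writing our own sigv4 presigning, the core of it is the signing-key derivation that AWS publishes for SigV4: a fixed chain of HMAC-SHA256 operations. A sketch (all input values here are made-up examples):

```ruby
require "openssl"

# Sketch of the SigV4 signing-key derivation per AWS's published scheme.
# The derived key is then used to HMAC the "string to sign" built from the
# canonical request. All inputs below are example values.
def sigv4_signing_key(secret_key, date_yyyymmdd, region, service)
  k_date    = OpenSSL::HMAC.digest("SHA256", "AWS4" + secret_key, date_yyyymmdd)
  k_region  = OpenSSL::HMAC.digest("SHA256", k_date, region)
  k_service = OpenSSL::HMAC.digest("SHA256", k_region, service)
  OpenSSL::HMAC.digest("SHA256", k_service, "aws4_request")
end

key = sigv4_signing_key("fake-secret", "20200206", "us-east-2", "s3")
key.bytesize # => 32 (raw HMAC-SHA256 output)
```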
Mock/modify time of the request: Possibly through: https://docs.aws.amazon.com/general/latest/gr/sigv4-date-handling.html
(pre?)Signed request verification: Per Mark Anderson, "We need the capability to take a (pre?)signed request and verify that it is good, i.e. answer the question 'is it appropriately signed?' This capability is needed by `bookshelf`, which currently does this with v2 requests." Possibly see erlcloud's `make_get_url`.

If I understand this correctly, we will need to somehow (re?)construct a presigned url to compare with the request url coming in. In theory, we'd just take our own presigned url construction function, and jam in arguments for method, bucketname, key, lifetime, headers, and config (access key and secret access key in particular). In practice, I'm not sure if we would have all of the necessary inputs to accomplish this. Further investigation is required.
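The verification idea above can be sketched as follows. This is a hypothetical outline, not real code from bookshelf or erlcloud; `rederive` stands in for whatever presigning function we end up with:

```ruby
require "uri"
require "cgi"
require "openssl"

# Hypothetical sketch: pull the claimed signature off the incoming presigned
# URL and compare it, in constant time, with one re-derived from the same
# inputs. `rederive` is an assumed callable, not a real library API.
def appropriately_signed?(incoming_url, rederive)
  params  = CGI.parse(URI.parse(incoming_url).query.to_s)
  claimed = params.fetch("X-Amz-Signature", []).first
  return false if claimed.nil?
  OpenSSL.secure_compare(rederive.call(incoming_url), claimed)
end
```

In practice the hard part is the one noted above: whether the server has all of the original signing inputs available to re-derive the signature.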
Output of `erlcloud` Erlang session:

Study: Integration with Chef Server
A study was conducted to integrate `erlcloud` with the `oc_erchef` component of Chef Server. The basic approach selected was to turn off `bookshelf` and replace the innards of `mini_s3` with calls to `erlcloud` functions. This 'mini_s3 as an erlcloud wrapper' approach was selected for suitability to 'proof of concept' purposes and expedience, as all routing to `mini_s3` could easily be captured and rerouted to `erlcloud` (note that any final approach may differ).

The `erlcloud` dependency was wired in via edits to `rebar.config` files, and a branch was tested via the buildkite verify and omnibus-adhoc pipelines. Failures were observed in `bookshelf` before disabling it. This was expected, as `bookshelf` will need to be patched for sigv4. Additionally, `oc_erchef` was observed attempting to connect to S3 but failing due to credentials issues. The credentials issue was tackled in a separate study outlined further down. A sample credentials failure is recorded below:

Impedance Mismatches
Issues were encountered during the study. One example: the aws config record used by the `mini_s3` and `erlcloud` libraries differs in name, number, and types of elements. Furthermore, `erlcloud` expects the s3 url of its `aws_config` record to be broken into `s3_host` and `s3_port` elements or fields, while `mini_s3` places both the host and port data into a single url packed into the `s3_host` field. This particular issue was worked around by deploying a hack, but will need to be revisited in any future `erlcloud` work. Further anomalies and inconsistencies between the libraries are listed below, as well as additional concerns:

- `mini_s3` functions not contained in `erlcloud`: new/5, s3_url/6.
- `erlcloud` testing does not use a session token. These two methods appear to be incompatible. Further investigation is required.
- `mini_s3:format_s3_uri/2` will need changes (assuming this function moves to live somewhere else after `mini_s3` goes away).

Study: Credentials Problems and Resolution
There are several ways of getting aws credentials into `erlcloud`. The most obvious and expedient way was to generate credentials using `okta_aws` and insert those credentials into environment variables to be picked up by `erlcloud`. However, this method also generates a token in addition to an access key and a secret access key, and was found to be incompatible with the 'access key' method of providing credentials to `erlcloud`, which uses longer-lived (non-temporary) credentials and does not use a token:

After further research, a different way of generating credentials was discovered which involved creating a new IAM user from within the Amazon AWS console and giving it the correct permissions. This resulted in an access key and secret access key and no token. Placing those values into the environment and firing up an Erlang shell yielded the following:
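The distinction between the two credential shapes can be sketched as follows (the variable names are the conventional AWS ones; this is illustrative, not code from either library):

```ruby
# Sketch of the two credential shapes described above. Temporary credentials
# (e.g. from okta_aws/STS) carry a session token; a long-lived IAM user key,
# which is what the 'access key' method expects, does not.
def load_creds(env)
  { access_key_id:     env["AWS_ACCESS_KEY_ID"],
    secret_access_key: env["AWS_SECRET_ACCESS_KEY"],
    session_token:     env["AWS_SESSION_TOKEN"] } # nil for long-lived keys
end

def temporary?(creds)
  !creds[:session_token].nil?
end
```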
Study: sigv4 Presigned URLs
An evaluation was conducted of a PR (https://github.com/erlcloud/erlcloud/pull/586) which added sigv4 presigned URL capability to `erlcloud`. Presigned URLs were generated "by hand" in an Erlang shell for an HTTP PUT and a GET against the same pre-existing S3 bucket located in an exclusively sigv4 region (us-east-2). After obtaining the presigned URLs, `curl` was used to execute the PUT and GET:

Study: Presigned URLs with Headers
Testing of `chef-server` with the newly-added sigv4 presigned URL capability revealed the need for the ability to incorporate HTTP headers when presigned URLs are being generated. Neither `erlcloud` nor the presigned URL PR (https://github.com/erlcloud/erlcloud/pull/586) possessed this capability, so a proof-of-concept had to be coded up in-house. Afterwards, presigned URLs with two arbitrary headers incorporated were generated "by hand" in an Erlang shell for an HTTP PUT against a pre-existing S3 bucket located in an exclusively sigv4 region (us-east-2). After obtaining the presigned URLs, `curl` was used to execute the PUT. The PUT was successful:

Study: Content-MD5 Header
Problems were encountered during generation of the Content-MD5 header (S3 would not accept it), and the correct way of generating the header was undocumented. Alas, discovery requires experimentation:
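For what it's worth, a likely pitfall (hedged: this is based on the general S3/RFC 1864 definition of the header, not on this project's code) is that Content-MD5 must be the base64 encoding of the *binary* MD5 digest of the payload, while many first attempts send the hex digest instead:

```ruby
require "digest"
require "base64"

# Content-MD5 per RFC 1864 / S3: base64 of the raw (binary) MD5 digest.
def content_md5(body)
  Base64.strict_encode64(Digest::MD5.digest(body))
end

content_md5("hello")           # => "XUFAKrxLKna5cZ2REBfFkg=="
Digest::MD5.hexdigest("hello") # => "5d41402abc4b2a76b9719d911017c592" (wrong form for the header)
```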
Host Header Problem
During pedant tests (specifically `--focus-knife`), several errors were received from S3 of the form: "SignatureDoesNotMatch - The request signature we calculated does not match the signature you provided. Check your key and signing method."

The code was extensively instrumented on both the client and server sides. Forensic analysis of the diagnostic output revealed that `knife cookbook upload` was sometimes including a host header when hitting the presigned URL endpoint that was different than the host header which was used during URL presigning. Specifically, it was including a port number on the end of the host string on the host header (e.g. "Host: whatever.com:port"). The reason this is occurring is currently unknown.

In a nutshell, the presigned URL never appears to be created with a host header that includes a port number (at least with the observed failures); `knife` sometimes hits the presigned URL endpoint with a host header that includes a port number. I have also tried it the 'other way', i.e. forced the creation of presigned URLs with host headers that have port numbers, but then we get pedant test failures for the instances when a chef client hits the presigned URLs and doesn't include a port number on the host header.

Several solutions to the problem were proposed and rejected because any fix must be backwards-compatible with the current version of hosted chef; however, to illustrate the problem, a build was created exhibiting the default behavior of presigned URLs being constructed with host headers without port numbers, and a single line of ruby code was patched into line 148 of /opt/opscode/embedded/lib/ruby/gems/2.6.0/gems/chef-15.8.23/lib/chef/http.rb on a running dev VM. This line of code simply removes any colon and port number from the end of the host header string, and thus appears to remove the error. Other files which were examined include upload_spec.rb, delete_spec.rb, cookbook_uploader.rb, cookbook_util.rb, knife_util.rb, sandbox.rb and basic_client.rb.

A similar study was conducted which forced the creation of presigned URLs with port numbers in the host headers. A baseline run of `chef-server-ctl test --all` resulted in 138 pedant test failures. Then an attempt was made to patch http.rb as above, except adding a port number to the host header instead of removing one. The patch had no effect, suggesting that a different client was being used by pedant for these failures. After some investigation, a patch was made to the `authenticated_request` method of /opt/opscode/embedded/service/oc-chef-pedant/lib/pedant/request.rb which succeeded in bringing the number of pedant test errors down from 138 to 16. It was noted that with the 16 remaining errors, at least in some cases the port is still not being added to the host header, suggesting that another file might need to be patched.

This issue has yet to be resolved, and more investigation is required to fully understand the issue and resolution.
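The one-line patch described above (stripping any trailing port from the host header string) could look roughly like this. This is a hypothetical sketch, not the actual patched line from http.rb:

```ruby
# Hypothetical sketch of the patch: remove any trailing ":port" from the
# host header string so it matches the value used during presigning.
def strip_port(host_header)
  host_header.sub(/:\d+\z/, "")
end

strip_port("whatever.com:443") # => "whatever.com"
strip_port("whatever.com")     # => "whatever.com"
```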
Excerpt from `chef-server-ctl test --focus-knife` (presigned URLs created without port on host header):

Excerpt from patched http.rb illustrating pseudo 'correction' of the issue - see line labeled 'PATCH':

Excerpt from `chef-server-ctl test --focus-knife` after applying pseudo 'correction' (presigned URLs created without port on host header):

Excerpt from `chef-server-ctl test --focus-knife` (presigned URLs created with port on host header):

Excerpt from patched /opt/opscode/embedded/service/oc-chef-pedant/lib/pedant/request.rb illustrating pseudo 'correction' of the issue - see line labeled 'PATCH':

Excerpt from `chef-server-ctl test --focus-knife` after applying pseudo 'correction' (presigned URLs created with port on host header):

Bookshelf API URL Endpoint Problem
After enabling `bookshelf` for testing, a number of errors were received of the following form:

The pedant tests were trying to connect to bookshelf.api.chef-server.dev:443, but investigation using various tools (ping, curl, etc.) revealed that there was no such active endpoint. It was hypothesized that the actual endpoint which chef servers were using was api.chef-server.dev:443, but changes introduced by `erlcloud` and `mini_s3` added the bucketname (`bookshelf`) to the front of presigned urls, which is correct for s3 but (apparently) not for `bookshelf`. It was confirmed that api.chef-server.dev:443 was operational.

The first strategy to attempt to solve the problem was to change (through configuration) the `bookshelf` api endpoint from api.chef-server.dev:443 to bookshelf.api.chef-server.dev:443. Pursuant to this, various permutations of the following config settings were tried in chef-server.rb but they were unsuccessful:

Eventually, an `erlcloud` configuration was discovered which would change the presigned url style from 'bucketname before host' (vhost, e.g. protocol://bucketname.host:port) to 'bucketname after host' (path, e.g. protocol://host:port/bucketname), and a temporary hack was patched into `mini_s3`'s new/3 function which would crudely detect whether `bookshelf` or s3 was being used, and configure `erlcloud` for the correct url style accordingly. This hack worked.

In summary, it has been confirmed that s3 accepts both 'bucketname before host' (vhost) and 'bucketname after host' (path) style urls, but `bookshelf` (apparently) doesn't and accepts only 'bucketname after host' (path) style urls. The production fix will be to change `mini_s3`'s new function to return configs for only 'bucketname after host' style urls.

ADDENDUM: Amazon states "Buckets created after September 30, 2020, will support only virtual hosted-style requests. Path-style requests will continue to be supported for buckets created on or before this date." The production fix will thus be to change `bookshelf` to accept 'bucketname before host' (vhost) style urls, and listen on bookshelf.api.chef-server.dev vs. api.chef-server.dev. The previously-suggested production fix was easier, and known. This newly-advised fix will be more difficult, and will require further investigation on how to achieve it.

`erlcloud` configs:

`mini_s3` functions requiring patching: new/3, new/4

`bookshelf` and server-side code requiring patching: unknown

See: https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/ https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html http://www.wryway.com/blog/aws-s3-url-styles/ https://acloud.guru/forums/aws-certified-solutions-architect-associate/discussion/-KMozONkdwMHmGJbKfc0/url-of-an-s3-bucket-quiz-question
Preliminary Assessment

`erlcloud` will not be a super quick/easy 'drop-in' replacement for `mini_s3`, but neither would any other library, most likely. Further investigation is required.

Conclusion

Forthcoming.

PRs: https://github.com/chef/chef-server/pull/1993

Branches:
- chef-server: lbaker/aws-sigv4, lbaker/aws-sigv4-debug, lbaker/aws-sigv4-praj, lbaker/aws-sigv4-scratch, lbaker/aws-sigv4-scratch-rebased-erlang22, lbaker/presigned, lbaker/presigned-headers
- mini_s3: lbaker/erlcloud, lbaker/erlcloud-env, lbaker/ex-aws, lbaker/presigned, lbaker/presigned-headers-host, lbaker/presigned-headers
- lbakerchef/erlcloud: lbaker/aws-sigv4, lbaker/presigned