Open dannysauer opened 3 years ago
Hi @dannysauer,
Thanks for the feature request! I can definitely see how this could be useful. I'll make sure the team sees this, although I can't make any guarantees as to when/if this would be implemented.
Hi @dannysauer,
I was able to review with the team and just wanted to clarify what you're passing into Boto3 (in terms of Cloudformation pre-signed urls) and what the output is. Would you be able to provide an example?
Regarding the feature request— We don't want to allow customers to provide a free-form set of characters to ignore when URIencoding, as it will most likely break in the common case.
Referencing the example you provided above, it looks like we're percent encoding
response-content-disposition=attachment;filename="testrpm-9-0.1-1.x86_64.rpm"
including ;
and =
, which is correct. Those have reserved meanings and shouldn't exist unescaped outside of that.
With something like ?key=thing=otherthing;otherkey=this
, we can't reasonably discern where a key starts and ends. In this case, it would be key=thing%3Dotherthing%3Botherkey%3Dthis
, which allows us to disambiguate from the syntax for multiple separate entries: ?key=thing&otherkey=this
.
Here is a guide for how SigV4's signature is calculated (see step 3). We must use the query string and components must be in that order for any request using SigV4 with S3 being an exception for historical reasons. If the query string is modified and no longer matches what we use to sign, the signature is essentially broken. Any modifications must be double-encoded. We're not sure if you can accomplish this after the URI is encoded, but you can certainly try.
Thanks for following up. I'll do what I can to provide more information. :)
Regarding the feature request— We don't want to allow customers to provide a free-form set of characters to ignore when URIencoding, as it will most likely break in the common case.
I would assume that the "don't escape these chars" kwarg would default to either the set that is normally escaped or None, so in the common case no one would provide it and things would continue working unchanged. :)
Would you be able to provide an example?
The URL I'm ultimately passing in looks generally like this:
https://somesite.com/pulp3-media/artifact/3d/337f03f5dfbf01124e05d87638aca193926def8c102f91e51e75262a654427?response-content-disposition=attachment;x-pulp-artifact-path=/some/path/to/testrpm-9-0.1-1.x86_64.rpm;filename=testrpm-9-0.1-1.x86_64.rpm
I'm adding an additional key=value token to the parameter (and ultimately, S3-generated header) for handler-specific purposes. This is so I can get a single set of logs which contain the full path clicked on which redirected to this specific file, which could theoretically be reached from multiple paths in my application. It's pretty annoying to try to correlate based solely on the timestamp. That's ok with the content-disposition RFC and seems to be handled just fine by S3, so I think it's not completely unreasonable. Feel free to disagree. :D I'm also leaving out the RFC-required double quotes now, because those break even more tools, while everything I tested with is cool with the quotes being absent.
What ultimately happens is that the generate_signed_url call escapes several of the characters (the /
, ;
, and =
) before signing. Then when I use apt on Ubuntu/Debian to download the file (ignore that the example is an RPM), because of an old bug in apt, those entities are converted back into regular ASCII characters.
including ; and =, which is correct. Those have reserved meanings and shouldn't exist unescaped outside of that
None of those chars "must" be escaped in a query string, as far as I can tell. The =
has reserved meaning in the path component, and then only in the first section (I think). Bear with my quick RFC reading here; I could've missed something. According to https://www.ietf.org/rfc/rfc3986.txt (which is where RFC7230 refers query string chars to), section 2.2, the reserved characters only need escaped if their purpose would conflict with their use as a delimiter, though it also says they "should" be escaped if they're not allowed in that section of the URL. In the case of the equal and semicolon in the query string, the only characters with special meaning are ?
and #
(section 3.4) which delimit the beginning and end; the contents of the query string are opaque and to be handled by the URI-consuming application. It's reasonable to assume that the code generating data to be passed in for signing might be aware of the end consumer's use, and handle escaping accordingly; taking that control away from the developer assuming that they definitely don't know what they're doing feels kinda wrong. :)
Commenting to prevent staleness. :)
I’m relabeling this back to a feature request. I think the ask here is essentially the same as this: https://stackoverflow.com/questions/55978274/allow-unknown-key-value-pairs-in-s3-pre-signed-url
In which case I’d echo the response there that says:
Pre-signed URLs are designed to be tamper-resistant. Any modification that changes the elements of the URL invalidates the signature. That's by design, and unrelated to any limitation in the Java SDK.
The ask here is basically that there be a mechanism by which the URL is signed exactly as passed in without tampering.
Default to assuming users want their mistakes silently fixed, fine - but allow a developer who knows exactly what they're doing to override the digital nanny. :)
(roughly every-other-year ping :wink:)
Bumping this a year later, is there really no way of passing in a query param using CloudFrontSigner.generate_presigned_url
?
Is your feature request related to a problem? Please describe. Signed URL validation is inconsistent.
When a CloudFront signed URL is generated with a querystring, any modification to the querystring causes the signature validation to fail and results in an access denied. When an S3 URL is signed, the querystring can be modified without impact. I suspect that S3 is pruning the querystring before validating, while CloudFront is checking the entire URL with querystring.
While I'm inclined to think this is actually a bug in the way AWS implements signed URL validation and that the "right" fix is for CloudFront to replicate the S3 behavior when the origin is an S3 bucket, that might be out of scope for boto. The problem I'm running into is that lots of tools will manipulate query strings before requesting them. One of those tools is
apt
(specifically apt-https-transport) on Debian-derived distros (which ironically added the reparsing to fix S3 URLs with the request-content-disposition header). I've contributed fixes or started opening bugs with those tools to quit that, but what I'd really prefer is for boto to alter the way query strings are escaped. This is most commonly an issue with theresponse-content-disposition=attachment;filename="testrpm-9-0.1-1.x86_64.rpm"
. The filename parameter is supposed to be double-quoted, per the relevant RFC. So there's a double-quote, an =, and some semicolons that get escaped by boto but don't really need escaped.When one builds an S3 signed URL, the generate signed url method takes "params" as a dict to assemble before signing. When building a CloudFront URL, the method to sign expects the whole URL to be passed in with the query appended. A pretty common pattern to build the query string is to use Python's url.parse.urlencode() method, which just takes a dict and spits out a querystring which can then be appended to the base URL. It takes an optional
safe
kwarg which describes chars to protect from html entity escaping. Because boto's parameter escaping is a bit over-zealous by default (quotes, semicolons, and equals are particularly relevant here), there are situations like mine where simply adding a parameter to not escape=
and;
(and/
) would work around the issue of apt's query mangling. I think it's unreasonable to expect that every http library out there will stop changing URLs they're given, because http generally allows that. Allowing the end-user to nudge the behavior a bit would fix (well, work around) several of those situations.Describe the solution you'd like Add
safe
as a kwarg to the cloudfrontgenerate_presigned_url
which defines a list of characters which should not be converted to html entities.