awslabs / aws-sdk-python-signers

AWS SDK Python Signers provides stand-alone SigV4 signing functionality.
Apache License 2.0

Initial feedback #2

Open benkehoe opened 5 months ago

benkehoe commented 5 months ago

I'm excited about this library.

My main feedback is that it's built around dataclasses that make you transfer data back and forth. You copy your headers from your native representation into an AWSRequest, sign that, and then copy them back into your native representation. You even make the user parse the URL into parts for you.

I'd much rather the library work against something like a RequestAdapter interface that surfaces the translations directly in one place. The user would provide a get_headers() that returns not these Fields you have come up with, but the derived list of tuples you need, and an add_headers() to put the signatures back in. The implementation the user has to provide then works directly against the data in its native representation. Maybe provide the adapter interface using abc.
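Roughly, I'm imagining something like this (just a sketch; method names other than get_headers()/add_headers() are illustrative):

```python
import abc


class RequestAdapter(abc.ABC):
    """Sketch of an adapter the signer would work against, instead of
    requiring the user to copy data into the library's own dataclasses."""

    @abc.abstractmethod
    def get_method(self) -> str: ...

    @abc.abstractmethod
    def get_url(self) -> str: ...

    @abc.abstractmethod
    def get_headers(self) -> list[tuple[str, str]]:
        """Return the headers as the (name, value) tuples the signer needs."""

    @abc.abstractmethod
    def add_headers(self, headers: list[tuple[str, str]]) -> None:
        """Write the signature headers back into the native request."""
```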

For credentials, natively supporting refreshing is important, and out-of-the-box compatibility with boto3 is a must (it doesn't have to be a dependency, but if I do SigV4Signer(..., identity=boto3.Session().get_credentials()) it should just work). You should really remove the type check; anything that has the right fields should work (and use hasattr so it doesn't even have to provide expiration, so the frozen credentials from boto3 will work).
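To illustrate, this is roughly the shim every consumer ends up writing today (a sketch; identity_from_boto3 is just an illustrative helper name):

```python
import aws_sdk_signers
import boto3


def identity_from_boto3(session: boto3.Session) -> aws_sdk_signers.AWSCredentialIdentity:
    # get_frozen_credentials() snapshots refreshable credentials; the frozen
    # form exposes access_key / secret_key / token and has no expiration.
    creds = session.get_credentials().get_frozen_credentials()
    return aws_sdk_signers.AWSCredentialIdentity(
        access_key_id=creds.access_key,
        secret_access_key=creds.secret_key,
        session_token=creds.token,
    )
```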

hb2638 commented 2 months ago

I agree on the URL parsing but disagree with everything else, though I understand where you're coming from.

For the URL parsing, aws_sdk_signers.URI is immutable, and here are a couple of ideas:

1. Add a static parse method to **aws_sdk_signers.URI**:

```python
import urllib.parse


@dataclass(kw_only=True, frozen=True)
class URI(interfaces_http.URI):

    @staticmethod
    def parse_url(url: str) -> "URI":
        url_parts = urllib.parse.urlparse(url)
        return URI(
            scheme=url_parts.scheme,
            host=url_parts.hostname,
            port=url_parts.port,
            path=url_parts.path,
            query=url_parts.query,
            fragment=url_parts.fragment,
        )
```
2. Allow destination in **aws_sdk_signers.AWSRequest** to be an **aws_sdk_signers.URI** or a **str**. Internally it should convert the string to an **aws_sdk_signers.URI**:
```python
class AWSRequest(interfaces_http.Request):

    def __init__(
        self,
        *,
        destination: URI|str,
        method: str,
        body: AsyncIterable[bytes] | Iterable[bytes] | None,
        fields: Fields,
    ):
        if isinstance(destination, str):
            destination = URI.parse_url(destination)
        self.destination = destination
        self.method = method
        self.body = body
        self.fields = fields
```
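With either change, usage would look something like this (a sketch of the proposed API, not something the package supports today):

```python
import aws_sdk_signers

# Option 1: parse explicitly
uri = aws_sdk_signers.URI.parse_url("https://example.s3.amazonaws.com/key")

# Option 2: pass the URL straight through to AWSRequest
request = aws_sdk_signers.AWSRequest(
    destination="https://example.s3.amazonaws.com/key",
    method="GET",
    body=None,
    fields=aws_sdk_signers.Fields([]),
)
```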

As for where I disagree...

Introducing something like a RequestAdapter means they would need to do something similar for other packages/modules (e.g. urllib3, aiohttp, urllib, etc.). That makes it convenient for us as consumers but a burden for them to develop/test/maintain (e.g. what are the minimum and maximum versions they should support for each of those external packages?).

This is a low-level library with no dependencies (not even boto3/botocore), and that makes sense. Keeping dependencies to a minimum makes it less likely that a pip install will fail for us, the consumers, because of version conflicts (e.g. we cannot pip install aws-sdk-signers because it requires a minimum boto3 version of 1.2.3, but we can't use that version because another package only supports up to 1.2.0).

There are ways around that, but they're not silver bullets and they introduce other problems while fixing that one.

E.g. treating them as vendored packages, but that increases the package size, which makes it harder to use this in zip-based Lambdas. I know you mentioned treating these as optional dependencies, but you'd still need sanity checks for when the libraries do exist, to ensure you're using versions of those libraries that you support.

The code examples they provided were more than enough for me to create a requests.auth.AuthBase that uses credentials from a boto3 session (see below). But even this is something that's best left as a code example rather than something officially supported and maintained in the package, because it's easier to customize it to your own needs, tastes, and style, which will differ from everyone else's.

E.g. in the code below I have a get_or_create_signing_properties method that determines what the aws_sdk_signers.SigV4SigningProperties should be based on the URL. My only use case was downloading objects from S3, so that's what is supported; I don't support any other service. If a similar helper function were in this package, they might need to do a release every time a new service is added (assuming the mapping/lookup is hardcoded).

```python
import urllib.parse

import aws_sdk_signers
import boto3
import requests.auth

class SigV4Auth(requests.auth.AuthBase):
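    # Headers the signer adds; only these are copied back onto the PreparedRequest.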
    SIGNING_HEADERS = (
        "Authorization",
        "Date",
        "X-Amz-Date",
        "X-Amz-Security-Token",
        "X-Amz-Content-SHA256",
    )
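    # Cache of SigV4SigningProperties keyed by (region, service); defined on
    # the class, so it is shared across instances.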
    _signing_props = {}

    def __init__(
        self,
        session: boto3.Session|None=None,
    ):
        if session is None:
            if boto3.DEFAULT_SESSION is None:
                boto3.setup_default_session()
            session = boto3.DEFAULT_SESSION
        self.session = session
        self._signer = aws_sdk_signers.SigV4Signer()

    def __call__(self, r: requests.PreparedRequest) -> requests.PreparedRequest:
        self.sign_request(r)
        return r

    def get_or_create_signing_properties(self, url: str) -> aws_sdk_signers.SigV4SigningProperties:
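        # Only virtual-hosted-style S3 URLs (https://<bucket>.s3.amazonaws.com/...)
        # over https are supported; anything else raises.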
        url_parts = urllib.parse.urlparse(url)
        error = Exception(f"The url [{url}] is not supported.")
        if url_parts.scheme != "https":
            raise error
        netloc_parts = url_parts.netloc.split(".")
        if url_parts.netloc.endswith("s3.amazonaws.com") and len(netloc_parts) == 4:
            cache_key = (self.session.region_name, "s3",)
            signing_props = self._signing_props.get(cache_key)
            if signing_props is None:
                signing_props = aws_sdk_signers.SigV4SigningProperties(region=cache_key[0], service=cache_key[1], content_checksum_enabled=True)
                self._signing_props[cache_key] = signing_props
            return signing_props
        raise error

    def sign_request(self, r: requests.PreparedRequest):
        request = self.convert_to_awsrequest(r)
        signing_properties = self.get_or_create_signing_properties(r.url)
        creds = self.session.get_credentials()
        identity = aws_sdk_signers.AWSCredentialIdentity(
            access_key_id=creds.access_key,
            secret_access_key=creds.secret_key,
            session_token=creds.token,
        )
        signed_request = self._signer.sign(signing_properties=signing_properties, request=request, identity=identity)
        for header in self.SIGNING_HEADERS:
            if header in signed_request.fields:
                r.headers[header] = signed_request.fields[header].as_string()
        return r

    def convert_to_awsrequest(self, r: requests.PreparedRequest) -> aws_sdk_signers.AWSRequest:
        url_parts = urllib.parse.urlparse(r.url)
        uri = aws_sdk_signers.URI(
            scheme=url_parts.scheme,
            host=url_parts.hostname,
            port=url_parts.port,
            path=url_parts.path,
            query=url_parts.query,
            fragment=url_parts.fragment,
        )
        fields = aws_sdk_signers.Fields([aws_sdk_signers.Field(name=k, values=[v]) for k, v in r.headers.items()])
        return aws_sdk_signers.AWSRequest(
            destination=uri,
            method=r.method,
            body=r.body,
            fields=fields,
        )
```
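With that in place, usage is just a matter of passing the auth object to requests (placeholder bucket and key):

```python
auth = SigV4Auth()  # uses the default boto3 session
response = requests.get(
    "https://my-bucket.s3.amazonaws.com/my-key",  # placeholder URL
    auth=auth,
)
response.raise_for_status()
```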