aws-powertools / powertools-lambda-python

A developer toolkit to implement Serverless best practices and increase developer velocity.
https://docs.powertools.aws.dev/lambda/python/latest/
MIT No Attribution
2.91k stars 401 forks source link

Tech debt: improve support for parameter injection #5325

Open tobocop2 opened 1 month ago

tobocop2 commented 1 month ago

Why is this needed?

My apologies if there something already exists that addresses this issue. I searched all of the code and I just could not figure out a reusable way to handle parameter injection without rolling my own system.

Request body and query string parsing as well as other aspects of request parsing are not reusable. For example, a pydantic based parameter injection system can be put in place at some point in the stack to make it more reusable. I've implemented this in a work project using a custom decorator

from functools import wraps
from typing import Optional, Type
import inspect
import json

from aws_lambda_powertools.utilities.data_classes import APIGatewayProxyEvent
from pydantic import BaseModel, ValidationError

from lib.common.app import app

# NOTE: this utility does not map these types to the open api specification. 
def inject_params(
    query_model: Optional[Type[BaseModel]] = None,
    body_model: Optional[Type[BaseModel]] = None,
):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            _ = args
            api_event: APIGatewayProxyEvent = app.current_event
            query_params = None
            body_params = None

            if query_model:
                query_dict = api_event.query_string_parameters or {}
                try:
                    query_params = query_model(**query_dict)
                except ValidationError as e:
                    # Raise validation error to be handled by existing middleware
                    raise e

            if body_model:
                try:
                    body_dict = json.loads(api_event.body or "{}")
                    body_params = body_model(**body_dict)
                except (ValidationError, json.JSONDecodeError) as e:
                    # Raise validation error to be handled by existing middleware
                    raise e

            # Dynamically inject only the parameters that the handler expects
            handler_sig = inspect.signature(func)
            if "query_params" in handler_sig.parameters and query_params is not None:
                kwargs["query_params"] = query_params
            if "body_params" in handler_sig.parameters and body_params is not None:
                kwargs["body_params"] = body_params

            return func(*args, **kwargs)

        return wrapper

    return decorator

Example usage:


from http import HTTPStatus

from get_things import get_things
from lib.common.api_client.models.responses import PaginatedThingResponse
from lib.common.app import app, logger, metrics
from lib.common.aws import JsonResponse # custom class I implemented, ignore for the scope of this issue
from lib.common.dtos import GetThingsRequest
from lib.common.middleware.inject_params import inject_params
from lib.common.utils import run_async

@app.get("/thing")
@inject_params(query_model=GetThingsRequest)
def handler(
    query_params: GetThingsRequest,
) -> JsonResponse[PaginatedThingResponse]:
    response: PaginatedThingResponse = run_async(get_things, query_params)
    return JsonResponse(response, status_code=HTTPStatus.OK)

So this allows for reusable request parsing with pydantic. The documented parsing strategy at this time is something that otherwise has to be repeated in every handler. Since this framework really leans into pydantic already I think this kind of functionality should exist. My implementation doesn't cover header parsing or anything else, but it could be extended to support it.

I'd be happy to make an open source contribution if it makes sense. If so, where would be the best place?

Which area does this relate to?

No response

Suggestion

No response

Acknowledgment

boring-cyborg[bot] commented 1 month ago

Thanks for opening your first issue here! We'll come back to you as soon as we can. In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

leandrodamascena commented 1 month ago

Hello @tobocop2! Thanks for opening this issue and yes, we do not support injecting parameters this way right now. But to be honest, I understand the code you sent, but I'm afraid I don't understand what you're trying to achieve or the idea behind this.

What I understand so far: you have a model called GetThingsRequest that you want to inject as query_params but you want it to be automatically constructed and validated based on the querystring values ​​that came from the payload (app.context. query_string_parameters)?

So for example:

def MyClass(BaseModel):
  name: str
  age: int

https://....../thing?name=Leandro&age=10

In this case, do you dynamically build the query_model (query_params = query_model(**query_dict)) and return it as a route argument?

I would appreciate it if you could simplify the code with some explicit Model (like my example) and an example of what the HTTP URL looks like, I would like to understand this idea more deeply.

tobocop2 commented 1 month ago

actually you're example is perfect! What You just described is fundementally exactly what Is being done.

without a parameter injection system,

from aws_lambda_powertools.utilities.data_classes import APIGatewayProxyEvent
from pydantic import ValidationError, BaseModel

class MyClass(BaseModel):
  name: str
  age: int

@app.get("/thing")
def get_handler(query_params: MyClass) -> Response:
      api_event: APIGatewayProxyEvent = app.current_event
      query_dict = api_event.query_string_parameters or {}
      try:
          query_params = MyClass(**query_dict)
      except ValidationError as e:
          # Raise validation error to be handled by existing middleware
          raise e
     print(query_params)

@app.post("/thing")
def post_handler() -> Response:
    api_event: APIGatewayProxyEvent = app.current_event
    try:
        body_dict = json.loads(api_event.body or "{}")
        body_params = MyClass(**body_dict)
    except (ValidationError, json.JSONDecodeError) as e:
        # Raise validation error to be handled by existing middleware
        raise e
    print(body_params)

with a parameter injection system, this code:

from pydantic import BaseModel
from path.to.inject_params import inject_params # this utility is described in the original post

class MyClass(BaseModel):
  name: str
  age: int

@app.get("/thing")
@inject_params(query_model=MyClass)
def get_handler(query_params: MyClass) -> Response:
    # do stuff with query params
    print(query_params) # should be the query string parsed to a pydantic object

@app.post("/thing")
@inject_params(body_model=MyClass)
def post_handler(body_params: MyClass) -> Response:
    # do stuff with query params
    print(body_params) # should be the request body parsed to a pydantic object

As you can see, parsing the event query string or body using pydantic is something that can be turned into something reusable so it doesn't have to be repeated in each handler function body.

Also, with parameter injection, open api spec support needs to be properly abstracted as well. The utility I threw together in the OP doesn't handle that because I'm not currently using the open api spec functionality in this library.

@leandrodamascena i hope these examples clarify the benefit of a shared responsibility parameter injection system so that repeated event parsing and de-serialization can be avoided.

leandrodamascena commented 1 month ago

i hope these examples clarify the benefit of a shared responsibility parameter injection system so that repeated event parsing and de-serialization can be avoided.

Thanks a lot for investing time in this discussion @tobocop2! We can only improve the customer experience if the customer tells us what needs to be improved. We really appreciate this kind of discussion on this project.

I now understand the idea you explained and I have some considerations.

Data validation in POST requests

[!important] Data validation documentation

We already support validating body payloads using Pydantic Models, so you don't need to add any additional code, just by annotating the parameter we validate it. We not validate, but we also serialize the response and raise a 422 if data validation fails.

I'm wondering how this code

@app.post("/thing")
@inject_params(body_model=MyClass)
def post_handler(body_params: MyClass) -> Response:
    # do stuff with query params
    print(body_params) # should be the request body parsed to a pydantic object

it's different from that

from pydantic import BaseModel

from aws_lambda_powertools.event_handler import APIGatewayRestResolver
from aws_lambda_powertools.utilities.typing import LambdaContext

app = APIGatewayRestResolver(enable_validation=True)

class MyClass(BaseModel):
  name: str
  age: int

@app.get("/todos")
def create_todo(todo: MyClass) -> dict:  
    return {"my_name": todo.name}

def lambda_handler(event: dict, context: LambdaContext) -> dict:
    return app.resolve(event, context)

Honestly I like the experience of just annotating the parameter and we validating it. If you want to use a different key for body, you can also annotate

def create_todo(todo: Annotated[Todo, Body(embed=True)]) -> int:

You can also catch RequestValidationError and wrap it in a function using the @app.exception_handler function:

@app.exception_handler(RequestValidationError)  
def handle_validation_error(ex: RequestValidationError):
    logger.error("Request failed validation", path=app.current_event.path, errors=ex.errors())

    return Response(
        status_code=422,
        content_type=content_types.APPLICATION_JSON,
        body="Invalid data",
    )

Data validation in GET requests

In the opposite direction of POST requests, we do not support Pydantic models in QueryString args (GET) and we need to support that in the future. I'm looking for a similar experience to what they've built in FastAPI to validate QueryString values ​​with Pydantic models.

New decorator to inject parameters/validation

While I appreciate the code you've created and see its value, introducing a new decorator might confuse customers who may not understand how it differs from the existing data validation feature. I'd prefer to focus on enhancing our current data validation capabilities. In the coming months, we should prioritize:

Also, with parameter injection, open api spec support needs to be properly abstracted as well. The utility I threw together in the OP doesn't handle that because I'm not currently using the open api spec functionality in this library.

You can use Data validation without creating an OpenAPI schema. While these features have some overlap, you can use Data validation on its own. Is anything stopping you from using Data validation? I'd like to hear this case here.

Thanks for investing time on this and please let me know your thoughts.

tobocop2 commented 1 month ago

Well i needed something for both the request body and query strings. It seems that as of now there is only support for the request body but no request parameters in the query string or headers. I dislike that the current validation system automatically assumes that the parameters in the function signature are the request body. This is ultimately too implicit and that's why I made my decorator have explicit keywords for each request parameter type that I needed to support.

As you pointed out, the example you provided is only valid for requests that accept a body. In the case of a GET request the body is always ignored so you're limited to query strings and headers. In these cases being able to de-serialize the parameters into a pydantic object would be very helpful. I don't know the best place for it, but the mechanism I provided in the OP is just how I am handling it right now but I appreciate that this is on the roadmap.

leandrodamascena commented 1 month ago

Well i needed something for both the request body and query strings. It seems that as of now there is only support for the request body but no request parameters in the query string or headers. I dislike that the current validation system automatically assumes that the parameters in the function signature are the request body. This is ultimately too implicit and that's why I made my decorator have explicit keywords for each request parameter type that I needed to support.

Yes, I agree that we need to enhance support for Pydantic models with queryString arguments. Would you be interested in sending a pull request to improve this implementation? I'd be happy to collaborate with you on this to resolve the issue.

but I appreciate that this is on the roadmap.

We're continuously working to enhance all our utilities. Recently, we've released new features, a new major version, and other improvements. While some enhancements take time to implement, your feedback helps us prioritize effectively. I'll soon update our roadmap to include this item for the coming months.

Thank you for this valuable discussion. I'll keep this issue open for any additional feedback and to ensure it's properly incorporated into our roadmap soon.

tobocop2 commented 1 month ago

Hey, I'd be happy to submit a PR. I don't know when I can get to it but I will look into it and let you know. Where in the call stack would it make the most sense to have this type of functionality? I'm guessing it would be tied to the enable_validation check somewhere in code? That logic would just need to be updated to factor in the different type of request parameters (query string, body, headers, etc)

leandrodamascena commented 1 month ago

Hey, I'd be happy to submit a PR. I don't know when I can get to it but I will look into it and let you know.

No rush, take your time! I know that everyone is so busy. If I have some time next week, I'll invest some time into this I'll let you know of any progress here or you can let me know. I'm excited to work on this collaboration, it will be a great improvement.

Where in the call stack would it make the most sense to have this type of functionality? I'm guessing it would be tied to the enable_validation check somewhere in code? That logic would just need to be updated to factor in the different type of request parameters (query string, body, headers, etc)

Yeah, we need to do that in the data_validation feature, the same as we do with Body field. I think here is a good start point. The code may be complex to understand at some point, but debugging will make it more clear.