aws-powertools / powertools-lambda-python

A developer toolkit to implement Serverless best practices and increase developer velocity.
https://docs.powertools.aws.dev/lambda/python/latest/
MIT No Attribution
2.83k stars 389 forks source link

Bug: The pydantic parser breaks when parsing JSON with a model containing a field with the type RawDictOrModel #5303

Open UlfurOrn opened 2 days ago

UlfurOrn commented 2 days ago

Expected Behaviour

Parsing an event/field from JSON containing a field with the type RawDictOrModel (like EventBridgeModel) should work.

Current Behaviour

Parsing an event/field from JSON containing a field with the type RawDictOrModel (like EventBridgeModel) results in an error:

self = <pydantic.type_adapter.TypeAdapter object at 0x106895dd0>
data = '{"version":"version","id":"id","source":"source","account":"account","time":"2024-10-04T11:55:04.861457","region":"region","resources":[],"detail_type":null,"detail":{"key":"value"},"replay_name":null}'

    @_frame_depth(1)
    def validate_json(
        self, data: str | bytes, /, *, strict: bool | None = None, context: dict[str, Any] | None = None
    ) -> T:
        """Usage docs: https://docs.pydantic.dev/2.9/concepts/json/#json-parsing

        Validate a JSON string or bytes against the model.

        Args:
            data: The JSON data to validate against the model.
            strict: Whether to strictly check types.
            context: Additional context to use during validation.

        Returns:
            The validated object.
        """
>       return self.validator.validate_json(data, strict=strict, context=context)
E       NotImplementedError: Cannot check issubclass when validating from json, use a JsonOrPython validator instead.

Code snippet

# Minimal reproducible example:
EventBridgeModel(
    version="version",
    id="id",
    source="source",
    account="account",
    time=datetime.now(),
    region="region",
    resources=[],
    detail={"key": "value"},
).model_dump_json()

parse(event, model=EventBridgeModel)

# Actual use-case where this was discovered - parsing EventBridgeModel's using the SqsEnvelope:

event = {
    "Records": [
        {
            "messageId": "19dd0b57-b21e-4ac1-bd88-01bbb068cb78",
            "receiptHandle": "MessageReceiptHandle",
            "body": EventBridgeModel(
                version="version",
                id="id",
                source="source",
                account="account",
                time=datetime.now(),
                region="region",
                resources=[],
                detail={"key": "value"},
            ).model_dump_json(),
            "attributes": {
                "ApproximateReceiveCount": "1",
                "SentTimestamp": "1523232000000",
                "SenderId": "123456789012",
                "ApproximateFirstReceiveTimestamp": "1523232000001",
            },
            "messageAttributes": {},
            "md5OfBody": "{{{md5_of_body}}}",
            "eventSource": "aws:sqs",
            "eventSourceARN": "arn:aws:sqs:us-east-1:123456789012:MyQueue",
            "awsRegion": "us-east-1",
        }
    ]
}

parse(event, model=EventBridgeModel, envelope=SqsEnvelope)

Possible Solution

This is being caused due to the RawDictOrModel type being a union of multiple types including type[BaseModel] which is causing the issue.

AnyInheritedModel = Union[Type[BaseModel], BaseModel]
RawDictOrModel = Union[Dict[str, Any], AnyInheritedModel]

e.g. when powertools attempts to pydantic to load the JSON (using model_validate_json). It tries to validate the field against type[BaseModel] which does not seem to be allowed in a JSON parsing context.

Either this part of the type hint needs to be removed or something needs to happen in the validation logic itself.

Steps to Reproduce

See the code snippets. This only happens when the detail field (or any field of the type RawDictOrModel) of the EventBridgeModel actually contains some data.

Powertools for AWS Lambda (Python) version

latest

AWS Lambda function runtime

3.11

Packaging format used

Lambda Layers

Debugging logs

No response

boring-cyborg[bot] commented 2 days ago

Thanks for opening your first issue here! We'll come back to you as soon as we can. In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

UlfurOrn commented 2 days ago

If you need any more details please let me know 🙏

leandrodamascena commented 2 days ago

Hi @UlfurOrn! Thanks for reporting this issue! I see other people complaining about the same thing in Pydantic and with no effective solution. It really seems like validating JSON against a model that can has a type of another model is a limitation in Pydantic.

But talking about the Powertools side, I was able to reproduce the problem here and I can confirm this is creating some problems. Due to this seems to be a limitation in Pydantic I think we can put the validation code in a try/exception block and if this raises a NotImplementedError we try to validate with validate_python instead of validate_json. Something like this:

logger.debug("parsing event against model")
try:
    if isinstance(data, str):
        logger.debug("parsing event as string")
        return adapter.validate_json(data)

    return adapter.validate_python(data)
except NotImplementedError:
    logger.debug("fallback to validate using python")
    data = json.loads(data)
    return adapter.validate_python(data)

I still need to think in a more optimized code for this, but this seems to fix the Pydantic limitation.

Either this part of the type hint needs to be removed or something needs to happen in the validation logic itself.

Removing the type hint part is not a good solution because we can break customers that are relying in this possible type and using classes instead of instances of those classes. We never know how our customers are using this project and it is better to avoid breaking things.

I'd like to hear your thoughts.