aws-powertools / powertools-lambda-python

A developer toolkit to implement Serverless best practices and increase developer velocity.
https://docs.powertools.aws.dev/lambda/python/latest/
MIT No Attribution

RFC: Validate incoming and outgoing events utility #95

Closed heitorlessa closed 4 years ago

heitorlessa commented 4 years ago

Key information

Summary

This utility helps you validate incoming events from Lambda event sources, as well as the response from your Lambda handler, all based on JSON Schemas. You can validate either an entire incoming event, including the chosen event source wrapper, or only the event payload/body.

Motivation

Well-Architected Serverless Lens recommends validating events under the Security pillar. As of now, customers have to implement their own validation mechanisms, bring additional dependencies, and end up crafting multiple JSON Schemas for popular event sources.

We could ease that by including this utility in Powertools, lowering the barrier to entry.

Proposal

This utility would use the already present Fast JSON Schema lib to validate both incoming and outgoing events using a preferred JSON Schema.

Another plus is that we can use our Middleware Factory to make this easier to implement, and automatically trace it with X-Ray to profile performance impact when necessary.

Validating both inbound/outbound

from aws_lambda_powertools.utilities import validator

@validator(inbound=inbound_schema_dict, outbound=outbound_schema_dict)
def lambda_handler(evt, ctx):
    ...
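As a rough illustration of the proposed decorator shape, here is a minimal, dependency-free sketch. It is not the Powertools implementation: the RFC proposes Fast JSON Schema for the actual checks, while `check_schema` below is a hypothetical stand-in that only understands `required` and a few top-level property types. The example schemas and the `order_id` field are also made up for illustration.

```python
import functools

# Hypothetical stand-in for a real JSON Schema library; handles a tiny subset only.
_JSON_TYPES = {"string": str, "integer": int, "object": dict}


def check_schema(data, schema):
    """Raise ValueError if `data` misses required keys or has wrongly typed properties."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in schema.get("required", []):
        if key not in data:
            raise ValueError(f"missing required key: {key}")
    for key, rule in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(rule.get("type"))
        if key in data and expected is not None and not isinstance(data[key], expected):
            raise ValueError(f"{key} must be of type {rule['type']}")


def validator(inbound=None, outbound=None):
    """Validate the event before the handler runs and the response after it returns."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            if inbound is not None:
                check_schema(event, inbound)
            response = handler(event, context)
            if outbound is not None:
                check_schema(response, outbound)
            return response
        return wrapper
    return decorator


# Made-up schemas for illustration.
inbound_schema = {"required": ["order_id"], "properties": {"order_id": {"type": "string"}}}
outbound_schema = {"required": ["statusCode"], "properties": {"statusCode": {"type": "integer"}}}


@validator(inbound=inbound_schema, outbound=outbound_schema)
def lambda_handler(evt, ctx):
    return {"statusCode": 200, "body": evt["order_id"]}
```

Both schemas are optional in this sketch, so a handler could validate only the inbound event, only the response, or both.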

For customers wanting to validate only the payload of popular event sources, say API Gateway, this utility will work in tandem with an extractor utility - That will provide the following benefits:

By default, envelopes will pluck only the payload of a message within the event. Allowing multiple paths can easily add complexity, so we will defer to customers creating their own envelopes if they want to.

Validating inbound with built-in popular event source schemas

from aws_lambda_powertools.utilities import validator
from aws_lambda_powertools.utilities.extractor import envelopes

@validator(inbound=inbound_schema, envelope=envelopes.api_gateway_rest)
def lambda_handler(evt, ctx):
    ...
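One way to picture an envelope is as a small function that plucks the payload out of the surrounding event before validation. The sketch below assumes envelopes are plain callables, which the RFC does not specify; `api_gateway_rest` borrows its name from the example above, while `eventbridge` and `extract` are hypothetical names added for illustration.

```python
import json


def api_gateway_rest(event):
    """Return the JSON-decoded body of an API Gateway REST proxy event."""
    # The proxy event carries the request body as a string, so it must be decoded.
    return json.loads(event["body"])


def eventbridge(event):
    """Return the `detail` object of an EventBridge event (already a dict)."""
    return event["detail"]


def extract(event, envelope=None):
    """Apply the envelope if one is given; otherwise return the whole event."""
    return envelope(event) if envelope is not None else event


event = {"httpMethod": "POST", "body": '{"order_id": "abc"}'}
payload = extract(event, envelope=api_gateway_rest)
```

With no envelope the whole event would be validated; with one, only the extracted payload would be handed to the schema check.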

Drawbacks

Rationale and alternatives

Unresolved questions

Optional, stash area for topics that need further development e.g. TBD

nmoutschen commented 4 years ago

It'd be nice to have vended schemata for AWS services, but I have a question mark over their usefulness. Would they be used to filter out cases where a Lambda function is triggered by an unexpected service? The schemata would have to be loose enough to accommodate future changes without breaking anything.

However, validating things like the APIGW body, EventBridge details, etc. would be a must-have for me, but it means having an understanding of these events and how to unpack them (e.g. json.loads() the APIGW body).

heitorlessa commented 4 years ago

UPDATE: Updated the RFC based on our internal discussion on this.

Yep, I agree with that and the feedback we got on Twitter too.

As we have to create a correlation ID utility too, it makes more sense to:

If this sounds good I'll update the RFC to reflect these, and create a PR next week

Thoughts?

nmoutschen commented 4 years ago

We should also validate the event itself based on specific keys. I see a few examples that would benefit from that:

tmclaugh commented 4 years ago

To what @nmoutschen said above, and after our chat with Heitor: yes, validating both the event and the payload is valuable. What I'd like is a decorator for validation where I pass in a schema only to validate the whole event, or pass in an additional envelope, which would cause only the event payload to be validated.

randude commented 4 years ago

Hey guys, really liking your tools, they make life a lot easier for us Python AWS developers. For validation, I actually wrote a blog post about how we solved that in our code: https://medium.com/cyberark-engineering/aws-lambda-event-validation-from-zero-to-hero-2ca950acd2ea We basically used Pydantic, which offers excellent performance while being super readable. We also use it to validate and parse boto3 responses, not just AWS events or events between services. What do you think? @heitorlessa

heitorlessa commented 4 years ago

This is really interesting, though 8.2MB more!

I need to dig into their docs a bit more, as they seem to offer another package to generate Pydantic models from JSON Schema, which could be a great value add. Apart from package size, we use fastjsonschema, which is really fast, so the benchmarks they compared against don't apply to the lib we use.

From the little I saw, I believe the UX would change from the proposed design. How would you see the UX if we were to use Pydantic here, @randude?

ran-isenberg commented 4 years ago

Well, the user can supply either a JSON schema or a Pydantic model. A Pydantic model has advantages because you can define non-JSON types like UUID, HTTP URL, datetimes, email addresses, etc., and also define validators/root validators, which offer logical validation of the relationships between the variables in the schema and not just the pure value check. So this is a big plus, and that's why we used it. You can add custom value validation such as an AWS region string; we used boto3 to validate that it's a valid region input.

So I think that the user should be able to define Pydantic schemas, and then you can wrap the whole try/except with your decorator, i.e. the input to the validator is a Pydantic class. You can also add support for SQS/SNS/EventBridge/DynamoDB stream messages, where the original user message is wrapped inside an AWS event. I had to manually write the DynamoDB schema (and also maintain it), but if this utility can do this for me, that would be awesome. I think the user will have to import the SQSEvent schema and inherit from it, building the models like this:

class AWSEvent(BaseModel) .. [ has a message field which is the custom user data as a dict ] ..
class UserSchema(BaseModel) ..
class SQSUserSchema(AWSEvent) message: UserSchema

Another parameter for validation would need to be if it's string or dict, pydantic handles both but you need to call different functions when validating.
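To make the nesting idea and the string-versus-dict point concrete, here is a small sketch that uses standard-library dataclasses as a stand-in for Pydantic models, so it runs without extra dependencies. It is not Pydantic code, and it does not reflect the real SQS event shape: the `message` field follows the pseudocode above, and the `name`/`age` fields and `parse_event` helper are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class UserSchema:
    # Hypothetical user payload fields, checked by hand in __post_init__.
    name: str
    age: int

    def __post_init__(self):
        if not isinstance(self.name, str) or not isinstance(self.age, int):
            raise ValueError("name must be a str and age must be an int")


@dataclass
class AWSEvent:
    # Generic wrapper: the user data arrives as a plain dict.
    message: dict


@dataclass
class SQSUserSchema(AWSEvent):
    # Narrow the wrapper's message field to the user's own schema.
    message: UserSchema

    def __post_init__(self):
        if isinstance(self.message, dict):
            self.message = UserSchema(**self.message)


def parse_event(raw):
    """Accept either a JSON string or an already-decoded dict."""
    data = json.loads(raw) if isinstance(raw, str) else raw
    return SQSUserSchema(message=data["message"])


from_str = parse_event('{"message": {"name": "Ana", "age": 30}}')
from_dict = parse_event({"message": {"name": "Ana", "age": 30}})
```

In this sketch a single entry point handles both input kinds by checking the type first; a Pydantic-based utility would presumably need a similar switch, or a parameter as suggested above.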

Last thought - it would be awesome if AWS had some central repo for schemas where you can share schemas between teams and also look at the current AWS event schemas. I know that there's https://docs.aws.amazon.com/eventbridge/latest/userguide/eventbridge-schemas.html but that's too specific for eventbridge.

BTW, Let me know if you need some help with coding this ;)

heitorlessa commented 4 years ago

Thanks for clarifying @randude - And yes, if you could spare some time working on a PR to POC this, I'd love any help I can get -- I have to work on minor fixes for Logger, the docs, and autocomplete for PyCharm in the meantime, then I'll be back to this :)

I can totally see this being more helpful long-term - I have two thoughts I need to think through more carefully, so I'm recording them for posterity:

ran-isenberg commented 4 years ago

@heitorlessa It's me again. This is my work account. I'll see what I can do. Hopefully I'll have some good news very soon. Do you have a Slack/Skype account that I can ping with questions about the repo if I need to?

heitorlessa commented 4 years ago

@risenberg-cyberark Awesome - I'm on the AWS Developer Slack channel [Heitor Lessa (AWS)], feel free to DM me. If you're not on it, DM me on Twitter and I can invite you to Slack.

heitorlessa commented 4 years ago

The initial implementation of this RFC (a simple JSON Schema validator) has now been merged in #153 - It'll be available in the next release, 1.6.0.

to-mc commented 4 years ago

Closing this issue as we've released the validator utility with 1.6.0. See RFC #147 for progress on integrating with pydantic.