Closed LouisAmon closed 2 years ago
Thanks for opening your first issue here! We'll come back to you as soon as we can.
Thanks for opening your first issue here! We'll come back to you as soon as we can.
Hey @LouisAmon thanks for raising this. We have created awslabs/aws-lambda-powertools-python#346 to rewrite the entire documentation on dataclass utility - naming is hard. This next release is 80% dedicated to that doc.
This Friday, I'll start revamping that doc and lay down ambitions for v2 - that includes changing dataclasses
to
If you have ideas for naming I'd love to hear them so we can improve everyone's experience. If we could have a better name by the time the new docs are completely rewritten, then I could easily create code aliases and redirect docs altogether
Thank you!
Well to be honest I think dataclasses
are indeed a best practice in Python, especially when it comes to data modeling and typing
May I ask why the project didn't go into actually using the @dataclass
decorator ?
I guess now that a huge amount of code is in place, you can't just add @dataclass
on top of DictWrapper
... but still, have you tried ?
I know some projects have encountered limitations with the way Python's standard dataclasses
are meant to work. For example Pydantic ended up defining their BaseModel
and their own @dataclass
decorator, but overall they tried to stay consistent with the Python API
Maybe the right way for AWS Lambda Powertools would be to piggyback on Pydantic when it comes to validation (using their @dataclass
), or using the regular Python decorator for "normal" data modelling
Another solution would be to define your own @dataclass
, but then also make sure that the regular APIs (such as asdict()
) are available
TBH I think Pydantic adds a great deal of value to Lambda payload processing, although it is a bit heavy as a dependency :)
@LouisAmon i added this original as a helper for the event classes, and over time it started to include more features like handling decoding and then decorators for specific use cases.
The objective was to work with Python 3.6 and not have any additional dependencies. So @dataclasses
is not available in Python 3.6 without a backport. I am sure once we deprecate Python 3.6 support, we can strip down the code alot.
So we could create a lightweight version of @dataclass
with asdict()
, for now there is raw_event
that you can use if you want the dict.
We (@heitorlessa and @cakepietoast) did discuss the naming of this in the original PR (https://github.com/awslabs/aws-lambda-powertools-python/pull/159#issuecomment-694342213). I am not sure any name would have worked :). But we tried our best.
Maybe a warning note or something docs could help reduce any potential confusion, and then adding asdict()
method, could be another alias of raw_event
property.
Otherwise i am pretty proud of how many people this has helped. The next features that are related to this is some common Exception types needed for API Gateway Proxy events, and then better integration with the Idempotent decorator.
We (@heitorlessa and @cakepietoast) did discuss the naming of this in the original PR (#159 (comment)). I am not sure any name would have worked :). But we tried our best.
Don't get me wrong: you guys did an outstanding job overall, thanks for that !
Maybe a warning note or something docs could help reduce any potential confusion, and then adding asdict() method, could be another alias of raw_event property.
Maybe, but it feels to me like a half-measure : either you're working with a dataclass
with the full API or you're simply working on a data model
(which is very fine, many SQLA / Django users are familiar with that term and what it means)
For example instead of:
from aws_lambda_powertools.utilities.data_classes import event_source, APIGatewayProxyEventV2
@event_source(data_class=APIGatewayProxyEventV2)
def lambda_handler(event: APIGatewayProxyEventV2, context):
if "helloworld" in event.path and event.http_method == "POST":
do_something_with(event.json_body, event.query_string_parameters)
I'd rather see:
from aws_lambda_powertools.utilities.models import event_source, APIGatewayProxyEventV2
@event_source(model=APIGatewayProxyEventV2)
def lambda_handler(event: APIGatewayProxyEventV2, context):
if "helloworld" in event.path and event.http_method == "POST":
do_something_with(event.json_body, event.query_string_parameters)
That's my opinion anyway :)
Overall, if your concern is to stick with concepts that exist in Python 3.6 then maybe it's best to avoid the dataclasses
terminology altogether no ?
My personal suggestion would be to be technically compatible with Python 3.6, yet bring them one step into the future (so to speak) by using a dataclasses backport
With the backport in place, you can also use the dataclasses
in Pydantic, and stick to an all-dataclass terminology and API throughout the project
It would make even more sense to make that change regarding Envelopes
and the parser
feature : the base model could be defined as a dataclass and the envelope would be a dataclass as well !
Having a base model defined as a data class allows you to do cool stuff like generating fake data on the fly or defining a default_factory
, in a way that's consistent with other open source libraries
My two cent on the matter 😄
We (@heitorlessa and @cakepietoast) did discuss the naming of this in the original PR (#159 (comment)). I am not sure any name would have worked :). But we tried our best.
Don't get me wrong: you guys did an outstanding job overall, thanks for that !
Maybe a warning note or something docs could help reduce any potential confusion, and then adding asdict() method, could be another alias of raw_event property.
Maybe, but it feels to me like a half-measure : either you're working with a
dataclass
with the full API or you're simply working on a datamodel
(which is very fine, many SQLA / Django users are familiar with that term and what it means)For example instead of:
from aws_lambda_powertools.utilities.data_classes import event_source, APIGatewayProxyEventV2 @event_source(data_class=APIGatewayProxyEventV2) def lambda_handler(event: APIGatewayProxyEventV2, context): if "helloworld" in event.path and event.http_method == "POST": do_something_with(event.json_body, event.query_string_parameters)
I'd rather see:
from aws_lambda_powertools.utilities.models import event_source, APIGatewayProxyEventV2 @event_source(model=APIGatewayProxyEventV2) def lambda_handler(event: APIGatewayProxyEventV2, context): if "helloworld" in event.path and event.http_method == "POST": do_something_with(event.json_body, event.query_string_parameters)
That's my opinion anyway :)
Which part is different? If we make changes now it would have breaking changes for existing people.
dataclasses
Yep, it was originally event sources
, and it could be models
, but it is data_classes
so not exactly dataclasses
;-)
It is quick too refactor, but it would involve breaking changes for all the library users.
So maybe for V2 when Python 3.6 is deprecated, and whatever this is call moves up a level.
So note that this was my first python contribution coming from a Kotlin/Java background, so forgive the lack of pythonic terms used.
note, as a contributor and user of the library, I do like that this feature does not introduce any additional dependencies or slows down the lambda cold starts.
@heitorlessa - I would propose we switch back to what I had originally suggested as event_sources
and frame it as classes, utilities and helpers for the AWS standard events. And we can target this for V2 (and when Python 3.6 lambdas are deprecated), and then we can potentially minimize the amount of code.
I agree with you both that making a breaking change just for syntactic changes is not worth it.
I would also propose to make the big change in v2, but at that point : why not consider actually using Python dataclasses ? I mean, since 3.6 is gonna be dropped and all...
By the way, you can even keep the 3.6 support with the backport... and keep the old syntax with a DeprecationWarning
for a few versions... :)
Shall we close the ticket ? I realise I should've opened a discussion thread instead 🙏
@LouisAmon this library is supposed to run on Python 3.6 and higher, but you are welcome to make a PR with your proposed changes
@LouisAmon - Thank you for your strong opinions and suggestions, we appreciate your enthusiasm to help us make everyone's experience better.
I'd like to clarify a few things as I turn this issue into a feature request and move forward with other higher priorities tasks.
Dataclasses not being added in day one. Back then, VSCode and PyCharm didn't have intellisense support for them resulting in either false positives or no autocompletion at all. This led us to consider two routes: a) attrs as that's been the de-facto for class boilerplate and handy data methods since 2015 including support for 3.6+ with all dataclass features with a more rapid release cycle, or b) dataclasses backport for 3.6 only, though we had tiny concerns like it was always installing for 3.6+ and not 3.6 only, and whether additional features and types would've been backported to 3.6 on a release cycle.
attrs seemed the perfect candidate considering release cycle but we hit the same problem with autocompletion, and VSCode Pylance was not GA at the time either. We decided not to risk it, keep the original logic @michaelbrewer did using @property
as that worked for all IDEs on autocompletion with a helpful text to fields, and deal with the naming per se later as we'd figure a better name given the handy methods it has.
Using pydantic instead. As you mentioned, Pydantic adds at minimum ~35M using the latest CFlag optimizations with pip. When using SAM CLI however, it PythonPipBuilder doesn't support these new optimizations leading to an additional 91M vs 35M, as it pulls wheels and binaries for Linux and doesn't support the env vars in pip discussed in the thread. For this reason, we decided not to use pydantic however easier it'd have been to maintained it'd directly incur additional charges for customers and not aligning with our tenets.
This in itself brings us to the major decision factor in v2 on whether to use implicit namespace packages (PEP 420) so you could install each utility separately along with their own dependencies, or make all dependencies optional since this whole package is ~1.2M. The latter is another discussion I need to write into a new issue to hear from the community before we cut v2, since Boto alone went from 35M to 64M in an year (~10M increase in the past 2 months alone).
I'm turning this issue into a feature request and adjusting the title so the language is more accommodating to what this community represents. These are the changes we'd be comfortable to make or welcome PRs at this stage to not having to wait until v2 as there are bigger priorities like the boto package size above:
asdict()
method in DictWrapper to make it more intuitiveasdict()
for convenienceevent_sources
and reflect our documentation to point to that instead of data classes as they've outgrown their initial purpose. This will keep both running to not break existing customers but only one emphasized in the documentation w/ a note that data classes
will be deprecated in v2 along with a place for them to track.I hope this helps and I'll concentrate on recreating the documentation for this utility as initially planned to start today.
Thank you all
Good suggestions @heitorlessa.
Just as a note, raw_event
is just returning the wrapped dictionary object (self._data
), so when changing this to asdict()
method people may think we are generating a new dictionary object? Also having both could be confusing too.
Another note about the “event_sources” implementation, if it was implemented as just a wrapper to the original event dict, making it as light weight as possible. So a very large dict that only needs to have a single key look up, does not need to perform a full parse, i would like to keep this behavior.
@heitorlessa
No much will happen on this issue until Python 3.6 and whether dataclasses actually delivers the savings we expect.
I would just propose a rename
Yeah we discussed of renaming it earlier either way
On Fri, 12 Nov 2021 at 19:17, Michael Brewer @.***> wrote:
@heitorlessa https://github.com/heitorlessa
No much will happen on this issue until Python 3.6 and whether dataclasses actually delivers the savings we expect.
I would just propose a rename
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/awslabs/aws-lambda-powertools-python/issues/481#issuecomment-967320433, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZPQBF3E7FO3ZH6IUJGNUDULVK4NANCNFSM46VLHF2A .
Closing as we will only be able to rename it completely in v3. Data classes confirmed to be too slow. We're following Pydantic v2 closely for a faster and closer API model. As there isn't a date, raw_event
property does the job.
This issue is now closed. Please be mindful that future comments are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
What were you initially searching for in the docs?
I'm using the @event_source decorator and I was looking for a way to print my event in my Cloudwatch logs.
When I simply
print(event)
, I just get a class instance, but I want to be able to see the actual keys and values :)Being used to working with Python dataclasses, I naturally tried asdict:
Or even...
...and I was very surprised by the result:
Is this related to an existing part of the documentation? Please share a link
https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/data_classes/
Describe how we could make it clearer
Dataclasses are meant to be a Pythonic way to define a data model.
If you define your own class (
DictWrapper
in your case) which inherits from nothing then all the APIs and principles behind dataclasses are lost.Dataclasses are used in libraries such as sqlalchemy or strawberry to define a unified way to communicate with a database or build a GraphQL API... dataclasses provides unity between all these, so why not AWS Lambda ? :)
If you have a proposed update, please share it here
I think we should either revamp the code to actually define proper data classes, or stop calling it as such : it's just confusing.
What do you think ?