VladimirGorobetsWildix opened this issue 4 days ago
Thanks for opening your first issue here! We'll come back to you as soon as we can. In the meantime, check out the #typescript channel on our Powertools for AWS Lambda Discord.
Hi, thank you for opening this issue.
I believe this is intended, according to our documentation here (emphasis mine):
> Pass one or multiple objects containing arbitrary data to be logged. **Each data object should be placed in an enclosing object as a single property value, you can name this property as you need:** `{ myData: arbitraryObjectToLog }`
To avoid the behavior you're seeing, the `data` parameter should be within an object:

```typescript
logger.info('very important log key', { data });
```
which I confirmed produces:
```json
{
  "level": "INFO",
  "message": "very important log key",
  "sampling_rate": 0,
  "service": "production-server",
  "timestamp": "2024-10-17T15:14:59.825Z",
  "data": {
    "timestamp": "one year ago",
    "name": "user",
    "message": "message from chat",
    "level": "poweredUser",
    "service": "internal chat"
  }
}
```
This is necessary because, by default, we merge each parameter into the JSON object of the log, so if your parameters contain objects whose keys duplicate existing keys, they will overwrite them.
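The merge described above can be sketched with plain objects. This is a simplified model, not the actual Powertools Logger internals; the `formatLog` helper and the hard-coded reserved attributes are illustrative only:

```typescript
// Simplified model of how extra parameters are merged into a log record.
// NOT the real Powertools Logger code: it only illustrates why keys that
// duplicate reserved attributes end up overwriting them.
type LogRecord = Record<string, unknown>;

const formatLog = (message: string, ...extras: LogRecord[]): LogRecord => {
  const reservedAttributes: LogRecord = {
    level: 'INFO',
    message,
    sampling_rate: 0,
    service: 'production-server',
    timestamp: new Date().toISOString(),
  };
  // Shallow merge: any extra key that matches a reserved key wins.
  return extras.reduce((acc, extra) => ({ ...acc, ...extra }), reservedAttributes);
};

// Safe: the event is nested under its own key, so nothing collides.
const safe = formatLog('received message', { data: { message: 'from chat' } });

// Unsafe: the event's own `message` key clobbers the log message.
const unsafe = formatLog('received message', { message: 'from chat' });
```

Nesting under a dedicated key (`data` here, but any name works) is what keeps the event's fields from colliding with the record's own attributes.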
I understand how the current wording doesn't explicitly state this, and I am sorry to hear it led to data loss. We'll consider improving the section, perhaps with a more explicit disclaimer.
A warning in the documentation would be very good, but trying to prevent these potential developer errors would be even better.
This behavior is very non-obvious because TypeScript does not restrict it in any way, so the developer constantly needs to remember this quirk of the logger.
Yesterday, one of our developers lost a lot of time debugging because of this behavior of the logger.
It is very difficult for a developer who is using your logger for the first time to understand why this works:

```typescript
logger.info('very important log key', { info: data });
```

but this does not:

```typescript
logger.info('very important log key', { message: data });
```
In our case, the developer followed your recommendation completely, but got unexpected behavior.
```typescript
const message = await getMessageById(id);
logger.info('received message', { message });
```
This case is actually the next bullet point in that same section I linked above, and the fourth case in the code example right below it.
Either way, I see where the confusion comes from and there is room for improvement.
I don't think aliasing the fields is the right choice here, but I also don't have a better idea yet.
Please allow me some time to discuss it with the team, and see what comes up.
Hey @dreamorosi and @VladimirGorobetsWildix! I would like to share my impressions about this topic.
In my opinion we should first distinguish between event fields and mandatory/default fields. These mandatory fields serve a purpose: they keep things organized and follow specific rules/protocols/RFCs. Let's look at the OTEL Logs Data Model as an example: it has fields that must follow a certain pattern. These fields are part of the Logger's data model, not part of the actual event being logged.
I also think that we need to look at this from the perspective of the tools that will retain/consume those logs. Let's imagine we ignore the RFC/protocol and allow customers to override mandatory fields with any values they want. This could lead to several problems and become a significant issue for everyone, as they may no longer trust their logs due to a lack of reliability.
On the other hand, events often have custom fields that customers want to log and query later. This is a valid and common use case. However, if a customer wants to log the entire event with all its fields, regardless of what they are, this information should be included in the msg/message field rather than as separate arbitrary fields.
In Powertools Python you can do that using this code:
```python
from aws_lambda_powertools import Logger

logger = Logger()

event = {
    "timestamp": "one year ago",
    "name": "user",
    "message": "message from chat",
    "level": "admin",
    "service": "internal chat",
}

logger.info(msg=event)
```
And then you'll have this output that can be used to query the information the customer needs:
```json
{"level":"INFO","location":"<module>:12","message":{"timestamp":"one year ago","name":"user","message":"message from chat","level":"admin","service":"internal chat"},"timestamp":"2024-10-17 19:18:40,459+0100","service":"service_undefined"}
```
Regarding the possible solution:
> If duplicate fields occur, a prefix should be added for fields from additional objects. For example `message` -> `_message`
I don't know if I would go down this path; we would be changing the customer's information, and that can raise questions and create unwanted side effects.
The solution I can think of at the moment is that if we find any extra field that matches a reserved keyword, we issue a warning. I think this is good for the following reasons:
1. We do not change customer information.
2. We improve the feedback loop by informing the developer that something is wrong and needs to be fixed.
3. We do not raise any exceptions, which in observability is an excellent path to follow.
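That warning idea could be sketched roughly as follows. This is hypothetical: the reserved-key list is hard-coded for illustration, and in a real implementation the check would live inside the Logger's formatter:

```typescript
// Hypothetical pre-merge check: warn when an extra attribute would shadow
// a reserved/system key, but do not rename, drop, or mutate anything.
const RESERVED_KEYS = new Set([
  'level',
  'message',
  'sampling_rate',
  'service',
  'timestamp',
]);

const warnOnReservedKeys = (extra: Record<string, unknown>): string[] => {
  const clashes = Object.keys(extra).filter((key) => RESERVED_KEYS.has(key));
  for (const key of clashes) {
    console.warn(
      `The key "${key}" is reserved by the logger and will overwrite a system field; ` +
        'consider nesting your data under a dedicated key, e.g. { data: ... }',
    );
  }
  return clashes;
};
```

The function returns the clashing keys as well, which would make the behavior easy to unit test without capturing console output.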
What do you all think?
Thanks
I think the warning, as well as clearer documentation, would make the most sense.
I agree that we shouldn't overwrite nor rename any field.
Thanks @leandrodamascena
> I also think that we need to look at this from the perspective of the tools that will retain/consume those logs. Let's imagine we ignore the RFC/Protocol and allow customers to override mandatory fields with any values they want.
This is a very good point. Such non-obvious behavior can break the logic of log processing and we can lose an important part of the data.
Still, I am of the opinion that additional parameters should not be allowed to change the values of system fields.
Expected Behavior
System fields should not be replaced by values from additional parameters. If duplicate fields occur, a prefix should be added for fields from additional objects.
For example: `message` -> `_message`
Current Behavior
Received result
The first log is correct, but in the second log all system fields have been overwritten by the fields of the second object.
This is a very non-obvious behavior that results in the loss of important logs and many hours of debugging and understanding the cause.
Code snippet
Steps to Reproduce
Possible Solution
If duplicate fields occur, a prefix should be added for fields from additional objects. For example: `message` -> `_message`
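The proposed prefixing could look like the sketch below. This is illustrative only (the system-key list is assumed), and the maintainers note above that mutating customer keys has downsides:

```typescript
// Illustrative implementation of the proposed fix: prefix any extra key
// that collides with a system field instead of letting it overwrite.
const SYSTEM_KEYS = new Set([
  'level',
  'message',
  'sampling_rate',
  'service',
  'timestamp',
]);

const prefixCollidingKeys = (
  extra: Record<string, unknown>,
): Record<string, unknown> => {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(extra)) {
    // message -> _message, level -> _level, etc.; other keys pass through.
    out[SYSTEM_KEYS.has(key) ? `_${key}` : key] = value;
  }
  return out;
};
```

The trade-off is exactly the one raised in the discussion: the logged keys no longer match what the caller wrote, which can surprise anyone querying the logs.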
Powertools for AWS Lambda (TypeScript) version
latest
AWS Lambda function runtime
20.x
Packaging format used
npm
Execution logs
```json
{"level":"INFO","message":"test message","sampling_rate":0,"service":"production-server","timestamp":"2024-10-17T14:40:49.209Z","user":"admin"}
{"level":"poweredUser","message":"message from chat","sampling_rate":0,"service":"internal chat","timestamp":"one year ago","name":"user"}
```