RFC: Support thread-local keys for logger

Key information

RFC PR:
Related issue(s), if known:
Area: Logger
Meet tenets: Yes
Approved by:
Reviewed by:

Summary

Provide the ability to append thread-local keys to a logger which only get included in log records logged in the thread they got added to.

Motivation

As AWS Lambda provides multiple CPU cores it makes sense for certain workloads to distribute work across multiple threads. Currently it's not possible to append keys to a logger on a per-thread basis. That however would be very helpful for logging thread-specific information.

structlog, another popular Python logger offering structured logging, supports such a feature under the term "thread-local contexts".

Proposal

In addition to manage globally scoped keys, Logger needs an interface to manage thread-local keys. This interface must consist of the ability to append keys, remove keys and clear all thread-local keys of the current thread. The ability to clear all keys of the current thread is helpful when using thread pools where threads are being reused, but shouldn't retain log keys of previous usages.

I can imagine two ways how this could work (but I'd be happy with another way as well): Additional methods for Logger for managing thread-local keys or the ability to get a "thread-aware" child logger which can then be used to configure thread-local keys.

Proposal 1: Additional methods

Using additional methods for the existing Logger class would require to somehow indicate that they relate to the current thread. So names like append_thread_local_keys/remove_thread_local_keys/clear_thread_local_keys could be used. Here is an example how that could look like:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")

def process_payments(payment_id):
    logger.clear_thread_local_keys()
    logger.append_thread_local_keys(payment_id=payment_id)
    logger.info("Start processing payment")
    ...
    logger.info("End processing payment")

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

Proposal 2: Child logger

Offering the ability to get a thread-local child logger from an existing logger would offer usage like that:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")

def process_payments(payment_id):
    child_logger = logger.get_thread_local_logger()
    child_logger.append_keys(payment_id=payment_id)
    child_logger.info("Start processing payment")
    ...
    child_logger.info("End processing payment")

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

For performance reasons when re-using threads it might be worth to consider to not create a new child logger for every call to Logger.get_thread_local_logger(), but instead to just to the clear keys of a possibly already existing child logger for the same thread.

One drawback of this proposal might be that it'd be cumbersome to use when the logic running in a thread is more complex and spans multiple functions, as the logger instance would need to be passed between these functions.

If this feature should be available in other runtimes (e.g. Java, Typescript), how would this look like to ensure consistency?

IMO it shouldn't be a problem to port that to the other runtimes in the same way as well.

User Experience

How would customers use it?

Without the need for thread-local keys, customers would us AWS Lambda Powertools' Logger the same way as they do right now. With the need for thread-local keys, customers would call the respective methods to enhance records logged in a thread with additional useful information. A code example for that is shown above.

Any configuration or corner cases you'd expect?

There might be different expectations regarding the behavior when globally scoped keys get appended while a thread is running and its logger configuration has been already done. Will such global keys be included in records logged by the thread immediately or only after the logger configuration for the thread got re-initialized? I personally would prefer global keys to get included immediately into records logged by threads as well.

Demonstration of before and after on how the experience will be better

Before:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")

def process_payments(payment_id):
    logger.info("Start processing payment", extra={"payment_id": payment_id})
    ...
    logger.info("End processing payment", extra={"payment_id": payment_id})

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

After:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")

def process_payments(payment_id):
    logger.append_thread_local_keys(payment_id=payment_id)
    logger.info("Start processing payment")
    ...
    logger.info("End processing payment")

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

Drawbacks

The only drawbacks I can imagine right now are a more complex Logger interface and a possible performance impact when logging records, depending on the resulting implementation.

Rationale and alternatives

What other designs have been considered? Why not them?

The only other option I did consider was manually creating one logger per thread, but that feels cumbersome, because aside from the thread-local keys, it would need to be configured identical to the logger for the main thread.

What is the impact of not doing this?

Customers who wish to log thread-local keys have no straight-forward way available to do so and either have to employ workarounds like using the extra keyword to add the desired keys for each record or to use another logger.

Unresolved questions

Wow, what a great write up @Dunedan !! Thank you! I really like the first proposal - this makes it explicit whoever is reading the code, it doesn't incur into an additional unbounded logger, etc.

I have a few questions:

Would you be able to contribute so we fast track this?
- I can help with anything you might need e.g. docs, review, additional code, prioritize it for the next release, etc.
For thread safety mutations, are you thinking of using thread local storage or something else?
Should additional keys added outside a thread also be logged within a thread?
- It's the same logger and formatter. For example, logger.append_keys(order_id=1234) runs after thread started, once a log statement (logger.info("something") happens within the thread it'll naturally have order_id=1234
Do we need to modify log statements method to use a different log dictionary state? e.g. logger.info, logger.warning
- Challenge with this is location key as Python 3.6-3.7 logging didn't allow helper functions/methods to correctly identify the caller (actual line number for log.info statement).
- We could backport stackLevel param which also enables us to have **kwargs for loggers without having to resort to extra (solves a long papercut).
- Python 3.6 is now deprecated, so we could consider 3.8+ only too for this feature.
Shall we include thread_id by default as part of the log attributes too?

Internals

For Logger, we keep state within the Formatter under self.log_format. Append/Remove keys are simply mutating that dict. A child logger doesn't have a formatter registered so it propagates to the parent allowing our bi-directional UX, hence some of the questions.

When a log statement occurs, Python logging calls our custom format so we can extract log keys, remove null keys, get latest X-Ray ID, etc..

Super looking forward to this.

Would you be able to contribute so we fast track this?

Sorry, I'm afraid don't have enough time for that right now. I'd be happy to provide feedback though.

For thread safety mutations, are you thinking of using thread local storage or something else?

Thread local storage sounds good. That's also how structlog handles it: https://github.com/hynek/structlog/blob/30736d643c64308d3d00953c22a8f3d1e4e39f1f/src/structlog/threadlocal.py#L176

Should additional keys added outside a thread also be logged within a thread?

It's the same logger and formatter. For example, logger.append_keys(order_id=1234) runs after thread started, once a log statement (logger.info("something") happens within the thread it'll naturally have order_id=1234

For me that'd fit my expectation: Whenever I add something to the global context, it'll appear in log records afterwards, no matter if the record was logged by the main thread or any other thread.

Shall we include thread_id by default as part of the log attributes too?

Usually I'd say yes please, but there is one argument against adding additional keys by default to the logs: CloudWatch ingestion cost. We're already thinking twice about what to add to CloudWatch logs, simply because it's already one of the biggest line items on our AWS bill.

Update: haven't forgotten this, but will likely to tackle only next year due to re:Invent and PTOs. We've recently added support for kwargs in all log statements that should make this problem slightly easier: https://awslabs.github.io/aws-lambda-powertools-python/2.2.0/core/logger/#ephemeral-metadata

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger()

def handler(event: dict, context: LambdaContext) -> str:
    logger.info("Collecting payment", request_id="1123")

    return "hello world"

Hi @heitorlessa,

I'd be happy to work on the implementation of this issue if you haven't already started.

Hi @heitorlessa,

I'd be happy to work on the implementation of this issue if you haven't already started.

I somehow missed this important message :/ Absolutely!! I haven't started and I'd love help here. ContextVars would highly likely be the solution here - we couldn't do that before as we supported 3.6.

thank you @shanab !!

Hey @heitorlessa, I want to contribute to this RFC. I got the implementation working locally and the PR should be ready in the next day or two. I just need to update some test cases and documentation. I will post the PR once it is ready.

aws-powertools / powertools-lambda-python