Open Vandita2020 opened 1 year ago
Thanks for opening your first issue here! We'll come back to you as soon as we can. In the meantime, check out the #python channel on our AWS Lambda Powertools Discord: Invite link
got dragged in meetings and will reply to you properly tomorrow. It'll be along these lines I posted on Metrics: https://github.com/awslabs/aws-lambda-powertools-python/issues/2015#issuecomment-1481541558
TracerProvider
is good, but format
will confuse customers.
Thank you @heitorlessa
TracerProvider
is good, butformat
will confuse customers.
I see how it can be a bit confusing to the customers. I have revised the RFC and tried to make it simple by removing the format
part while keeping the BaseProvider
as a parent class to CustomTraceProvider
.
I underestimated how much feedback I needed to write for Metrics RFC so I ran out of time (apologies!) - I've blocked time on Monday afternoon (morning PST) to go through this.
At a quick glance, the piece I'm missing in the contract is patching
- a custom provider will have to own the responsibility to patch one or all supported libraries.
hey @Vandita2020 that's a great start!! I took liberty to address some low hanging fruits and made a list of suggestions similar to what I made to Metrics.
There are some minor changes like using the span
terminology instead of X-Ray specific segment
to ease authoring custom providers. Two major ones are considering that we need a Span representation, and what strategies should we consider to support threading without leaking too much.
Changes
(e.g., Tracer(service=""))
Custom tracer usage
to Bring your own provider
Asks
provider
instead of format
(leftover)trace
context manager in the BaseProvider signature. This acts as a shortcut so customers can create/close spans within parts of their code instead of the entire function.capture_method
, capture_lambda_handler
capture_method_async
(something we should move towards given the complexity of having under a single methodsegment
to span
terminologies to ease integration with partners, e.g.: start_span
. We can then create a X-Ray Provider and use the equivalent methods/terminologies with the added benefit of demonstrating how other providers could do it.inject_context
.start_span
should return a Span. A Span can start another span (parent/child), end, receive annotations/metadata, etc. This part needs more research. We need to strike a balance of keeping operations simple since the provider does all the hard work of keeping tracing context, asyncio support, etc., and allowing extension due to Liskov substitution principle.tracer.start_segment()
, tracer.end_segment()
in the Bring your own provider section, I suspect you meant something else entirely.Hey @heitorlessa,
Thanks so much for the review. I have made the changes in the RFC accordingly, also for couple of comments, I have provided come comments/context below.
Add
capture_method_async
(something we should move towards given the complexity of having under a single method
Currently async methods are handled using @tracer.capture_method
, which uses couple of if-else conditions to check if it is sync or async. To simplify the execution of async methods, we create a new method to specifically handle async methods.
Review the use of tracer.start_segment(), tracer.end_segment() in the Bring your own provider section, I suspect you meant something else entirely.
I have corrected it, earlier I was using tracer.start_segment()
to start the tracing for everything and tracer.end_segment()
to end the tracing, but it got more clearer to me now. As now we are using capture_lambda_handler
to trace the handler, capture_method
to trace any method, and start_span
end_span
to trace any particular span.
Add a section on threading (e.g., what does the Base provider need to implement to support it?)
For threading, the concept that I used is that the BaseProvider class needs to implement methods to create and manage thread-local storage for each trace. When a new child thread is created, the trace context can be copied to the new thread-local storage, allowing the new thread to continue the trace without interfering with the parent thread’s trace. When the thread finishes, the trace context can be removed from the thread-local storage. I've implemented context manager for it now, however still need to provide an example of how to use it. But before I need to make sure if this sounds a good way to support threading?
Great!! As for threading, before you dive in the implementation per se, take a look at how DataDog, NewRelic, and OpenTelemetry Tracers handle threading + asyncio.
Reason I ask that is that we wouldn't necessarily need to implement threadlocal or contextvars, because the Provider would be a wrapper on top of the actual implementation (e.g., DataDog Tracer, OpenTelemetry Tracer, etc.).
What's missing in the RFC is a section with a comparison on how observability providers out there handle threading. Then call out whether we need additional public methods in the BaseProvider that will be implemented by the actual provider who already handle the threadlocal or contextvars.
As this is a complex topic, to recap, our Tracer Observability Provider is a thin wrapper on top of the actual Observability Provider SDK (e.g., DataDog SDK, OpenTelemetry SDK, New Relic SDK, etc.). This helps customers use the same Powertools DX across Providers, and each provider simply implement our interface while bringing their various flavours of SDK already handling this use case.
Please do let me know if I can make this any clearer.
Tks a lot!!
Hi @heitorlessa! I'm taking over this issue and trying to catch up on the current process. From what I perceive the current pending item is investigate "How OPTL, new relic,DD handle threading in tracing? - related to issue 2047". Please help me to confirm and remind me what else should I take a look. Thank you!
Yup, that’s correct!
On Fri, 5 May 2023 at 21:58, Roger Zhang @.***> wrote:
Hi @heitorlessa https://github.com/heitorlessa! I'm taking over this issue and trying to catch up on the current process. From what I perceive the current pending item is investigate "How OPTL, new relic,DD handle threading in tracing? - related to issue 2047". Please help me to confirm and remind me what else should I take a look.
— Reply to this email directly, view it on GitHub https://github.com/awslabs/aws-lambda-powertools-python/issues/2030#issuecomment-1536712307, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZPQBDB3HVQ5X3ARWIJGXTXEVLVPANCNFSM6AAAAAAWDDLERQ . You are receiving this because you were mentioned.Message ID: @.***>
Hello everyone! I'm updating this issue with some decisions we made regard the Tracer provider
We decide the follow the naming convention of OpenTelemetry tracing. Comparing to the current provider:
BaseSegment -> BaseSpan |
current | proposed |
---|---|---|
close | removed | |
add_subsegment | removed | |
remove_subsegment | removed | |
put_annotation | set_attribute | |
put_metadata | set_attribute | |
record_exception | add_exception |
BaseProvider |
current | proposed |
---|---|---|
in_subsegment | trace | |
in_subsegment_async | trace_async | |
put_annotation | set_attribute | |
put_metadata | set_attribute |
set_attribute
The current proposed signature https://github.com/aws-powertools/powertools-lambda-python/blob/ee869b3d1890ab52a2120a08b713f9a1afeef83b/aws_lambda_powertools/tracing/provider/base.py#L10-L14
We decided to accept Any
value here, but the actual supported data type will be decided in the specific provider
BaseSegment
and BaseProvider
These two classes will remain for backwards compatibility until Powertools V3
Subsegment.close
:We come to the conclusion that using float in epoch seconds in this case is more appropriate. I also created a https://github.com/aws/aws-xray-sdk-python/issues/424 for this in X-Ray python SDK as I believe the docstring here is referencing the X-Ray Python SDK.
We support all Backwards compatibility except escape hatch usage directly on X-Ray recorder. For example:
tracer = Tracer()
tracer.provider.capture('subsegment_name')
def myfunc():
# Do something here
We didn't have capture
function in the current BaseProvider
thus this behavior will not be supported in the new provider. For existing function in the current BaseProvider
like in_subsegment
, they will still be supported.
Is this related to an existing feature request or issue?
Issue: #1433 Logger RFC: #2014 Metrics RFC: #2015
Which AWS Lambda Powertools utility does this relate to?
Tracer
Summary
This RFC is one of the three that defines the format when setting up loggers, metrics and traces for better integration with other observability providers.
This RFC is specifically for the Tracer. Currently, we have undocumented BaseProvider for Tracer, but we need to decide more on what minimum features the
BaseProvider
should support. The RFC discusses on the features that could be a part of custom tracer for users to integrate other Observability providers easily.Use case
The use case for this utility would be for developers who want to use other observability providers to trace their application, other than AWS X-Ray.
Proposal
Current tracer experience
The Powertools’ tracer utility is essentially a wrapper for the AWS X-Ray SDK. Some key features of this utility include auto capturing cold start as annotation, auto capturing responses or full exceptions as metadata, and auto-disabling when not running in AWS Lambda environment. Tracer also auto patches supported modules by AWS X-Ray.
JSON output
```json { "trace_id": "1-5e367daf-6c7f6d9f6c3a6e5800c7d42d", "id": "e986a861d4590d97", "name": "payment", "start_time": 1580441546.023, "end_time": 1580441552.983, "http": { "request": { "method": "GET", "url": "https://api.example.com/", "client_ip": "192.168.1.1", "user_agent": "Mozilla/5.0", }, "response": { "status": 200, "content_length": 1024, "headers": { "Content-Type": "application/json" } } }, "subsegments": [ { "id": "3b3b3d8ba74fa7fe", "name": "my-subsegment", "start_time": 1580441548.023, "end_time": 1580441551.983, "http": { "request": { "method": "POST", "url": "https://api.example.com/submit", "headers": { "Content-Type": "application/json", "Authorization": "Bearer abc123" }, "body": "{\"data\": \"example\"}" }, "response": { "status": 200, "content_length": 128, "headers": { "Content-Type": "application/json" } } }, "annotations": { "example": "annotation" } } ], "annotations": { "example": "annotation" }, "metadata": { "example": "metadata" } } ```Tracer proposal
We propose a new parameter to the existing tracer utility that developers can use to specify which observability provider they would like their traces to be pushed to. The below code snippet is a rudimentary look at how this utility can be used and how it will function. Out of the box, we will support DataDog. Other providers TBD
JSON output
```json { "trace_id": "3541457326329954564", "span_id": "467508042476235233", "parent_id": "3541457326329954564", "name": "payment", "resource": "GET /api", "start": 1647370203.4475, "duration": 0.0325, "service": "serverlessAirline", "type": "web", "meta": { "http": { "method": "GET", "url": "http://localhost:8000/api", "status_code": 200 } } } ```Bring your own provider
If you would like to use an observability provider not supported out of the box, or define their own tracer functions, we will define an interface that the customer can implement and pass in to the Tracer class.
Example
The five methods defined above are a combination of methods that already exist in the
BaseProvider
and the ones that are most common in other observability providers.The current
BaseProvider
does support most of the features used in the major observability providers. There are couple of differences I noticed while researching through the other Observability providers.Out of scope
Sending traces from Powertools to the customer's desired observability platform will not be in the scope of this project. The implementation should only support modifying the output of the Tracer so that the customer can push them to their platform of choice.
Potential challenges
We need to determine which platforms we want to support out-of-the-box (apart from Datadog).
Dependencies and Integrations
We will have to integrate with (and thus, have a dependency on) Datadog and any other platforms we decide to support out-of-the-box.
Alternative solutions
No response
Acknowledgment