aws / aws-sdk-go

AWS SDK for the Go programming language.
http://aws.amazon.com/sdk-for-go/
Apache License 2.0
8.61k stars 2.07k forks source link

Integration between AWS X-Ray and the AWS SDK for Go #1177

Closed dlsniper closed 7 years ago

dlsniper commented 7 years ago

Hello,

As a follow-up for #972, I would like to open this issue to request / track the integration of X-Ray with various components of the SDK. What I would like to see from this would be the following:

This should be possible now that the version 1.8.0 of the SDK was released and it adds Context support.

Finally, I believe that the integration should basically work like it currently does for Java / other supported languages, where a user just need to flag up the usage of X-Ray in order to determine the ability to use it.

Thank you.

jasdel commented 7 years ago

Thanks for creating this. One thing to consider with tracing API operations is how should the user identify the sequence API operations are made in, and how they are related. I'm not sure the SDK can go down the Java path of enabling a flag and getting support for sequence tracing.

Java SDK is able to do this because of thread local storage ensures sequence tracing is straightforward. In Go we don't have that option. The nearest comparison would be context.Context. If a user were to manage which context value is given to an API operation method they could achieve a similar sequence tracking. I think this might be good helper functions which create sub-contexts, e.g setting some context Value to identify the sequence.

Context needs to be used for sequence tracking, because there is no guaranteed association between a service client instance and a different sequences of API operation calls. By this I mean that many API operations may be called from the same service client, but there may not be any direction association between the operations to each other. Since service clients are thread safe and it is encouraged to share service clients as much as possible to reduce overhead. Separate service clients are only needed for differing config such as credentials and region.

dlsniper commented 7 years ago

I think one option that would allow the SDK to behave as close as possible to a "Go way" of doing this would be to have a middleware function in the SDK that the users can use in order to allow the SDK to control the values from the Context.

Then, by overwriting the context of the request and calling the next step in the chain, you would require the users to pass that context or a descended of it.

The SDK could also provide a function to give the user of the context back the value of trace header so that the it can be used in places like custom tracing in "userland", calls to other services, etc.

If the users don't want to use the custom middleware then it should be their responsibility to make sure the corect value are sent in the context at the time of the call (the SDK could provide a helper method to inject the X-Ray trace in the context maybe.

What do you think?

jasdel commented 7 years ago

Thanks for the update and clarification. I think this is onto the right path using the a context value to keep track of soubriquet requests.

With that said I don't think we'd include an X-Ray extension directly in the SDK. This would be great feature built on top of the SDK. By taking advantage of the SDK's request handlers and context a x-ray extension on top of the SDK could be built without needing internal SDK functionality. For comparison the AWS SDK for Javacript's Node.JS X-Ray feature is a separate module that injects into the AWS SDK for Javascript. Because of this lets close this issue as the feature wouldn't be implemented directly within the SDK.

dlsniper commented 7 years ago

I think this needs to be reopened.

I understand the need to have this in a separate repo, and I agree with it, but as it is right now, the SDK is unusable.

I've created this repository https://github.com/dlsniper/rontgen/blob/master/rontgen.go as we've previously discussed, but it turns out that the SDK does not have a clear way to provide any of the elements needed. I also cannot find a way provided by the SDK to send the data to the local running daemon, which the documentation about X-Ray clearly makes it as a desirable use-case.

I'm happy to talk with the X-Ray team / you (as SDK maintainers) to see how this could be solved, but I think that all of this should be rather part of the SDK not a separate SDK for Go and X-Ray. That way the current SDK could be instrumented and also provide wrappers for http handlers as well as http clients that can be instrumented with X-Ray.

jasdel commented 7 years ago

Thanks for the feedback @dlsniper. I think we still want to keep this issue closed. An X-Ray agent/metric gathering util would be best outside of the SDK similar to the Node.js library. It will allow additional features to be added to the agent separate from the SDK and prevents additional dependencies being added to the SDK's core. With the SDK's Request Handler's functionality all metrics should be available as injected functionality.

With that said let us know if there are metric features are you running into that are not available via the SDK's request handler functionality. We want to make sure users are able to get at the metrics they need/want when using the SDK without the SDK being a road block.

tj commented 6 years ago

Is the SDK supposed to detect the parent trace and instrument itself automatically? The X-Ray docs made it sound like that was the case.

cheers

jasdel commented 6 years ago

Hi @tj the AWS SDK for Go does not perform any detection or active integration with the X-Ray SDK. The AWS X-Ray SDK for Go will inject the instrumentation into the AWS SDK for Go.

tj commented 6 years ago

thanks!

odeke-em commented 6 years ago

In regards to the issues with thread local storage for Java vs context.Context in Go, I have prototyped how to trace all the auto-generated clients here https://github.com/orijtech/aws-sdk-go/tree/opencensus-tracing

For a quick introduction, I work on OpenCensus and coincidentally this evening I looked at how perhaps this could be implemented. OpenCensus allows you to trace and monitor then export to X-Ray and to a bunch of other backends likes Stackdriver Tracing and Monitoring, Prometheus, SignalFX etc It is a vendor agnostic distribution of libraries for distributed tracing and monitoring, no maintenance burdens or distributed systems expertise needed.

1) OpenCensus-Go's tracing minimally involves starting and stopping a span

ctx, span := trace.StartSpan(parentCtx, nameOfSpan)
defer span.End()

2) Most of the code is generated, hence in the API generation templates we can do

$ git diff private/model/api/operation.go 
diff --git a/private/model/api/operation.go b/private/model/api/operation.go
index 59a716e..a20894d 100644
--- a/private/model/api/operation.go
+++ b/private/model/api/operation.go
@@ -172,6 +172,9 @@ func (c *{{ .API.StructName }}) {{ .ExportedName }}(` +
 func (c *{{ .API.StructName }}) {{ .ExportedName }}WithContext(` +
        `ctx aws.Context, input {{ .InputRef.GoType }}, opts ...request.Option) ` +
        `({{ .OutputRef.GoType }}, error) {
+        {{ if .API.EnableTracing }}ctx, span := trace.StartSpan(ctx,"aws/{{ .API.PackageName }}.(*{{ .API.StructName }}).{{ .ExportedName }}")
+        defer  span.End()
+        {{end}}
        req, out := c.{{ .ExportedName }}Request(input)
        req.SetContext(ctx)
        req.ApplyOptions(opts...)
  1. To ensure proper context propagation as well as tracing when HTTP requests to the AWS APIs are being made, we also need to modify request/request.go a little to start some more spans and add some annotations for concrete visualization

    
    diff --git a/aws/corehandlers/handlers.go b/aws/corehandlers/handlers.go
    index cfcddf3..8db9da3 100644
    --- a/aws/corehandlers/handlers.go
    +++ b/aws/corehandlers/handlers.go
    @@ -14,6 +14,8 @@ import (
    "github.com/aws/aws-sdk-go/aws/awserr"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/request"
    +
    +   "go.opencensus.io/plugin/ochttp"
    )
    
    // Interface for matching types which also have a Len method.
    @@ -117,7 +119,15 @@ var SendHandler = request.NamedHandler{
    }
    
    func sendFollowRedirects(r *request.Request) (*http.Response, error) {
    -   return r.Config.HTTPClient.Do(r.HTTPRequest)
    +   hc := r.Config.HTTPClient
    +   if hc == nil {
    +       hc = &http.Client{}
    +   }
    +   var rt http.RoundTripper = hc.Transport
    +   if _, ok := hc.Transport.(*ochttp.Transport); !ok {
    +       rt = &ochttp.Transport{Base: hc.Transport}
    +   }
    +   return rt.RoundTrip(r.HTTPRequest)
    }
    
    func sendWithoutFollowRedirects(r *request.Request) (*http.Response, error) {
    @@ -126,7 +136,7 @@ func sendWithoutFollowRedirects(r *request.Request) (*http.Response, error) {
        transport = http.DefaultTransport
    }

and that I have done in two separate commits at my experimental fork at https://github.com/orijtech/aws-sdk-go

  1. Modified the template and the HTTP transports https://github.com/orijtech/aws-sdk-go/commit/711d79d3c8cfdec656caae086ee6d7b2ad1cfb82
  2. Ran make generate https://github.com/orijtech/aws-sdk-go/commit/de0c24fd2bfc1e637703184c4f5ad2c0b21321ff

With those steps, if we run the code below https://gist.github.com/odeke-em/844b33948506cba7de3323644b737688

we can get such results

On AWS X-Ray

screen shot 2018-05-01 at 9 13 38 pm

On Stackdriver

screen shot 2018-05-01 at 9 15 01 pm screen shot 2018-05-01 at 9 15 12 pm screen shot 2018-05-01 at 9 15 19 pm