aws / aws-xray-sdk-go

AWS X-Ray SDK for the Go programming language.
Apache License 2.0
276 stars 117 forks source link

Long running job with a lot of api calls creates too many subsegments #238

Closed alexcetto closed 4 years ago

alexcetto commented 4 years ago

Hi, first time user of the aws-xray-sdk-go here.

I am using version aws-xray-sdk-go@v1.1.0 and go v.1.14.1.

I am developing an http server in fargate that runs a long task of producing a file. It may call hundred of times external APIs during the whole lifecycle of this task.

I am using xray to trace this long running job. The problem I see is that the main segment spawns too many subsegments and it looks like those subsegments become to large to send on UDP.

Symptoms:

You can checkout the heap from pprof on this file: heapfull.log

It happens in local env and in ECS Fargate machine with a xray container attached.

Not sure if this is a bug or a bad use of the xray sdk. Let me know if you need more details.

Alex

alexcetto commented 4 years ago

I manage to reproduce with a simple code. This is a very basic implementation of what my service does. If you look at the HelloServer() handler, I use DetachContext because the long running task will run inside a new goroutine. This will trigger the behaviour where the subsegments won't be send and will pile up in memory.

package main

import (
    "github.com/aws/aws-xray-sdk-go/header"
    "github.com/aws/aws-xray-sdk-go/xray"
    "github.com/gorilla/mux"
    "log"
    "net/http"
    "net/http/httptest"
    "net/http/pprof"
    "time"
)

func mockServer(httpCode int) *httptest.Server {
    s := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(httpCode)
        xray.NewSegmentFromHeader(r.Context(), "Subsegment", r, header.FromString("newdata"))
        time.Sleep(3*time.Millisecond)
        w.Header().Set("Content-Type", "application/json")
    }))
    return s
}

func main() {
    router := mux.NewRouter()
    router.Handle("/", xray.Handler(xray.NewFixedSegmentNamer("newsegment"), HelloServer()))
    router.PathPrefix("/debug/pprof").HandlerFunc(pprof.Index)

    log.Panic(http.ListenAndServe("localhost:8067", router))
}

func HelloServer() http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        var client = xray.Client(http.DefaultClient)
        ctx := r.Context()
        srv := mockServer(200)
        for jj:=3; jj > 0; jj-- {
            go func() {
                newCtx := xray.DetachContext(ctx)
                for i:=0; i < 100; i++ {
                    req, _ := http.NewRequestWithContext(newCtx, "GET", srv.URL, nil)
                    client.Do(req)
                }
            }()
        }
        w.WriteHeader(200)
        w.Write([]byte("ok"))
    }
}
bhautikpip commented 4 years ago

Hi @alexcetto ,

Thanks for opening this issue. I will try repro code you have posted and get back to you with more updates on this issue! Thanks!

bhautikpip commented 4 years ago

Hi @alexcetto ,

I tried running above sample code. I was able to sent subsegments and was able to see one trace which contains segment named newsegment and lot of subsegments (basically http requests). I might not have changed anything on the above code. Is there any other setup required to repro this behavior ? I basically want to repro how DetachContext API results in not sending subsegments and pile up in memory.

alexcetto commented 4 years ago

Hi,

The sample code doesn't require any configuration. I ran it in local with localstack and it is running in two environment in ECS and the problem appears for all. Did you manage to get any of the symptoms anyway? To solve the memory problem, I replaced the DetachedContext with:

newCtx, seg := xray.BeginSegment(ctx, xrayNamer)
defer seg.Close(nil)

This is not the exact behaviour I am looking for but at least the memory usage goes down after the execution.

bhautikpip commented 4 years ago

Hi @alexcetto ,

I managed to get this [ERROR] write udp 127.0.0.1:60402->127.0.0.1:2000: write: message too long . I started looking into DetachContext API, this API basically get the current segment from the context passed. In the above example you are spinning up multiple goroutines so not sure DetachContext would be the right choice for this scenario.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in next 7 days. Thank you for your contributions.