getsentry / sentry-go

The official Go SDK for Sentry (sentry.io)
https://docs.sentry.io/platforms/go/
MIT License

Memory issue with profiling large number of goroutines #895

Open mtuska opened 1 week ago

mtuska commented 1 week ago

I'm encountering a memory issue that seems related to #748 while using sentry-go in a project that launches several hundred goroutines. I believe it would be solved along with that issue, but since that one focused on CPU usage, I'm opening a separate issue for tracking. In my setup, each goroutine opens its own transaction and spans for tracing, similar to the example from #748. However, I am using Sentry alongside OpenTelemetry (Otel) to handle third-party integrations such as Gorm and Redis.

In my implementation, transactions are created at the goroutine level to avoid exceeding the span limit for a job. The particular job I used for debugging can involve up to 1300 goroutines. However, when profiling is enabled, memory usage escalates significantly: what typically takes around 200MB of heap grows to 4GB (pprof SVG attached).
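
For reference, the per-goroutine pattern is roughly the sketch below; this is a simplified stand-in for my actual worker code, and names like processJob and items are made up for illustration:

package worker

import (
    "context"
    "sync"

    "github.com/getsentry/sentry-go"
)

// processJob fans out one transaction per goroutine; with a non-zero
// ProfilesSampleRate, each of these transactions is also profiled.
func processJob(ctx context.Context, items []string) {
    var wg sync.WaitGroup
    for _, item := range items {
        wg.Add(1)
        go func(item string) {
            defer wg.Done()
            // Each goroutine gets its own hub clone and transaction.
            hub := sentry.CurrentHub().Clone()
            ctx := sentry.SetHubOnContext(ctx, hub)
            tx := sentry.StartTransaction(ctx, "job/"+item)
            defer tx.Finish()

            span := tx.StartChild("work")
            // ... per-item work ...
            span.Finish()
        }(item)
    }
    wg.Wait()
}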

pprof top:

Showing nodes accounting for 4689.83MB, 99.52% of 4712.66MB total
Dropped 89 nodes (cum <= 23.56MB)
Showing top 10 nodes out of 40
      flat  flat%   sum%        cum   cum%
 4220.41MB 89.55% 89.55%  4220.41MB 89.55%  bytes.growSlice
  245.08MB  5.20% 94.76%  1941.08MB 41.19%  encoding/json.Marshal
  181.45MB  3.85% 98.61%   181.45MB  3.85%  github.com/getsentry/sentry-go.(*profileRecorder).GetSlice
   41.28MB  0.88% 99.48%    42.33MB   0.9%  github.com/getsentry/sentry-go.(*profileRecorder).processRecords
    1.11MB 0.023% 99.50%    43.44MB  0.92%  github.com/getsentry/sentry-go.(*profileRecorder).onTick
    0.51MB 0.011% 99.52%  1902.59MB 40.37%  git.bluestreamfiber.net/smp/cpe-status/interfaces/calix/axos.(*Device).GrabCpeStatus
         0     0% 99.52%       96MB  2.04%  bytes.(*Buffer).Grow
         0     0% 99.52%  2620.41MB 55.60%  bytes.(*Buffer).Write
         0     0% 99.52%     1504MB 31.91%  bytes.(*Buffer).WriteString
         0     0% 99.52%  4220.41MB 89.55%  bytes.(*Buffer).grow

Watching sentry-go's profileRecorder, I can see the stacks and frames variables growing throughout the process's lifetime (debugger screenshots attached). Though I'll admit I haven't explored the actual size of these variables, as I need to move on at the moment.

Go is a fairly new language for me (I've only been using it for about a year), so let me know if I missed some obvious debugging step.
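
In case it helps reproduce this, the heap numbers above come from the standard Go heap profiler; a minimal way to expose it (plain net/http/pprof, nothing Sentry-specific, port 6060 picked arbitrarily) looks like this:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
    // Serve pprof on a side port while the job runs.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    // ... launch the job that spawns the goroutines ...
}

Running go tool pprof http://localhost:6060/debug/pprof/heap and then top at the prompt gives output like the listing above; svg produces the graph.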

Here is my Sentry/Otel initialization, with profiling disabled to keep the memory from sprawling. I use the Otel sampler with Sentry's sample rates set to 100%, and that seems to have played nicely for me thus far, though I have seen other open issues that seemed to state otherwise?

func init_sentry(ctx context.Context) *sdktrace.TracerProvider {
    AttachStacktrace := true
    EnableTracing := true
    if os.Getenv("NO_SENTRY") == "1" {
        AttachStacktrace = false
        EnableTracing = false
        logging.Logger.WithContext(ctx).Print("Disable Sentry Logging")
    }
    sentry_environ := os.Getenv("SENTRY_ENVIRONMENT")
    if sentry_environ == "" {
        sentry_environ = "development"
    }
    if err := sentry.Init(sentry.ClientOptions{
        Debug:              true,
        Dsn:                "",
        AttachStacktrace:   AttachStacktrace,
        SampleRate:         1.0,
        EnableTracing:      EnableTracing,
        TracesSampleRate:   1.0,
        ProfilesSampleRate: 0.0, // profiling kept off to avoid the memory growth described above
        Environment:        sentry_environ,
        Release:            version.AppFull,
    }); err != nil {
        logging.Logger.WithContext(ctx).WithError(err).Printf("Sentry initialization failed\n")
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithSpanProcessor(sentryotel.NewSentrySpanProcessor()),
        sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.1))),
        // sdktrace.WithSampler(CustomSampler{Chance: 0.1}),
    )
    otel.SetTracerProvider(tp)
    otel.SetTextMapPropagator(sentryotel.NewSentryPropagator())
    return tp
}
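
For context, the returned provider lives for the whole process; a minimal caller sketch (the two-second flush timeout is just an example value, not from my code) would be:

package main

import (
    "context"
    "time"

    "github.com/getsentry/sentry-go"
)

func main() {
    ctx := context.Background()
    tp := init_sentry(ctx)
    // Flush buffered Sentry events and shut the tracer provider down on exit.
    defer sentry.Flush(2 * time.Second)
    defer func() { _ = tp.Shutdown(ctx) }()

    // ... start the workers ...
}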

My util function for creating spans:

package util

import (
    "context"
    "fmt"

    "github.com/getsentry/sentry-go"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/trace"
)

func GetTracer() trace.Tracer {
    return otel.Tracer("git.bluestreamfiber.net/smp/cpe-status")
}

func MakeSpan(ctx context.Context, packageName string, service string, method string) (context.Context, trace.Span) {
    pkg := packageName
    if service != "" {
        pkg = fmt.Sprintf("%s.%s", pkg, service)
    }
    // Reuse the Sentry transaction already on the context, or start a new one.
    transaction := sentry.TransactionFromContext(ctx)
    if transaction == nil {
        transaction = sentry.StartTransaction(ctx, fmt.Sprintf("%s/%s", pkg, method))
        ctx = transaction.Context()
    }

    // Clone the hub so concurrent goroutines do not share scope state.
    hub := sentry.GetHubFromContext(ctx)
    if hub == nil {
        hub = sentry.CurrentHub()
    }
    hub = hub.Clone()
    ctx, span := GetTracer().Start(ctx, fmt.Sprintf("%s/%s", pkg, method))
    ctx = sentry.SetHubOnContext(ctx, hub)

    return ctx, span
}

Example usage:

func (d *Device) MakeSpan(ctx context.Context, method string) (context.Context, trace.Span) {
    ctx, span := util.MakeSpan(ctx, d.GetWorker().GetSpanPackage(), "worker", method)
    span.SetAttributes(
        attribute.Int64("device.id", d.GetMap().DeviceId),
    )
    return ctx, span
}

func (d *Device) GrabCpeStatus(ctx context.Context) error {
    ctx, span := d.MakeSpan(ctx, "GrabCpeStatus")
    defer span.End()
    // ... rest of the method elided ...
}

cleptric commented 3 days ago

It's the same issue as in #748. We might consider removing profiling from the SDK, as the PR we opened against Go is likely never going to be merged: https://go-review.googlesource.com/c/go/+/585396.