DataDog / dd-trace-rb

Datadog Tracing Ruby Client
https://docs.datadoghq.com/tracing/

Too many samples in a profile #3853

Closed sivachandran closed 1 month ago

sivachandran commented 2 months ago

Current behaviour: I get 100K+ samples in a 1-minute profile. Processing such a large number of samples requires a lot of memory and leads to an out-of-memory crash.

Expected behaviour: Around 6,000 samples, assuming 100 samples per second over 60 seconds.

Steps to reproduce: Profile the following Ruby app while continuously making requests to the / endpoint.

require 'datadog/profiling/preload'

require "digest"
require "sinatra"
require "json"

set :port, ENV.fetch("PORT", "5005").to_i

get '/' do
    users = []
    for i in 1..1000 do
        users << User.new("Username#%d" % i, Digest::MD5.hexdigest("a%{i}Super%{i}Strong%{i}Password" % {i: i}))
    end

    users.to_json
end

class User
    def initialize(name, password)
        @name = name
        @password = password
    end

    def as_json(options={})
        {
            username: @name,
            password: @password
        }
    end

    def to_json(*options)
        as_json(*options).to_json(*options)
    end
end

How does datadog help you? We are using the Ruby profiler as part of our observability stack to profile Ruby applications.

Environment

ivoanjo commented 2 months ago

Hey :wave:

The profiler aims for 100 samples per second per thread; I suspect that depending on your web server setup, you may have a bunch of threads (and dd-trace-rb itself adds a few of its own), so that's how you get the 100k samples.
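To make the per-thread arithmetic concrete, here is a rough sketch. The thread count of 17 is a hypothetical figure chosen only to show how a handful of threads gets past 100K; the real count depends on the web server and on the threads dd-trace-rb adds, and the profiler may take fewer samples than its target.

```ruby
# Rough upper-bound estimate, assuming the profiler reaches its target of
# 100 samples per second on every thread for the whole profile.
SAMPLES_PER_SECOND_PER_THREAD = 100

def expected_samples(threads:, profile_seconds: 60)
  SAMPLES_PER_SECOND_PER_THREAD * profile_seconds * threads
end

puts expected_samples(threads: 1)   # single thread, 1-minute profile
puts expected_samples(threads: 17)  # hypothetical thread count
```

With one thread this gives the 6,000 samples expected in the report; 17 threads would already give 102,000.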

While we don't usually recommend it, if you're looking to reduce samples, you may want to try setting the DD_PROFILING_TIMELINE_ENABLED env flag to false (or c.profiling.advanced.timeline_enabled = false via code). Specifically, disabling timeline will mean that samples for the same thread/request will aggregate together, and thus the total number of samples in the pprof will be reduced.
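For reference, the in-code variant might look like the sketch below. It assumes the 2.x `datadog` gem and an initializer that runs at boot; only the `timeline_enabled` setting comes from this thread, and the surrounding lines are illustrative.

```ruby
# Sketch of an initializer (e.g. config/initializers/datadog.rb, path assumed).
require "datadog"

Datadog.configure do |c|
  c.profiling.enabled = true

  # Setting mentioned above: with timeline disabled, samples for the same
  # thread/request aggregate together, so the pprof holds fewer samples.
  c.profiling.advanced.timeline_enabled = false
end
```

Setting `DD_PROFILING_TIMELINE_ENABLED=false` in the process environment is the equivalent without a code change.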

Also, I'm curious: can you share what kinds of processing you're looking to do on the pprof?

sivachandran commented 1 month ago

@ivoanjo Thanks for the reply.

Setting DD_PROFILING_TIMELINE_ENABLED=false reduced the number of samples to fewer than 2K.

We process the pprof and store the samples in a DB for later viewing.

ivoanjo commented 1 month ago

Cool, glad that the configuration worked for you. This does disable the timeline visualization in the Datadog UX, so in cases where you don't want to directly post-process the profiles, I'd recommend still enabling it.