The flamegraph seems to be incorrect/incomplete when using Ruby 3.0.0 and above

ToomeyDamien commented 2 years ago

The flamegraph seems to be incorrect/incomplete when using Ruby 3.0.0 and above.

Ruby 2.7.6 (correct/detailed flamegraph)	Ruby 3.0.0 (incorrect/incomplete in my opinion)

Screenshot obtained with the instructions below with 2.7.6 instead of 3.0.0	Screenshot obtained with the instructions below

Steps to reproduce

run

rvm install 3.0.0
rvm 3.0.0 do gem install rails -v 7.0.3
rvm 3.0.0 do rails _7.0.3_ new project_ruby_3_0_0
cd project_ruby_3_0_0

add the following snippet at the end of your Gemfile

# Gemfile
gem 'rack-mini-profiler'
gem 'memory_profiler'
gem 'stackprof'

add

rvm 3.0.0 do bundle install
rvm 3.0.0 do rails server
curl 'http://localhost:3000/rack-mini-profiler/requests?pp=async-flamegraph'

go to http://localhost:3000/rack-mini-profiler/requests
click on the flamegraph link

Let me know if I have missed something!

By the way, thanks for the great repo :)

SamSaffron commented 2 years ago

Something funky is going on, for sure, can you try sourcing stackprof from Github, does that resolve it?

langsharpe commented 1 year ago

I've tried using the current stackprof master 6bbabf1 and had similar results. No app code is included in the flamegraph.

Gem version: 3.0.0 Rails version: 7.0.3.1 Ruby version: 3.1.2p20

CleanShot 2022-07-27 at 10 40 27

chewi commented 1 year ago

Same here when running under Passenger.

bessey commented 1 year ago

Same here

Ruby 3.0.3 Rails 6.1.6.1 rack-mini-profiler 3.0.0 stackprof 0.2.20 (released July 26 @SamSaffron so I don't think this is an out of date dependency issue)

AnomalousBit commented 1 year ago

After taking a quick look, it appears the problem is indeed within StackProf. The sampling mode :wall doesn't appear to understand the changes around threads within Ruby >= 3.0.

The :wall sample method happens to be the default for both rack-mini-profiler and StackProf gems.

A quick workaround is to provide a different sample mode by using the flamegraph_mode query param. Here's an example, add this to your url:

?pp=flamegraph&flamegraph_mode=object

The available sample modes for StackProf are: cpu, object, custom and wall (wall is the one not currently working correctly).

I believe the object sample mode is the most thorough but also the slowest sampling technique. In my experiences, use the object sample mode and help yourself to a better flamegraph anyways!

juanmanuelramallo commented 1 year ago

The flamegraph_mode query param doesn't seem to be working

My env:

* rails (7.0.0) * rack-mini-profiler (2.3.2) * stackprof (0.2.22) * ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [arm64-darwin21]

In order to make it work I used this https://github.com/jlfwong/speedscope/wiki/Importing-from-stackprof-(ruby)

matt17r commented 1 year ago

I'm still seeing this issue after several months and updating everything. @AnomalousBit's workaround makes the flamegraph show up properly but doesn't solve the problem as object mode doesn't show me where rendering time is being spent.

When this was happening in December my versions were:

ruby 3.1.3p185
rails (7.0.3.1)
rack-mini-profiler (3.0.0)
stackprof (0.2.23) [was also happening on version 0.2.22]

It's still happening after making sure everything is on the latest versions:

ruby 3.2.0 (2022-12-25 revision a528908271) [arm64-darwin22]
rails (7.0.4.2)
rack-mini-profiler (3.0.0)
stackprof (0.2.23)

In case it's relevant I'm on a 2021 MacBook Pro with an M1 Pro chip running macOS Ventura 13.1 (22C65)

matt17r commented 1 year ago

After doing some more digging I don't think it's a Ruby 3 thing as I can't reproduce it when I create a new app (unless I have a different issue to the OP).

What I'm seeing is 2221ms response times in my app but a flamegraph that's only ~40ms long...

22:06:33 web.1 | Completed 200 OK in 2221ms (Views: 2214.9ms | ActiveRecord: 2.2ms | Allocations: 18389) Flamegraph

To try and narrow it down I created a test app that's structurally almost identical to my production app but the flamegraphs display perfectly. Here's my working test app:

rails new rmp-graph-test --css=tailwind --database=postgresql --javascript=esbuild
cd rmp-graph-test
echo gem \"rack-mini-profiler\" >> Gemfile
echo gem \"stackprof\" >> Gemfile
bundle install
rails g scaffold post body:string author:string
open app/controllers/posts_controller.rb # optional, add `sleep 2` to #index action to mimic slow views
rails db:create
rails db:migrate
bin/dev
open "http://localhost:3000/posts?pp=flamegraph"

I then went through everything I could think of to compare the two apps to try and figure out the difference, either confirming it's the same or making it the same in the test app and then rebooting it:

Ruby config (Gemfile and Gemfile.lock)
Webserver config (config/puma.rb)
Rails and Rack config (config/application.rb, config/boot.rb and config.ru)
Start up instructions (bin/dev and Procfile.dev)
Environment configs (config/environment.rb, config/environments/development.rb and config/environments/production.rb)
Initialisers (particularly config/initializers/mini_profiler.rb but all others were identical too, except for config/initializers/i18n.rb which I left out of my test app)

Am I missing anything in that list? If not, what can I check next to try and narrow it down?

AnomalousBit commented 1 year ago

This in no way fixes the current issue but as another route for you and others trying to track down a performance problem, you can use the Benchmark module. https://ruby-doc.org/stdlib-2.7.1/libdoc/benchmark/rdoc/Benchmark.html

Wrap some of your code sections in puts Benchmark.measure{ ... } blocks and it will output to the console or logs how long it took to execute. You can also nest and label each of these blocks to better your understanding. I know it's a bit more manual and doesn't produce a nice graph, but it might get you where you are going until we can hopefully get this fixed.

Some places I've worked would benefit from having specific method call times logged for good measure anyways, this would be pretty trivial using the Benchmark module. Feed these performance metrics into Grafana and enjoy your free New Relic service. 😜

Good luck!

AnomalousBit commented 1 year ago

@matt17r

What I'm seeing is 2221ms response times in my app but a flamegraph that's only ~40ms long...

I think your flamegraph is missing two seconds of your response due to your sleep(2) call in posts_controller.rb. My best guess is that the sleep() (source: https://github.com/ruby/ruby/blob/50c6cabadca44b7b034eae5dcc8017154a2858bd/process.c#L5205 ) is literally putting the thread to sleep, making the thread unavailable for statistics collection by the stackprof gem.

The code below takes about 4.5 seconds on my machine and it shows up perfectly in the flamegraph using the example rmp-graph-test app you provided. Try something like this instead of using sleep():

values = []
100000000.times { |i| values << i }

Also, it appears the default :wall mode is working fine for me in the latest versions of ruby / rack-mini-profiler for anyone interested. Maintainers have suggested that stackprof version 0.2.23 has potentially fixed the flamegraph / threading problem mentioned throughout this thread. This issue may be closed if others can confirm?

matt17r commented 1 year ago

Thanks for the suggestions and help @AnomalousBit.

I finally solved my performance problem on the weekend after taking Nate Berkopec's advice to use rbspy. It's not as convenient as RMP but I still can't get RMP to work. I posted a little write up on Sunday night.

Regarding closing this issue, I had already upgraded from stackprof 0.2.22 to 0.2.23 before opening issue 193 on the stackprof repository and I am still seeing the issue in the actual app I was having performance problems with.

On the other hand, I cannot reproduce the issue in a brand new app, despite my best efforts to make the two apps identical.

So perhaps stackprof 0.2.23 does solve this issue and my particular flamegraph problem was a coincidence or a red herring???

If stackprof 0.2.23 solves it for others I've got no objection to this issue being closed. People can always open a new issue and reference this one if they discover new information while using 0.2.23 or greater. I wish it weren't so but I'm unlikely to be motivated to spend more time investigating this now I've solved my actual performance problem.

P.S. For what it's worth, the sleep 2 in my test app was showing up in the RMP flamegraph, at least in wall mode:

The ~2,180ms that were missing from my RMP flamegraph finally showed up in my rbspy flamegraph as repeated unnecessary calls to create a password for existing users:

nateberkopec commented 1 year ago

Confirmed addressed in latest stackprof versions. If you are seeing this, upgrade stackprof.

vascoveigams commented 10 months ago

I'm running into this with latest RMP 3.1.1 and stackprof 0.2.25. All profiles are captured using pp=async-flamegraph. How can I debug it further?

MiniProfiler / rack-mini-profiler

The flamegraph seems to be incorrect/incomplete when using Ruby 3.0.0 and above #536