Jemalloc 5.0.1 doubled my app response times

abinoda commented 6 years ago

This is a full write-up of my unsuccessful attempt today at adding Jemalloc to my Rails app on Heroku. I'd greatly appreciate any suggestions and am happy to answer questions and try stuff. One idea floating in my mind is to maybe try a different version of Jemalloc, ie. 3.6.0.

Background

My app is https://pullreminders.com and it uses Ruby 2.5.0, Rails 5.2.1, and Sidekiq. I run a single 2x Heroku dyno with 6 Puma workers using 5 threads each. This is on the heroku-18 stack. I was interested in jemalloc to see if I could reduce memory usage of my web server (so I could maybe run more Puma workers), scheduled jobs, and Sidekiq.

The steps I took to add jemalloc to my production app were:

Enable Heroku Preboot to avoid downtime during dyno restarts
Set env vars JEMALLOC_ENABLED=true and JEMALLOC_VERSION=5.0.1
Add the buildpack with heroku buildpacks:add --index 1 https://github.com/gaffneyc/heroku-buildpack-jemalloc.git
Deploy an empty commit git commit --allow-empty -m "empty commit"

I did this at at ~4:45 PST. Just mentioning this because it was after Heroku's incident that day was resolved.

Results

Almost immediately after adding jemalloc, I noticed an uptick in my app's response times. In the chart below, the bars represent response time, and the yellow line is throughput (which was low relative to earlier in the day).

I double checked my Heroku metrics which also showed an approximate doubling of both my 50th percentile and 95th percentile average response times. My mean response time had gone from ~150ms to ~300ms. My 95th percentile response times had gone from ~350ms to ~800ms and were a bit spikey. There were no timeouts or errors.

Memory usage appeared to be creeping back up to its pre-jemalloc level:

After a little while I removed jemalloc and deployed another empty commit to rebuild my app. Sure enough, response times fell back to their pre-jemalloc levels:

abinoda commented 6 years ago

Ping @bf4 @nateberkopec @schneems @SamSaffron if you have time to take a look!

SamSaffron commented 6 years ago

I would try 3.6.0 and report back, we still use that at Discourse, also double check you don't have THP enabled in your OS, not sure what the deal is with Heroku there but you can wrap with my thpoff executable if you want to double confirm https://github.com/discourse/discourse_docker/blob/master/image/base/thpoff.c

abinoda commented 6 years ago

@SamSaffron Thanks! I'll try 3.6.0 and report back. Will look into THP... and also page @schneems on that one. FWIW I'm using the heroku-18 stack.

abinoda commented 6 years ago

The same thing seems to be happening with 3.6.0. The site has lower throughput now than earlier but I'm still seeing an approximate doubling of mean response time. Going to remove it now since I'm going to bed.

abinoda commented 6 years ago

Actually, this time after removing Jemalloc 3.6.0 the response times did not dip back down noticeably so I'm scratching my head a little. I'll try again tomorrow.

gaffneyc commented 6 years ago

Just to double check, are you deploying the jemalloc change in isolation or are you pairing it with other updates (code, ruby version, Heroku stack, etc...)? At least from the graphs you've shown it looks like the initial deploy was fine but RPM slowly decreased until response times started to spike.

If you're up for experimenting you could try TCMalloc instead (https://github.com/gaffneyc/heroku-buildpack-tcmalloc). We've seen similar reductions in memory usage and overall slower growth than with jemalloc (from some basic non-scientific testing).

abinoda commented 6 years ago

@gaffneyc Yep, in isolation with empty commits git commit --allow-empty -m "empty commit"

abinoda commented 6 years ago

@gaffneyc Which Heroku stack are you running for DMS?

gaffneyc commented 6 years ago

@abinoda We're on Heroku-18 for DMS.

Do you have metrics on GC pause time? It's looking like everything (controller, db, view, etc...) is slowing down which is odd. I feel like 6 workers on a 2x dyno is a lot (we run 2 processes and 5 threads on 2x dynos for DMS) but it's probably unrelated to what you're seeing.

Would you mind sharing some more info about your setup? Ruby version and possibly the Gemfile. My baseless theory is that either Ruby or a gem with a C extension isn't playing nice with it. I would also be curious to see numbers using tcmalloc if you're interesting in trying it.

I will say in the dozen or so apps we've deployed at Collective Idea with jemalloc we haven't seen a big spike in response time like you're seeing, so I don't have an example I can point to to debug or fix it.

abinoda commented 6 years ago

Do you have metrics on GC pause time? It's looking like everything (controller, db, view, etc...) is slowing down which is odd.

I don't have metrics on GC pause time on hand. I can confirm that everything slowed down vs any single part of the stack.

More app info

Ruby 2.5.0

Procfile (Puma worker count is set with env var):

release: rails db:migrate
web: DATABASE_POOL=5 bundle exec puma -C config/puma.rb -t 5:5
worker: DATABASE_POOL=10 bundle exec sidekiq -c 10

Gemfile: https://gist.github.com/abinoda/86a9c6f6618d63762d2e1ee9d06b629d

Gemfile.lock: https://gist.github.com/abinoda/8f5e25d3daeb8509681eb250ac3adeb8

abinoda commented 6 years ago

Right now I'm doing another test right now with Jemalloc v3.6.0. FWIW I'm not seeing as much of a discernible increase in response times as I did with v5.0.1, though there is still a slight increase showing at lower throughput. Given the sample size and other potential variables, I'm not 100% sure what to make of it.

Without Jemalloc

With Jemalloc

gaffneyc commented 6 years ago

@abinoda Just checking in, have you had any luck debugging this?

abinoda commented 6 years ago

@gaffneyc Nope, I had docked this for now. Each of my attempts with Jemalloc have led to an increase in response times and minimal, if any, improvement in memory usage. I haven't tried tcmalloc because it seems a little more experimental and I'm not sure I'm comfortable running it in production.

I'd be happy to run more Jemalloc experiments or provide any other information, though. I'd love to solve or at least understand this!

gaffneyc commented 6 years ago

@abinoda unfortunately there are few silver bullets in programming. I will saythe graphs for memory usage didn’t look terrible. As long as you’re not swapping you should be fine.

As for TCMalloc I wouldn’t call it experimental. I think it predates JEMalloc. It comes out of Google and is used in a lot of their services. The memory allocator in Go is based on it as well. As for Ruby I believe Shopify uses it in production and we’ve used it on and off in experiments on Dead Man’s Snitch without issue.

I’m going to close this for now since we don’t know if it’s a problem with the buildpack or upstream in Ruby or JEMalloc. The buildpack is pretty minimal, just building and hooking in the library, so it’s probably at a different layer.

abinoda commented 6 years ago

@gaffneyc Thanks for the info on TCMAlloc – I'll give it a try at some point. And thanks for the help here and for maintaining this buildpack!

gaffneyc / heroku-buildpack-jemalloc