arthurnn / memcached

A Ruby interface to the libmemcached C client
Academic Free License v3.0

random jumps in private memory usage with 0.19.x and passenger #19

Closed jnewland closed 14 years ago

jnewland commented 14 years ago

We're seeing some very strange memory characteristics with the memcached 0.19.x releases under Passenger. The private memory of Passenger processes randomly jumps in large increments (dozens of megabytes), while the total VMSize grows only slightly.

Here's a graph showing how rolling back to 0.18.0 drastically reduced memory usage of our Passenger processes.

annotated memory usage chart

(The rise and fall of memory usage you see for the app servers running 0.19.2 in that graph is the result of a reaper script we use to kill Passenger processes that leak too much memory; a rough sketch of that approach follows.)
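For context, something along these lines is what such a reaper does. This is purely illustrative, not the actual script from this thread; the threshold, process match, and signal choice are all assumptions you'd adjust for your own setup.

```ruby
#!/usr/bin/env ruby
# Illustrative memory-reaper sketch (not the script referenced above):
# kill Passenger workers whose resident set exceeds a threshold.
LIMIT_KB = 300 * 1024 # placeholder threshold: 300 MB

`ps -eo pid,rss,command`.each_line do |line|
  pid, rss, command = line.split(' ', 3)
  # Passenger 2.x-era workers show up as "Rails: /path/to/app"; adjust the match for your setup.
  next unless command.to_s.include?('Rails: ')
  if rss.to_i > LIMIT_KB
    # Signal choice is an assumption; Passenger spawns a replacement worker.
    Process.kill('TERM', pid.to_i)
  end
end
```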

I'm wondering if anyone is seeing similar behavior, or if this is specific to my environment. We're running:

I'm planning to take a hard look at this today, as I'd love to get the retry behavior included in 0.19.3. Initial tests of 0.19.3 show that it has the same strange memory characteristics. Bummer.

A couple questions:

Anyway, I'll be staring at this for most of the rest of the day, trying to track it down. :) Thanks for any help/insight you might have.

ghost commented 14 years ago

I don't see anything in the Valgrind runs, even with COW turned on. Can you grab the latest master and try running "rake valgrind" in your production environment? I suspect some weird interaction with your app code.

I am using whatever SASL headers come with OS X Leopard.

evan commented 14 years ago

I may have figured it out. When we had show_backtraces turned on, we had a similar leak in 0.19. I don't know the root cause of that, but there is no reason to run show_backtraces in production in the first place; turning that off improves performance and stops the leak.
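For anyone landing here later, a minimal sketch of what that looks like when constructing the client. The option name comes from this thread; the server address is a placeholder.

```ruby
require 'memcached'

# Minimal sketch (server address is a placeholder): build the client with
# exception backtraces disabled, per the advice above. Backtraces are handy
# while debugging but cost time, and apparently leaked memory under 0.19.x.
CACHE = Memcached.new(
  'localhost:11211',
  :show_backtraces => false
)
```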

jnewland commented 14 years ago

That was it! I see no signs of a leak after an hour in production with show_backtraces turned off on 0.20.1. The default exceptions_to_retry/exception_retry_limit settings seem to be doing the trick too; we haven't had a single occurrence of a memcached timeout or the dreaded 'operation in progress' error bubbling up to cause a 500 since rolling this out. Thanks Evan, I owe you several beers. :)
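A sketch of setting those retry options explicitly, for reference. The option names come from this thread; the retry limit and the exception class listed are illustrative placeholders, not the gem's actual defaults.

```ruby
require 'memcached'

# Sketch: transient failures get retried inside the client instead of
# bubbling up to the application as 500s. Values are illustrative only.
CACHE = Memcached.new(
  'localhost:11211',                                       # placeholder address
  :show_backtraces       => false,
  :exception_retry_limit => 5,                             # placeholder count
  :exceptions_to_retry   => [Memcached::ATimeoutOccurred]  # illustrative list
)

begin
  value = CACHE.get('some_key')
rescue Memcached::NotFound
  value = nil # a cache miss is expected, not an error
end
```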

evan commented 14 years ago

Hooray!