adhearsion / punchblock

Telephony middleware library for Ruby
adhearsion.com/punchblock
MIT License
40 stars 34 forks source link

Memory leak in 2.5.0 #218

Open bklang opened 10 years ago

bklang commented 10 years ago

This is a continuation of the issues described in #217 as that PR was closed when the branch was merged.

Ping @sfgeorge, @benlangfeld.

Last comment:

From @sfgeorge:

Yes, the situation is significantly better by more than an order of magnitude! Under JRuby, I was able to run almost 40,000 calls (as opposed to just 1,700). This is a huge leap, but I'm hoping that we can do even better. Using afb3c6b, and the same test setup, I warmed up to 8cps and things stayed stable for over an hour. At first Full GCs were reclaiming a lot of memory, occurring every ~70 seconds But after almost 40,000 calls, Full GCs occurred at least once per second. After ~40,000 call failures started to occur. The complete gc log is here I can bump Gemfile[.lock] on the repo to reflect that I've been testing Punchblock's new feature/dtmf_recognizer_leak, would that be helpful?

From @bklang:

No, that's not necessary. Just the ObjectSpace information and any results you find by bisecting between the working and broken version (with the backported fix for the big leak) are the most important things at this point.

Example app: https://github.com/sfgeorge/punchblock-issue-memleak

benlangfeld commented 10 years ago

After running through 50k calls on Ruby 2.1.1 on both Asterisk and FreeSWITCH, I see memory usage remain fairly stable. @sfgeorge, please provide confirmation of your test results on both MRI and JRuby.

sfgeorge commented 10 years ago

@benlangfeld Was that using afb3c6b? I definitely hit trouble after 40,000 calls on that w/ JRuby as mentioned here. If you have a new version for me to test out, let me know.

benlangfeld commented 10 years ago

That was using the HEAD of develop. My previous request stands :)

benlangfeld commented 10 years ago

Basically what I'd like is you to update everything documented in your repo with HEAD of develop as of now, including all results, please :)

benlangfeld commented 10 years ago

Also note @sfgeorge, that you might see performance improvements by upgrading to JRuby 1.7.11 which provides a thread pool for Fibers, drastically reducing Thread allocations. JRuby 1.7.4 is quite old now. Additionally, MRI 2.1.1 will provide a huge performance boost over 1.9.3. Don't be scared of interpreter upgrades! ;)

sfgeorge commented 10 years ago

Running these tests takes a while, I don't know if it's worth it to re-run at this point. I say that because the only change to Punchblock's between afb3c6b..develop was to the Guardfile, which won't impact the results of this test.

Upgrading JRuby to 1.7.11 is a worthy change for performance reasons, but I do not believe that that is the cause of this memory leak. Do you agree?

benlangfeld commented 10 years ago

I do not believe it is the cause of the memory leak.

I cannot commit time to chase a memory leak that I cannot reproduce easily as I stated above. I'm looking for you to produce results similar to the ones you provided initially, only after the fix to DTMF recognizers (be it ec304302f76f47bea8c6bb0d82dedd9f7a04fe35 or develop), on both MRI and JRuby in order to either support or more fully refute my claim that there is no longer a leak.

In the absence of such suitable documentation, I'm going to have to close the issue as resolved based on the results of the testing I have been able to perform in the time I have available.

sfgeorge commented 10 years ago

Ben, I don't understand. I mentioned before that I have already tested the DTMF recognizers fix in a version of Punchblock that is functionally equivalent to ec30430, and included a gc log that shows the slow but steady growth of memory during that test.

Can you clarify what the goal is of re-testing here?

Please don't close this issue. :smile:

benlangfeld commented 10 years ago

I tested only on CRuby (2.1.1, to be exact). I did not experience any noticeable memory growth over 50k calls, testing twice. My tests were not exhaustive, and I may have missed something, but they gave me reasonable confidence that there is not a serious problem running on CRuby. I would like you to test sufficiently that you are confident that I am right, or able to present a solid enough case that I am wrong that I can justify taking time to look into it again.

As for JRuby, I did not test. I would like us to come to some agreement on the presence of the problem on CRuby first, to rule out this being some issue in JRuby itself. I am not interested in JRuby until we can say we're leak-free on CRuby. If we can say that, but JRuby exhibits problems, I would only be interested in testing on the latest JRuby release.

sfgeorge commented 10 years ago

Ok, cool. I'll give this a go on C-Ruby 2.1.1 and JRuby 1.7.11 with ec30430 and share results.

sfgeorge commented 10 years ago

Ok, I gave Punchblock 2.5.1 a run several times on ruby-2.1.0 via rvm (had trouble compiling ruby-2.1.1). There's definitely slow, but steady memory growth on the resident image size.

Taking a gross measurement starting after the first 10,000 calls, it seems that there an average growth of ~270 Bytes per call.

Memory growth vs. calls on Punchblock 2.5.1

Here's graphs & stats I collected when running against Punchblock 2.5.1..

Since it was CRuby I didn't need to prime it, but drew up slowly anyway so I could more easily measure from the "baseline" footprint as a starting point. Here was the velocity of the test:

benlangfeld commented 10 years ago

Thanks for that comprehensive detail, @sfgeorge. I'll review it as soon as I get a chance, probably next week.

sfgeorge commented 10 years ago

Sounds good, thanks Ben.

chewi commented 10 years ago

I've seen growing RAM usage but I chalked it up to a leaky Ruby 2.1 rather than Adhearsion. Could be wrong. Or could be both?

chewi commented 10 years ago

You may have seen this, it's been widely talked about.

http://www.omniref.com/blog/blog/2014/03/27/ruby-garbage-collection-still-not-ready-for-production/

But I am not denying that there is an issue in Punchblock, just saying that Ruby 2.1 may not be helping either.

Jared-Prime commented 10 years ago

I take back my comment, since @sfgeorge did test against 2.1. I'm not aware of any JRuby 1.7.11 performance numbers.

sfgeorge commented 10 years ago

@Jared-Prime

I take back my comment, since @sfgeorge did test against 2.1. I'm not aware of any JRuby 1.7.11 performance numbers.

Thanks, you are correct.. did that test against 2.1.0.

Previous test using the same code on JRuby 1.7.4 shows results that are dramatically worse - dying before even 40,000 calls with a 256MB heap.

Stopped and did not run a test on JRuby 1.7.11, because Ben said he'd like to see results on CRuby first.

@chewi

You may have seen this, it's been widely talked about.

http://www.omniref.com/blog/blog/2014/03/27/ruby-garbage-collection-still-not-ready-for-production/

But I am not denying that there is an issue in Punchblock, just saying that Ruby 2.1 may not be helping either.

Thanks, James. Yes, that article from Tim Robertson been a helpful article. Reviewed that a couple weeks ago.

<TL;DR>

A few grievances I have with this article:

Mis-quote Number 1

...Run this for long enough, and you’ll see your Ruby process consume more and more memory, in ever-wider swings (I let it gobble up over 2GB on my system before I killed it). But don’t take my word for it: ask Koichi Sasada, the author of the generational GC in Ruby 2.1:

“Some ‘short-lived’ young objects will be promoted to ‘old-gen’ accidentally….if such ‘short-lived’ objects consume huge memory we need to free such objects.”

Reading the full talk actually reveals that Koichi introduced the following mechanism with 2.1.0 to address that very problem:

New generational GC algorithm allows mixing "Write-barrier protected objects" and "WB unprotected objects"

(from Page 30 of Koichi's talk).

Mis-quote Number 2

Tim also partially-quotes Koichi on this:

also:

“We cannot measure how oldgen objects consume [memory collectively]….a few oldgen objects can grab huge memory”

(from page 83 of Koichi's talk)

But Tim failed to reference Koichi's existing solution to the problem, mentioned on the next page:

"We implemented this feature and evaluating it now"

(from page 84 of Koichi's talk)

Furthermore, on page 85... the laundry list of remaining tasks that Koichi stated was needed "Future work For smarter object management" in Ruby beyond 2.1.0 shows that Koichi's remaining concerns around RGenGC revolve around performance, not leaks. That doesn't prove that there aren't leaks, of course, just saying that this talk isn't the "everything's borked" exposé that it's made out to be.

</TL;DR>

Sorry. :blush:

chewi commented 10 years ago

I am aware that much of the article was debunked in this other article but this one did acknowledge that there is still an issue, which will be fixed in 2.1.2.

http://samsaffron.com/archive/2014/04/08/ruby-2-1-garbage-collection-ready-for-production

sfgeorge commented 10 years ago

Thanks, I did not see that article. Good to know that there's the malloc_limit bug that is fixed in Ruby Head.