Closed intrip closed 9 months ago
Hi @intrip,
Thanks for the bug report. I assume this is running on Ruby 3.3.0 ? IIRC we may have had another report of a memory leak.
@XrXr @k0kubun I'm wondering if this might be happening because we root some objects in YJIT when we bake addresses in the machine code, and we now have code GC disabled, so even dead ISEQs could still have code that never gets collected?
Might be useful if we could come up with a script that reproduces this memory leak to make it easier to debug. Maybe something that creates ActiveRecord models in a loop?
Hi @maximecb,
Thanks for the bug report. I assume this is running on Ruby 3.3.0 ? IIRC we may have had another report of a memory leak.
Yes, it's Ruby 3.3.0.
To workaround it for now I've reduced the YJIT memory to 32Mib and that mitigated greatly the issue, but it dropped a bit the ratio_in_yjit (from 98 to 95). I've also noticed that enabling GCs in YJIT mitigated it but led to worse overall performance so for now I've just reduced the overall memory.
We noticed it now because it became worse recently (not sure if it's related to the Ruby upgrade though) but TBH it's been a while that we were affected by this issue, at least 90+ days.
Might be useful if we could come up with a script that reproduces this memory leak to make it easier to debug. Maybe something that creates ActiveRecord models in a loop?
I'll give it a try but I'm about to leave now and I'll be OFF the next week so it might take some time sorry 😅.
No problem. I wish we could fix the issue faster :)
@intrip do you make any use of Relation#extending
? Or other ways to extend a module in relations?
Also, I think if you were to strip all the T_STRING
value
fields, you should be able to share the dump without leaking any personal data. Either publicly or in direct to one of us, that would be helpful to let us dig in the heap dump.
Might be useful if we could come up with a script that reproduces this memory leak to make it easier to debug.
I suspect it's the same bug than https://bugs.ruby-lang.org/issues/19436, but with YJIT (and worse because the references accumulate instead of only retaining the last one).
I just remembered that Aaron @tenderworks had mentioned a memory leak with YJIT 3.3 and Mastodon. This may be related. Would be good if we could get more info about the Mastodon issue.
I'm wondering if this might be happening because we root some objects in YJIT when we bake addresses in the machine code, and we now have code GC disabled, so even dead ISEQs could still have code that never gets collected?
They've been running 3.3.0-preview3 since November https://github.com/Shopify/ruby/issues/548#issuecomment-1823152356, which had code GC disabled unlike preview2. So I'm not sure if it explains the situation.
Also, Block
s are deallocated when the ISEQ is GCed (along with their gc_obj_offsets
), so "dead" ISEQs would not mark baked objects whether Code GC is disabled or not.
Anyway, a local repro would be really helpful for identifying the cause.
I suspect it's the same bug than https://bugs.ruby-lang.org/issues/19436, but with YJIT
Alright, that's what I suspected:
module Foo
def bar
end
end
def call_bar(obj)
# Here the call cache we'll keep a ref on the method_entry
# which then keep a ref on the singleton_class, making that
# instance immortal until the method is called again with
# another instance.
# The reference chain is IMEMO(callcache) -> IMEMO(ment) -> ICLASS -> CLASS(singleton) -> OBJECT
obj.bar
end
obj = Object.new
obj.extend(Foo)
5.times do
call_bar(obj)
end
id = obj.object_id
obj = nil
4.times { GC.start }
begin
ObjectSpace._id2ref(id)
puts "MEMORY LEAK!!!"
exit!(1)
rescue RangeError
puts "no leak"
exit!(0)
end
$ ruby -v /tmp/repro.rb
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
no leak
$ ruby -v /tmp/repro.rb
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
no leak
$ ruby -v /tmp/repro.rb
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
no leak
$ ruby --yjit --yjit-call-threshold=1 -v /tmp/repro.rb
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
MEMORY LEAK!!!
$ ruby --yjit --yjit-call-threshold=1 -v /tmp/repro.rb
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
MEMORY LEAK!!!
$ ruby --yjit --yjit-call-threshold=1 -v /tmp/repro.rb
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
MEMORY LEAK!!!
So @intrip is very likely using Relation#extending
, which is a giant footgun and we really should deprecate that API in Rails (Ref: https://github.com/rails/rails/pull/47314)
Following up on the discussions we had in meetings. Jean said that call caches must be weak references. @xrxr do we root objects / CMEs when compiling Ruby calls in YJIT? If so, could we make them weak references?
We do root CMEs because we root everything we refer to from generated code. Making them weak references will take some effort, though. Even with a constant pool we can't directly make use of the current mechanism that sets the VALUE to Qundef when the referent dies, because we have a GC yield point before writing out CMEs. If the CME gets collected during the GC we end up pushing a corrupted frame.
Would not compiling singleton instance methods be an easy workaround? Their perf already suck on the interpreter, so doubt it would impact YJIT performance noticeably.
And these methods are very likely not to survive long anyway, so compiling them is a bit of a waste.
Would not compiling singleton instance methods be an easy workaround? Their perf already suck on the interpreter, so doubt it would impact YJIT performance noticeably.
Should work if perf impact is indeed negligible. Hard to tell without experimenting, though. I guess we need to stop compiling calls on all objects with a singleton class, because we root the singleton class itself to do the known class guard, so it doesn't really matter where the method is. Here's a modified version of your script:
def call_bar(obj)
obj.itself
end
obj = Object.new
p obj.singleton_class
5.times do
call_bar(obj)
end
id = obj.object_id
obj = nil
4.times { GC.start }
begin
ObjectSpace._id2ref(id)
puts "MEMORY LEAK!!!"
exit!(1)
rescue RangeError
puts "no leak"
exit!(0)
end
$ ruby --yjit-call-threshold=1 test.rb
#<Class:#<Object:0x000000010313c0f0>>
MEMORY LEAK!!!
$ ruby test.rb
#<Class:#<Object:0x00000001029cb208>>
no leak
So it ends up being quite a drastic change. It'd also make a lot of existing tests useless since main
has a singleton class :/
It'd also make a lot of existing tests useless since
main
has a singleton class :/
main
could likely be handled differently, just like Class
and Module
(given it's legit and common for them to have a singleton_class).
Also I'm merely suggesting this has a quick fix that would be backportable to 3.3, not saying it's an ideal solution.
because we root the singleton class itself to do the known class guard
Oh. So that's worse than I thought :/
@XrXr can you explain in more detail why it's problematic to make the singleton class refs weak references?
In theory, if a singleton class is collected, that means it's dead, and it can't be called anymore, no?
If the machine code gets patched and the singleton class ref is replaced by Qnil using an atomic write, that will just cause chain guards to fail, which is to be expected since the class should already be dead by that point?
The GC can't patch the generated code because there is no write permission, so we need a constant pool.
Maybe for YJIT, like for TruffleRuby, the easier way to deal with this would be to make the Class#attached_object
a weak reference?
This was talked about on https://bugs.ruby-lang.org/issues/19436.
For TruffleRuby I think a weak reference in inline caches would always be a too big overhead (extra memory and extra indirection) to be acceptable performance-wise.
make the
Class#attached_object
a weak reference? This was talked about on https://bugs.ruby-lang.org/issues/19436.
I think it's something that would be positive, regardless of this YJIT problem. But the compiled code would still leak T_CLASS
instances, which would still wouldn't be great :/
But the compiled code would still leak
T_CLASS
instances, which would still wouldn't be great :/
Ah, right.
Maybe Code GC or so could remove such compiled code once it notices it refers to a singleton T_CLASS
with its instance GC'd?
Code GC would also be useful for cases like test suites which might create many anonymous modules/classes which don't live long but yet might be used a few times and some methods on them might end up being compiled.
IIRC the JVM considers references (object constants) embedded in JITed code to all be weak, i.e., if one of these referenced objects would GC then that code would be removed. (and if a thread is currently executing that JITed code it'd deopt to the interpreter I guess, which might work because the thread might already be in safepoint for this part of GC). I think on TruffleRuby specifically this is not enough because we hold on the Ruby Class in the Truffle AST (inline caches are just part of the Truffle AST).
Maybe Code GC or so could remove such compiled code once it notices it refers to a singleton
T_CLASS
with its instance GC'd?
For now YJIT's code GC just drop all the compiled code, so the cost/benefit isn't great for most apps, hence why it's disabled by default in 3.3. If the code GC becomes more advanced in the future (e.g. capable of freeing only the dead code), then yeah maybe.
As Alan mentioned, short/medium term a constant pool is more likely to happen.
That's why I suggested just not compiling anything that deals with a singleton class (unless the attached object is main
of a Module), as an instance having a singleton class strongly suggest that whatever code is compiled for it will soon be dead anyway, and probably won't be called enough to pay for the compilation. That also pretty much automatically prevent dynamic code generation issues, such as OpenStruct
.
Of course that's an heuristic, and there will be counter examples of singleton objects that are stable, but it should hold reasonably well.
The GC can't patch the generated code because there is no write permission, so we need a constant pool.
We do already change permissions on the generated code for movable GC objects though, no?
Otherwise, if we did have a constant pool, would you foresee any challenges with respect to making the reference weak references for chain guards?
We do already change permissions on the generated code for movable GC objects though, no?
We do, but the weak reference part of the GC is different. We get a callback for moving objects, and that allows us to flip permission for the places we write. For weak references there is no callback, and the GC can't reasonably distinguish between a weak reference in generated code versus one from somewhere else that it can write to without issues. Unless we want to flip permission on everything every sweep step, which doesn't sound great.
Otherwise, if we did have a constant pool, would you foresee any challenges with respect to making the reference weak references for chain guards?
The problem I mentioned in https://github.com/Shopify/ruby/issues/552#issuecomment-1904748960 could be worked around. We'd also need to rework how we track CME validity to make things fully weak.
For weak references there is no callback
It might be possible to implement such a callback?
@xrxr You understand the situation better than I do. What is your intuition as to what is the best solution to solve this memory leak problem?
We want to have something we could backport for 3.3, so I'll try and benchmark rejecting calls through singleton classes. In the long term we'll of course go with some sort of weak reference based solution.
Benchmark results:
yjit-pre: ruby 3.4.0dev (2024-01-23T19:00:05Z master 557b69e83b) +YJIT [arm64-darwin23]
yjit-post: ruby 3.4.0dev (2024-01-23T19:33:36Z yjit-stop-singleto.. af9a00414b) +YJIT [arm64-darwin23]
-------------- ------------- ---------- -------------- ---------- ----------------- ------------------
bench yjit-pre (ms) stddev (%) yjit-post (ms) stddev (%) yjit-post 1st itr yjit-pre/yjit-post
activerecord 15.9 1.2 17.8 0.7 0.94 0.89
chunky-png 357.2 1.3 369.4 0.5 0.95 0.97
erubi-rails 558.8 0.5 590.0 0.6 0.95 0.95
hexapdf 880.6 1.5 912.7 1.4 1.00 0.96
liquid-c 23.1 1.6 23.7 1.0 1.04 0.98
liquid-compile 25.5 5.7 26.9 0.8 0.96 0.95
liquid-render 34.8 1.7 36.1 1.0 0.97 0.96
lobsters 397.8 1.9 420.0 2.4 1.09 0.95
mail 49.1 0.8 48.5 0.8 0.99 1.01
psych-load 857.1 0.6 855.6 0.4 1.00 1.00
railsbench 598.0 1.5 644.0 1.8 0.96 0.93
rubocop 58.2 4.6 58.5 6.0 1.00 0.99
ruby-lsp 51.9 1.1 53.2 3.1 1.08 0.97
sequel 27.9 0.8 31.4 6.6 0.87 0.89
-------------- ------------- ---------- -------------- ---------- ----------------- ------------------
Not horrible, but we definitely take a hit. Is it acceptable?
Yikes, that is actually a pretty significant performance impact. I would prefer to avoid it.
Ah, there seems to be no perf hit if I amend the logic to accept class methods. Classes and modules tend to live forever anyways so this should still address the leak:
yjit-pre: ruby 3.4.0dev (2024-01-23T19:00:05Z master 557b69e83b) +YJIT [arm64-darwin23]
yjit-post: ruby 3.4.0dev (2024-01-23T19:33:36Z yjit-stop-singleto.. af9a00414b) +YJIT [arm64-darwin23]
-------------- ------------- ---------- -------------- ---------- ----------------- ------------------
bench yjit-pre (ms) stddev (%) yjit-post (ms) stddev (%) yjit-post 1st itr yjit-pre/yjit-post
activerecord 15.7 0.9 15.6 0.8 1.02 1.01
chunky-png 360.7 4.0 372.7 0.3 0.95 0.97
erubi-rails 559.8 0.8 548.0 0.6 1.02 1.02
hexapdf 872.4 1.0 877.0 0.8 1.00 0.99
liquid-c 23.8 1.5 23.9 3.5 1.21 1.00
liquid-compile 25.5 5.7 25.8 5.6 0.99 0.99
liquid-render 34.8 1.7 36.0 2.4 0.99 0.97
lobsters 393.4 2.5 402.2 2.3 1.04 0.98
mail 51.4 0.5 51.2 0.9 0.99 1.00
psych-load 844.7 0.0 838.3 0.1 1.01 1.01
railsbench 584.1 0.2 578.8 0.5 1.02 1.01
rubocop 58.6 4.6 58.3 4.4 1.01 1.00
ruby-lsp 52.0 1.0 51.8 0.9 1.01 1.00
sequel 28.1 1.1 28.4 1.7 0.99 0.99
-------------- ------------- ---------- -------------- ---------- ----------------- ------------------
Yes class methods should definitely continue to be compiled. Does your patch also compile methods on main? I'm surprised to see it have an impact on benchmarks at all
main
is just the top level self
on the entry point script passed to ruby
, so it really shouldn't impact performance. The impact in the chart is just noise on my laptop. On the benchmarking machine it's more clear that there is not much impact, except maybe on erubi-rails.
The upstream fix has been merged and is marked for backporting. I think we can close.
Thank you all for the fix! Limiting YJIT memory worked around it for now, and our ruby build tool doesn't support -dev yet (only rc/preview so far) so I'll wait for a ruby bump to test the fix 😅.
@intrip do you make any use of
Relation#extending
? Or other ways to extend a module in relations?
We use it but rarely: I only found one usage in the app codebase, but it doesn't match with the relation referenced in the heap dump.
Interesting, there might be something else causing relations to have a singleton class. No has_many extend:
or has_many do
either? I forgot to single these out, but the rely on extending
too.
Also did you look in your gems? In https://github.com/mastodon/mastodon/pull/28013 I found that kaminari
makes heavy use of it.
Interesting, there might be something else causing relations to have a singleton class. No
has_many extend:
orhas_many do
either? I forgot to single these out, but the rely onextending
too.
Oh good point. I found one more relation so now the total is 2, this one uses has_many extend:
but again doesn't match with the Entry > Message
path in the heap dump 🤔.
Also did you look in your gems? In mastodon/mastodon#28013 I found that
kaminari
makes heavy use of it.
I haven't found any outlier in the installed gems via ag '(\.extending|has_.*extend:|scope :.*do\n)' $(bundle show --paths | grep -v rails)
. For pagination, we rely on https://github.com/basecamp/geared_pagination which shouldn't be affected.
kk. I wish there was a way to check if an object has a singleton class to debug this further. Unless I'm missing something I think a C ext is needed for that today.
Would be interesting to monkey patch Relation
in your app to figure out where this is coming from. But ultimately it's mostly for intellectual curiosity because Alan's patch totally fix this concern, so no big deal.
We noticed a slow and steady memory leak in our web app, to investigate it I enabled
ObjectSpace.trace_object_allocations_start
and extracted a few heap dumps, then started analysing them with sheap.I found a common pattern: A
yjit_root
node holding references to ActiveRecord methods holding references to instances of ActiveRecord models then holding references to their instance attributes. These objects attributes sums up and creates a slow and steady memory leak:This is the node data until you reach the ActiveRecord model instance:
The fact that ActiveRecord instances hold references to their data looks good to me, to me looks like the issue is that
yjit_root
keeps indirectly referencing them, which prevents the objects from being GCed hence the leak. I've tried disabling YJIT on 1 host (hey-default-app-109) and the leak was gone, you can see it from this overnight graph of memory usage in the hosts:In these hosts we are running YJIT with the default options so just
RUBYOPT="--yjit"
Can you help me figure out what's going on here? This issue applies only to a subset of web hosts, which receives specific traffic that creates the "Entry" > "Message" > "Okra" objects.
Happy to tell you any additional info that could help, the next week I'll be OFF so if I won't answer quickly that's the reason 😅. Thanks!