Closed maximecb closed 1 year ago
Some Hash#[]
stats taken from one interation of railsbench with adhoc tracing:
empty?
: 8.3%RHASH_AR_TABLE_P
: 53.5%default.nil?
: 90.6%size
:
Hash
: 97.1%Rack::Utils::HeaderHash
: 1.8%ActiveSupport::HashWithIndifferentAccess
: 0.7%String
:
Symbol
:String
: 48.0%Symbol
: 41.0%Integer
: 3.7%Array
: 1.2%Object
: 1.0%ActiveSupport::Notifications::Fanout
: 1.0%FalseClass
: 0.9%NilClass
: 0.9%Hash
: 0.7%Thread
: 0.6%Class
: 0.5%TrueClass
: 0.3%For hash keys that are symbols or string constants, could we precompute the hashed value somehow?
I thought of the same thing when I looked at the stats. ~This idea seems promising. This should at least be possible for opt_aref_with
(but it's not used by railsbench, so it doesn't matter).~ I take this back. See https://github.com/Shopify/ruby/issues/482#issuecomment-1378098552.
I guess that might be somewhat tricky because we have to know String#hash and Symbol#hash wasn't redefined?
At least for RHASH_AR_TABLE_P
hashes, which uses rb_any_hash
, it doesn't seem to respect #hash
overrides when a key is T_STRING
or T_SYMBOL
(or some other builtin types).
I failed to figure out what hash function do_hash
would normally use from quick reading due to its function pointer call, which is used by non-RHASH_AR_TABLE_P
hashes, but I guess the same thing would hopefully hold for consistency. (edit: I created a non-AR hash and tested that it doesn't call overridden #hash.)
Callers:
# method name: count (%)
/home/k0kubun/.gem/ruby/3.3.0/gems/rack-2.2.3.1/lib/rack/request.rb:63:in `get_header': 254060 (19.726089512003284)
/home/k0kubun/.gem/ruby/3.3.0/gems/concurrent-ruby-1.1.9/lib/concurrent-ruby/concurrent/collection/map/non_concurrent_map_backend.rb:20:in `[]': 138667 (10.766581336538454)
/opt/rubies/ruby/lib/ruby/3.3.0+0/set.rb:404:in `include?': 82532 (6.408067462822385)
/home/k0kubun/.gem/ruby/3.3.0/gems/activesupport-6.0.4/lib/active_support/ordered_options.rb:77:in `block in initialize': 53779 (4.1755859555460315)
/home/k0kubun/.gem/ruby/3.3.0/gems/activemodel-6.0.4/lib/active_model/attribute_set/builder.rb:39:in `[]': 35550 (2.760223892591186)
/home/k0kubun/.gem/ruby/3.3.0/gems/railties-6.0.4/lib/rails/initializable.rb:20:in `before': 28561 (2.2175739689534986)
/home/k0kubun/.gem/ruby/3.3.0/gems/railties-6.0.4/lib/rails/initializable.rb:24:in `after': 28535 (2.215555239805612)
/home/k0kubun/.gem/ruby/3.3.0/gems/activesupport-6.0.4/lib/active_support/subscriber.rb:166:in `get_queue': 27921 (2.1678821745439807)
/home/k0kubun/.gem/ruby/3.3.0/gems/rack-2.2.3.1/lib/rack/utils.rb:457:in `[]': 27870 (2.163922359676972)
/home/k0kubun/.gem/ruby/3.3.0/gems/activesupport-6.0.4/lib/active_support/configurable.rb:21:in `logger': 25941 (2.014148185589535)
/home/k0kubun/.gem/ruby/3.3.0/gems/rack-2.2.3.1/lib/rack/utils.rb:462:in `[]=': 23987 (1.8624329257829757)
/home/k0kubun/.gem/ruby/3.3.0/gems/activemodel-6.0.4/lib/active_model/attribute_set/builder.rb:108:in `assign_default_value': 22480 (1.7454242786343142)
/home/k0kubun/.gem/ruby/3.3.0/gems/actionpack-6.0.4/lib/action_dispatch/journey/visitors.rb:42:in `block in evaluate': 18727 (1.454028490479751)
/home/k0kubun/.gem/ruby/3.3.0/gems/actionview-6.0.4/lib/action_view/lookup_context.rb:273:in `block in initialize_details': 16000 (1.242294860237946)
/opt/rubies/ruby/lib/ruby/3.3.0+0/delegate.rb:349:in `block in delegating_block': 14000 (1.0870080027082027)
/home/k0kubun/.gem/ruby/3.3.0/gems/actionpack-6.0.4/lib/action_dispatch/request/session.rb:49:in `[]': 13948 (1.0829705444124296)
/home/k0kubun/.gem/ruby/3.3.0/gems/activesupport-6.0.4/lib/active_support/callbacks.rb:98:in `run_callbacks': 13943 (1.082582327268605)
/home/k0kubun/.gem/ruby/3.3.0/gems/activesupport-6.0.4/lib/active_support/notifications.rb:247:in `instrumenter_for': 13201 (1.0249709031250704)
/home/k0kubun/.gem/ruby/3.3.0/gems/actionview-6.0.4/lib/action_view/renderer/abstract_renderer.rb:80:in `block in extract_details': 13200 (1.0248932596963054)
...
Top examples:
Rack::Request::Env#get_header
https://github.com/rack/rack/blob/925a4a6599ab26b4f3455b525393fe155d443655/lib/rack/request.rb#L63Concurrent::Collection::NonConcurrentMapBackend#[]
https://github.com/ruby-concurrency/concurrent-ruby/blob/52c08fca13cc3811673ea2f6fdb244a0e42e0ebe/lib/concurrent-ruby/concurrent/collection/map/non_concurrent_map_backend.rb#L20Set#include?
https://github.com/ruby/ruby/blob/00d58afb5d7f3cf3a553e1e8f75f31e4d8781fe7/lib/set.rb#L404ActiveRecord::InheritableOptions#initialize
https://github.com/rails/rails/blob/6e721d79ba9458317c93bb02590bbaa25c5a3649/activesupport/lib/active_support/ordered_options.rb#L77ActiveModel::LazyAttributeHash#[]
https://github.com/rails/rails/blob/6e721d79ba9458317c93bb02590bbaa25c5a3649/activemodel/lib/active_model/attribute_set/builder.rb#L39Looking at those examples, it seems like you can't really assume a specific single key in majority of callsites unless you inline multiple frames (e.g. this should probably inline frames all the way up to Hash#[]
). Chain guards might be able to handle it better in railsbench, but it's likely not enough in actual applications.
I find a couple of examples where precomputing a hash seems reasonable without inlining, but those were the only such examples that I've found so far:
Rails::Initializable::Initializer#before
https://github.com/rails/rails/blob/6e721d79ba9458317c93bb02590bbaa25c5a3649/railties/lib/rails/initializable.rb#L20Rails::Initializable::Initializer#after
https://github.com/rails/rails/blob/6e721d79ba9458317c93bb02590bbaa25c5a3649/railties/lib/rails/initializable.rb#L24We should still be able to do some optimization for String and Symbol, but for now, maybe not precomputing a hash of a key.
The hashcode precomputing was the first thing I thought as well. Would it be possible to single out cases where a literal or constant is used? e.g. env["HTTP_REFERER"]
.
These are likely to be relatively common in some hotspots like Active Record attribute accessors, rack midlewares etc.
I failed to figure out what hash function do_hash would normally use from quick reading due to its function pointer call, which is used by non-RHASH_AR_TABLE_P hashes, but I guess the same thing would hopefully hold for consistency. (edit: I created a non-AR hash and tested that it doesn't call overridden #hash.)
That's interesting and potentially promising. If we can rely on Hash#[]
not calling any Ruby method and not performing GC and not throwing an exception, then we could save a little bit of overhead and avoid dropping all currently known type information. Then we could also upgrade the type in the context and avoid repeated type checks in successive hash accesses.
If the behavior is consistent, we could potentially add a test to Ruby spec to guarantee that behavior doesn't change too?
Top examples [...]
These methods that just index into a hash and return that are so silly and wasteful 🤦♀️ GAH... That being said we do try to pass type information into callees, so it's possible that these methods might get specialized for T_STRING or T_SYMBOL. We'd have to check how often that information is known in the context inside of opt_aref
.
These are likely to be relatively common in some hotspots like Active Record attribute accessors, rack midlewares etc.
I would have thought so too, but at least according to Kokubun's data, this is unfortunately not the case at all. Though inlining might uncover some opportunities for that which we don't see right now.
I would have thought so too, but at least according to Kokubun's data, this is unfortunately not the case at all
Keep in mind that railsbench contains very little applicative code. It's nice to bench Rails itself, but it's not 100% representative of an average Rails app.
I would have thought so too, but at least according to Kokubun's data, this is unfortunately not the case at all
Keep in mind that railsbench contains very little applicative code. It's nice to bench Rails itself, but it's not 100% representative of an average Rails app.
That is true. We could compare stats from liquid to have a better idea how things might differ with a different use case that also uses hashes a lot.
Keep in mind that railsbench contains very little applicative code. It's nice to bench Rails itself, but it's not 100% representative of an average Rails app.
This might be a valid point. The example Jean raised in https://github.com/Shopify/ruby/issues/482#issuecomment-1378367162, env["HTTP_REFERER"]
, is the only kind of code that uses opt_aref_with
, which YJIT doesn't compile yet. Railsbench doesn't side-exit on that insn at all, but SFR does by 0.3%. It might be nice to support this for SFR, and this insn is where precomputing a key's hash is always possible.
As to this specific code env["HTTP_REFERER"]
, however, I think this env
is often not Hash
but a wrapper of it. It would go through Rack::Request::Env#get_header
, which means that we don't use a literal when an actual Hash#[]
call happens. If all the opt_aref_with
side exits are on non-Hash receivers, then it might be a waste of time to optimize it that way. We should probably add stats first.
Hum, right. It's kinda similar with Active Record attribute methods, we end up with generated code like:
class SomeModel
def first_name
_read_attribute(:first_name) { |n| missing_attribute(n, caller) }
end
# This method exists to avoid the expensive primary_key check internally, without
# breaking compatibility with the read_attribute API
def _read_attribute(attr_name, &block) # :nodoc:
@attributes.fetch_value(attr_name, &block)
end
end
class AttributeSet
def fetch_value(name, &block)
self[name].value(&block)
end
end
So while the key is a literal, it doesn't directly access the Hash, so YJIT couldn't optimize this without first inlining.
Erf, what I was trying to say is that the key is a constant literal, but the actual hash lookup happens two method calls lower.
As to this specific code
env["HTTP_REFERER"]
, however, I think thisenv
is often notHash
but a wrapper of it. It would go throughRack::Request::Env#get_header
, which means that we don't use a literal when an actualHash#[]
call happens. If all theopt_aref_with
side exits are on non-Hash receivers, then it might be a waste of time to optimize it that way. We should probably add stats first.
We could certainly implement opt_aref_with
. Every little bit counts 👍
I started working on optimizing opt_aref
for Hash#[]
today. I failed to wrap it up today, but the idea is:
rb_hash_aref
call into rb_hash_stlike_lookup
and rb_hash_default_value
calls.
jit_prepare_routine_call
:
BOP_DEFAULT
is not redefined for Hash (and the receiver is rb_cHash, which is already guarded)RHASH_PROC_DEFAULT
flag (hence rb_hash_default_value
won't call a Proc)rb_any_hash
won't call #hash
)Type::Hash
in jit_guard_known_class
THash
and CHash
and check CHash
instead.jit_prepare_routine_call
(and its ctx.clear_local_types
) is skippedRHASH_PROC_DEFAULT
is already guarded, skip rb_hash_default_value
and just extract RHASH_IFNONE
in generated code.For the :before
/ :after
case, i.e. when a Symbol literal is used for the Hash#[]
call, maybe we could optimize the rb_hash_stlike_lookup
part further by precomputing the hash. But I'll likely scope that out from the first PR.
It's not going as well as I hoped.
This is the change I thought of yesterday https://github.com/Shopify/ruby/commit/f718a9e3b58039fed133c6573969349c26d87ef5 (not including a Hash
type upgrade, but I already tried it too). It's slightly slower than master, which might be due to the added guards. However, even just skipping jit_prepare_routine_call
and upgrading Hash
type, believing the property of compile-time values won't change without guards, doesn't make a significant impact either while it does seem to be skipped fairly often.
We need a better idea than just skipping jit_prepare_routine_call
and upgrading type information to Hash
.
Since Hash#fetch
is still a regular method call, I also tried leveraging that code to implement a specialized cfunc codegen for it https://github.com/Shopify/ruby/commit/0775c27b935028329692f530679cbff1cb4e84c6, but it didn't give a significant speedup in activerecord and railsbench (where Hash#fetch
is the third common C method) either.
The cost of rb_hash_stlike_lookup
seems significant compared to the method call overhead or PC/SP save. Unfortunately the most common Hash#fetch
callsite also didn't have a literal; it looks up a value with an argument as a key. As of now, inlining a String/Symbol literal of a key all the way down to a Hash lookup callsite seems like the only impactful approach.
I also started to feel that we should check not only call count stats but also time-based stats, using stackprof for example. If it also says that we spend a long time on Hash lookup callsites, well, it should continue to be a priority I guess.
It's too bad that these optimizations don't help. There's probably something to be done by reimplementing the underlying hash data structure, but that's probably not something we want to get into right away (might get pretty hairy?). If we did decide to do that, we'd probably want to profile the amount of time taken by each type of hash lookup. Might be interesting to gather such profiling data regardless, just as a data point.
Maybe even just a benchmark using benchmark/ips that compares time taken for:
Closing for now
@k0kubun has recently taken the time to gather some stats about the most common C and Ruby method calls in our headline benchmarks: https://github.com/Shopify/yjit-bench/pull/163
One of the things that is somewhat surprising is that
Hash#[]
is the most frequently called C method, and often by a long shot. For example, inactiverecord
, it makes up over 10% of the C method calls, with 87K calls, and the most called Ruby method is "only" called 26K times. It's also the topmost method in railsbench, where it's called about 4x more than the most frequently called Ruby methodsOnliquid-render
it's at 6.3%. Note that in theory, inliquid-render
, we already optimizeKernel#respond_to?
andArray#[]
, so the overhead taken byHash#[]
is probably significant.All of this is to say,
Hash#[]
gets called a lot on our benchmarks, more than basically any other C or Ruby method. Therefore, we should probably care about making sure that it runs fast. We probably don't want to generate JITted code for every single hash variable access because that might tie us a little too deeply with the specifics of the Ruby hash map implementation, which could get pretty messy, but there might still be some clever things we can do.Right now, from what I can see, we do have some special handling for hash reads in
opt_aref
, which callsrb_hash_aref
.Some ideas:
jit_prepare_routine_call
? We should be able to know that there will be no call into Ruby methods and no GC alloc in some cases?String#hash
andSymbol#hash
wasn't redefined?Type::Hash
, and we guard that the receiver is anrb_cHash
, but we never upgrade the type in the context, so successive hash reads would always have to run the guards