YJIT: can we optimize `Hash` property reads i.e. `Hash#[]` ?

maximecb commented 1 year ago

@k0kubun has recently taken the time to gather some stats about the most common C and Ruby method calls in our headline benchmarks: https://github.com/Shopify/yjit-bench/pull/163

One of the things that is somewhat surprising is that Hash#[] is the most frequently called C method, and often by a long shot. For example, in activerecord, it makes up over 10% of the C method calls, with 87K calls, and the most called Ruby method is "only" called 26K times. It's also the topmost method in railsbench, where it's called about 4x more than the most frequently called Ruby methodsOn liquid-render it's at 6.3%. Note that in theory, in liquid-render, we already optimize Kernel#respond_to? and Array#[], so the overhead taken by Hash#[] is probably significant.

All of this is to say, Hash#[] gets called a lot on our benchmarks, more than basically any other C or Ruby method. Therefore, we should probably care about making sure that it runs fast. We probably don't want to generate JITted code for every single hash variable access because that might tie us a little too deeply with the specifics of the Ruby hash map implementation, which could get pretty messy, but there might still be some clever things we can do.

Right now, from what I can see, we do have some special handling for hash reads in opt_aref, which calls rb_hash_aref.

Some ideas:

For hash keys that are symbols or string constants, could we precompute the hashed value somehow?
For objects that we know are of the symbol or string class, could we avoid the jit_prepare_routine_call? We should be able to know that there will be no call into Ruby methods and no GC alloc in some cases?
- That would have the benefit of avoiding clearing local types as well
- I guess that might be somewhat tricky because we have to know String#hash and Symbol#hash wasn't redefined?
At the moment, as far as I can tell, we have a Type::Hash, and we guard that the receiver is an rb_cHash, but we never upgrade the type in the context, so successive hash reads would always have to run the guards

k0kubun commented 1 year ago

Some Hash#[] stats taken from one interation of railsbench with adhoc tracing:

Receiver
- empty?: 8.3%
- RHASH_AR_TABLE_P: 53.5%
- default.nil?: 90.6%
- size:
  - avg: 18.4
  - median: 6
- Class:
  - Hash: 97.1%
  - Rack::Utils::HeaderHash: 1.8%
  - ActiveSupport::HashWithIndifferentAccess: 0.7%
  - ... (called once per request or less)
Key
- Object:
- String:
  - "href", "id", "PATH_INFO", "SCRIPT_NAME", "posts", "HTTPS", "rack.url_scheme": 1.1~2.1% each
  - Symbol:
  - :logger, :formats, :before, :after, :urlsafe_csrf_tokens, :controller, :path, :format: 1.3~2.9% each
- Class:
  - String: 48.0%
  - Symbol: 41.0%
  - Integer: 3.7%
  - Array: 1.2%
  - Object: 1.0%
  - ActiveSupport::Notifications::Fanout: 1.0%
  - FalseClass: 0.9%
  - NilClass: 0.9%
  - Hash: 0.7%
  - Thread: 0.6%
  - Class: 0.5%
  - TrueClass: 0.3%
  - ... (less than once per request)

For hash keys that are symbols or string constants, could we precompute the hashed value somehow?

I thought of the same thing when I looked at the stats. ~This idea seems promising. This should at least be possible for opt_aref_with (but it's not used by railsbench, so it doesn't matter).~ I take this back. See https://github.com/Shopify/ruby/issues/482#issuecomment-1378098552.

k0kubun commented 1 year ago

I guess that might be somewhat tricky because we have to know String#hash and Symbol#hash wasn't redefined?

At least for RHASH_AR_TABLE_P hashes, which uses rb_any_hash, it doesn't seem to respect #hash overrides when a key is T_STRING or T_SYMBOL (or some other builtin types).

I failed to figure out what hash function do_hash would normally use from quick reading due to its function pointer call, which is used by non-RHASH_AR_TABLE_P hashes, but I guess the same thing would hopefully hold for consistency. (edit: I created a non-AR hash and tested that it doesn't call overridden #hash.)

k0kubun commented 1 year ago

Callers:

# method name: count (%)
/home/k0kubun/.gem/ruby/3.3.0/gems/rack-2.2.3.1/lib/rack/request.rb:63:in `get_header': 254060 (19.726089512003284)
/home/k0kubun/.gem/ruby/3.3.0/gems/concurrent-ruby-1.1.9/lib/concurrent-ruby/concurrent/collection/map/non_concurrent_map_backend.rb:20:in `[]': 138667 (10.766581336538454)
/opt/rubies/ruby/lib/ruby/3.3.0+0/set.rb:404:in `include?': 82532 (6.408067462822385)
/home/k0kubun/.gem/ruby/3.3.0/gems/activesupport-6.0.4/lib/active_support/ordered_options.rb:77:in `block in initialize': 53779 (4.1755859555460315)
/home/k0kubun/.gem/ruby/3.3.0/gems/activemodel-6.0.4/lib/active_model/attribute_set/builder.rb:39:in `[]': 35550 (2.760223892591186)
/home/k0kubun/.gem/ruby/3.3.0/gems/railties-6.0.4/lib/rails/initializable.rb:20:in `before': 28561 (2.2175739689534986)
/home/k0kubun/.gem/ruby/3.3.0/gems/railties-6.0.4/lib/rails/initializable.rb:24:in `after': 28535 (2.215555239805612)
/home/k0kubun/.gem/ruby/3.3.0/gems/activesupport-6.0.4/lib/active_support/subscriber.rb:166:in `get_queue': 27921 (2.1678821745439807)
/home/k0kubun/.gem/ruby/3.3.0/gems/rack-2.2.3.1/lib/rack/utils.rb:457:in `[]': 27870 (2.163922359676972)
/home/k0kubun/.gem/ruby/3.3.0/gems/activesupport-6.0.4/lib/active_support/configurable.rb:21:in `logger': 25941 (2.014148185589535)
/home/k0kubun/.gem/ruby/3.3.0/gems/rack-2.2.3.1/lib/rack/utils.rb:462:in `[]=': 23987 (1.8624329257829757)
/home/k0kubun/.gem/ruby/3.3.0/gems/activemodel-6.0.4/lib/active_model/attribute_set/builder.rb:108:in `assign_default_value': 22480 (1.7454242786343142)
/home/k0kubun/.gem/ruby/3.3.0/gems/actionpack-6.0.4/lib/action_dispatch/journey/visitors.rb:42:in `block in evaluate': 18727 (1.454028490479751)
/home/k0kubun/.gem/ruby/3.3.0/gems/actionview-6.0.4/lib/action_view/lookup_context.rb:273:in `block in initialize_details': 16000 (1.242294860237946)
/opt/rubies/ruby/lib/ruby/3.3.0+0/delegate.rb:349:in `block in delegating_block': 14000 (1.0870080027082027)
/home/k0kubun/.gem/ruby/3.3.0/gems/actionpack-6.0.4/lib/action_dispatch/request/session.rb:49:in `[]': 13948 (1.0829705444124296)
/home/k0kubun/.gem/ruby/3.3.0/gems/activesupport-6.0.4/lib/active_support/callbacks.rb:98:in `run_callbacks': 13943 (1.082582327268605)
/home/k0kubun/.gem/ruby/3.3.0/gems/activesupport-6.0.4/lib/active_support/notifications.rb:247:in `instrumenter_for': 13201 (1.0249709031250704)
/home/k0kubun/.gem/ruby/3.3.0/gems/actionview-6.0.4/lib/action_view/renderer/abstract_renderer.rb:80:in `block in extract_details': 13200 (1.0248932596963054)
...

Top examples:

19.7%: Rack::Request::Env#get_header https://github.com/rack/rack/blob/925a4a6599ab26b4f3455b525393fe155d443655/lib/rack/request.rb#L63
10.8%: Concurrent::Collection::NonConcurrentMapBackend#[] https://github.com/ruby-concurrency/concurrent-ruby/blob/52c08fca13cc3811673ea2f6fdb244a0e42e0ebe/lib/concurrent-ruby/concurrent/collection/map/non_concurrent_map_backend.rb#L20
6.4%: Set#include? https://github.com/ruby/ruby/blob/00d58afb5d7f3cf3a553e1e8f75f31e4d8781fe7/lib/set.rb#L404
4.2%: block in ActiveRecord::InheritableOptions#initialize https://github.com/rails/rails/blob/6e721d79ba9458317c93bb02590bbaa25c5a3649/activesupport/lib/active_support/ordered_options.rb#L77
2.7%: ActiveModel::LazyAttributeHash#[] https://github.com/rails/rails/blob/6e721d79ba9458317c93bb02590bbaa25c5a3649/activemodel/lib/active_model/attribute_set/builder.rb#L39

Looking at those examples, it seems like you can't really assume a specific single key in majority of callsites unless you inline multiple frames (e.g. this should probably inline frames all the way up to Hash#[]). Chain guards might be able to handle it better in railsbench, but it's likely not enough in actual applications.

I find a couple of examples where precomputing a hash seems reasonable without inlining, but those were the only such examples that I've found so far:

2.2%: Rails::Initializable::Initializer#before https://github.com/rails/rails/blob/6e721d79ba9458317c93bb02590bbaa25c5a3649/railties/lib/rails/initializable.rb#L20
2.2%: Rails::Initializable::Initializer#after https://github.com/rails/rails/blob/6e721d79ba9458317c93bb02590bbaa25c5a3649/railties/lib/rails/initializable.rb#L24

We should still be able to do some optimization for String and Symbol, but for now, maybe not precomputing a hash of a key.

casperisfine commented 1 year ago

The hashcode precomputing was the first thing I thought as well. Would it be possible to single out cases where a literal or constant is used? e.g. env["HTTP_REFERER"].

These are likely to be relatively common in some hotspots like Active Record attribute accessors, rack midlewares etc.

maximecb commented 1 year ago

I failed to figure out what hash function do_hash would normally use from quick reading due to its function pointer call, which is used by non-RHASH_AR_TABLE_P hashes, but I guess the same thing would hopefully hold for consistency. (edit: I created a non-AR hash and tested that it doesn't call overridden #hash.)

That's interesting and potentially promising. If we can rely on Hash#[] not calling any Ruby method and not performing GC and not throwing an exception, then we could save a little bit of overhead and avoid dropping all currently known type information. Then we could also upgrade the type in the context and avoid repeated type checks in successive hash accesses.

If the behavior is consistent, we could potentially add a test to Ruby spec to guarantee that behavior doesn't change too?

Top examples [...]

These methods that just index into a hash and return that are so silly and wasteful 🤦‍♀️ GAH... That being said we do try to pass type information into callees, so it's possible that these methods might get specialized for T_STRING or T_SYMBOL. We'd have to check how often that information is known in the context inside of opt_aref.

These are likely to be relatively common in some hotspots like Active Record attribute accessors, rack midlewares etc.

I would have thought so too, but at least according to Kokubun's data, this is unfortunately not the case at all. Though inlining might uncover some opportunities for that which we don't see right now.

casperisfine commented 1 year ago

I would have thought so too, but at least according to Kokubun's data, this is unfortunately not the case at all

Keep in mind that railsbench contains very little applicative code. It's nice to bench Rails itself, but it's not 100% representative of an average Rails app.

maximecb commented 1 year ago

I would have thought so too, but at least according to Kokubun's data, this is unfortunately not the case at all

Keep in mind that railsbench contains very little applicative code. It's nice to bench Rails itself, but it's not 100% representative of an average Rails app.

That is true. We could compare stats from liquid to have a better idea how things might differ with a different use case that also uses hashes a lot.

k0kubun commented 1 year ago

Keep in mind that railsbench contains very little applicative code. It's nice to bench Rails itself, but it's not 100% representative of an average Rails app.

This might be a valid point. The example Jean raised in https://github.com/Shopify/ruby/issues/482#issuecomment-1378367162, env["HTTP_REFERER"], is the only kind of code that uses opt_aref_with, which YJIT doesn't compile yet. Railsbench doesn't side-exit on that insn at all, but SFR does by 0.3%. It might be nice to support this for SFR, and this insn is where precomputing a key's hash is always possible.

As to this specific code env["HTTP_REFERER"], however, I think this env is often not Hash but a wrapper of it. It would go through Rack::Request::Env#get_header, which means that we don't use a literal when an actual Hash#[] call happens. If all the opt_aref_with side exits are on non-Hash receivers, then it might be a waste of time to optimize it that way. We should probably add stats first.

casperisfine commented 1 year ago

Hum, right. It's kinda similar with Active Record attribute methods, we end up with generated code like:

class SomeModel
  def first_name
    _read_attribute(:first_name) { |n| missing_attribute(n, caller) }
  end

  # This method exists to avoid the expensive primary_key check internally, without
  # breaking compatibility with the read_attribute API
  def _read_attribute(attr_name, &block) # :nodoc:
    @attributes.fetch_value(attr_name, &block)
  end
end

class AttributeSet
  def fetch_value(name, &block)
    self[name].value(&block)
  end
end

So while the key is a literal, it doesn't directly access the Hash, so YJIT couldn't optimize this without first inlining.

casperisfine commented 1 year ago

Erf, what I was trying to say is that the key is a constant literal, but the actual hash lookup happens two method calls lower.

maximecb commented 1 year ago

As to this specific code env["HTTP_REFERER"], however, I think this env is often not Hash but a wrapper of it. It would go through Rack::Request::Env#get_header, which means that we don't use a literal when an actual Hash#[] call happens. If all the opt_aref_with side exits are on non-Hash receivers, then it might be a waste of time to optimize it that way. We should probably add stats first.

We could certainly implement opt_aref_with. Every little bit counts 👍

k0kubun commented 1 year ago

I started working on optimizing opt_aref for Hash#[] today. I failed to wrap it up today, but the idea is:

Split the rb_hash_aref call into rb_hash_stlike_lookup and rb_hash_default_value calls.
- I have a working patch that does just this, and it didn't cause a slowdown.
When all the following conditions hold on compile-time values, guard them and skip jit_prepare_routine_call:
- BOP_DEFAULT is not redefined for Hash (and the receiver is rb_cHash, which is already guarded)
- The hash doesn't have RHASH_PROC_DEFAULT flag (hence rb_hash_default_value won't call a Proc)
- The key are either T_STRING or T_SYMBOL (hence rb_any_hash won't call #hash)
- More types could be supported, but it seems unimportant https://github.com/Shopify/ruby/issues/482#issuecomment-1378020807
We can upgrade the type to Type::Hash in jit_guard_known_class
- Actually, I'll split it into THash and CHash and check CHash instead.
- This is useful only when jit_prepare_routine_call (and its ctx.clear_local_types) is skipped
When RHASH_PROC_DEFAULT is already guarded, skip rb_hash_default_value and just extract RHASH_IFNONE in generated code.

For the :before / :after case, i.e. when a Symbol literal is used for the Hash#[] call, maybe we could optimize the rb_hash_stlike_lookup part further by precomputing the hash. But I'll likely scope that out from the first PR.

k0kubun commented 1 year ago

It's not going as well as I hoped.

This is the change I thought of yesterday https://github.com/Shopify/ruby/commit/f718a9e3b58039fed133c6573969349c26d87ef5 (not including a Hash type upgrade, but I already tried it too). It's slightly slower than master, which might be due to the added guards. However, even just skipping jit_prepare_routine_call and upgrading Hash type, believing the property of compile-time values won't change without guards, doesn't make a significant impact either while it does seem to be skipped fairly often.

We need a better idea than just skipping jit_prepare_routine_call and upgrading type information to Hash.

k0kubun commented 1 year ago

Since Hash#fetch is still a regular method call, I also tried leveraging that code to implement a specialized cfunc codegen for it https://github.com/Shopify/ruby/commit/0775c27b935028329692f530679cbff1cb4e84c6, but it didn't give a significant speedup in activerecord and railsbench (where Hash#fetch is the third common C method) either.

The cost of rb_hash_stlike_lookup seems significant compared to the method call overhead or PC/SP save. Unfortunately the most common Hash#fetch callsite also didn't have a literal; it looks up a value with an argument as a key. As of now, inlining a String/Symbol literal of a key all the way down to a Hash lookup callsite seems like the only impactful approach.

I also started to feel that we should check not only call count stats but also time-based stats, using stackprof for example. If it also says that we spend a long time on Hash lookup callsites, well, it should continue to be a priority I guess.

maximecb commented 1 year ago

It's too bad that these optimizations don't help. There's probably something to be done by reimplementing the underlying hash data structure, but that's probably not something we want to get into right away (might get pretty hairy?). If we did decide to do that, we'd probably want to profile the amount of time taken by each type of hash lookup. Might be interesting to gather such profiling data regardless, just as a data point.

Maybe even just a benchmark using benchmark/ips that compares time taken for:

Many instance variable reads in a loop
Same number of reads from an array hash
Same number of reads from an stlike hash

maximecb commented 1 year ago

Closing for now

Shopify / ruby

YJIT: can we optimize `Hash` property reads i.e. `Hash#[]` ? #482