Shopify / ruby

The Ruby Programming Language [mirror]
https://www.ruby-lang.org/

Document Ruby method call complexity #521

Open maximecb opened 1 year ago

maximecb commented 1 year ago

There is a lot of complexity behind Ruby method calls, because these method calls can do a lot of different things. There are many different types of arguments (positional, keyword, implicit block arguments, explicit block arguments). Arguments can be optional, they can be splatted, and there are both array and keyword splats. All of this complexity is also reflected in the implementations of CRuby, YJIT and TruffleRuby. There are a lot of things that I don't know or don't understand about Ruby calls. I'm sure that's also the case for a lot of Ruby newcomers, and even for some seasoned Ruby programmers.

I think it could be really helpful for me, for newcomers to the YJIT team, and potentially for TruffleRuby folks and newcomers to the Ruby language if we could document all of the different ways that Ruby method calls can be structured. Obviously, the implementation of CRuby/YJIT/TruffleRuby is going to be different, but we could start with a language-level description of what are all of the kinds of objects that can be receivers of calls. What are all the types of arguments that a Ruby call can receive? How can you mix these different types of arguments together? What are some things that are not allowed? How do you access the different types of arguments or blocks from inside of a method? What are some potential gotchas? We could document all of these points and include a lot of small Ruby code examples.

This could potentially be in the form of a markdown document which we keep in the ruby/ruby repo. We could also start with a Google doc (they have markdown support now?) for ease of editing. We could collaborate between the YJIT and TruffleRuby folks, and solicit input from Ruby core members as well.

Ideally, for the YJIT team, where I would like to go with this is to work towards a generalized implementation of codegen for send. Right now, what we have is very ad-hoc and hard to read through. I think that documenting all these things could also help us think about inlining and a more efficient calling convention. Obviously, some of the YJIT details are going to be YJIT/CRuby-specific. We could have separate sections of the document for implementation-specific details, but I think that starting from a Ruby language-level description of all the possible flavors and rules around Ruby calls would be a good start.

Tagging @eregon since I feel he would have a lot to say on the subject. Also tagging @noahgibbs because of his writing experience.

eregon commented 1 year ago

> all of the kinds of objects that can be receivers of calls
> all the types of arguments that a Ruby call can receive

I'm not sure I follow. What do you mean by kind/type? Could you give an example?

I'd say the receiver of a call can be any Ruby BasicObject, but I guess that's not what you are talking about.
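To make the "any BasicObject" point concrete, here is a minimal sketch. The `Bare` class and the `receivers` hash are hypothetical names chosen for illustration; they do not come from CRuby, YJIT or TruffleRuby.

```ruby
# A class that inherits directly from BasicObject. Its instances lack the
# Kernel methods (#inspect, #respond_to?, ...) yet can still receive calls.
class Bare < BasicObject
  def ping
    :pong
  end
end

# Each value below is the result of a call on a different kind of receiver.
receivers = {
  bare_object: Bare.new.ping, # instance of a BasicObject subclass
  integer:     1.succ,        # an Integer literal
  nil_value:   nil.to_a,      # nil is an ordinary object
  class_obj:   String.name    # classes are objects too
}
```

At the language level, the receiver is simply whatever object the expression to the left of the dot evaluates to.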

Re argument types, I think we have just 3: positional arguments (*), keyword arguments (**) and block argument (&). Is that what you mean?
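As a small illustration of those three argument kinds at the call site, the sketch below passes them once literally and once through `*`, `**` and `&`. The method name `report` and the variable names are hypothetical.

```ruby
# Hypothetical method that reports what it received.
def report(*positional, **keywords, &block)
  { positional: positional, keywords: keywords, block: block&.call }
end

# Literal form: positional arguments, a keyword argument, and a block literal.
literal = report(1, 2, verbose: true) { :from_block }

# Splatted form: the same call built from an Array, a Hash and a Proc.
args     = [1, 2]
opts     = { verbose: true }
callable = proc { :from_block }
splatted = report(*args, **opts, &callable)
```

Both calls should hand the method the same positional arguments, keywords and block result, which is one reason `*`, `**` and `&` map so directly onto the three argument kinds.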

k0kubun commented 1 year ago

> Re argument types, I think we have just 3: positional arguments (*), keyword arguments (**) and block argument (&)

To users, yes. But implementation-wise, it's much more complicated, right? For example, a block's |a| and |a,| behave differently even though both `a`s are of the same type, positional arguments.
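A minimal sketch of the |a| versus |a,| difference, using non-lambda procs (the variable names are made up for the example):

```ruby
single   = proc { |a|  a } # one plain positional parameter
trailing = proc { |a,| a } # trailing comma: destructure an Array argument

whole = single.call([1, 2])   # => [1, 2]  (the Array is passed through as-is)
first = trailing.call([1, 2]) # => 1       (the Array is unpacked, the rest dropped)

# The same difference shows up when yielding each element of a nested Array.
rows   = [[1, 2], [3, 4]].map { |a|  a } # => [[1, 2], [3, 4]]
firsts = [[1, 2], [3, 4]].map { |a,| a } # => [1, 3]
```

The trailing comma makes the block behave as if it had more than one parameter, which is what triggers the destructuring of a single Array argument.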

eregon commented 1 year ago

Right, I call that types of parameters (following the Ruby 3 more precise naming about arguments vs parameters), combinations of them and Proc destructuring (that's a messy one).

Re types of parameters, that's def m(pre, optional = x, *rest, post, kw_required:, kw: optional, **kwrest, &block).
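For illustration, the same signature can be inspected with `Method#parameters`. The sketch below replaces the placeholder defaults `x` and `optional` from the signature above with literal symbols (an assumption made only so the code runs) and returns everything the method received.

```ruby
def m(pre, optional = :default, *rest, post, kw_required:, kw: :kw_default, **kwrest, &block)
  { pre: pre, optional: optional, rest: rest, post: post,
    kw_required: kw_required, kw: kw, kwrest: kwrest, block: !block.nil? }
end

# One [kind, name] pair per parameter.
kinds = method(:m).parameters
# => [[:req, :pre], [:opt, :optional], [:rest, :rest], [:req, :post],
#     [:keyreq, :kw_required], [:key, :kw], [:keyrest, :kwrest], [:block, :block]]

# Only the required parameters are supplied: defaults fill in the rest.
minimal = m(1, 2, kw_required: 3)

# Every kind of parameter receives something.
full = m(1, 2, 3, 4, 5, kw_required: 6, kw: 7, extra: 8) { }
```

In `minimal`, the two positional arguments go to `pre` and `post`, so `optional` keeps its default and `rest` is empty. In `full`, `optional` is 2, `rest` is `[3, 4]` and `post` is 5.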

And TruffleRuby has different node classes to read each of them, and typically it compares things like the number of given arguments vs the parameter index. For kwargs it's more complicated as one needs to look into the Hash (in the general case).

In terms of calling convention, TruffleRuby has a single way to pack arguments: all arguments are in a flat Object[], including 8 hidden arguments at the front (notably self, the block and whether kwargs are passed). Keyword arguments are currently just packed into a regular Ruby Hash (with escape analysis and Hash storage strategies that optimize OK), and there are plans to pass them individually. So it's like:

[
  outer frame for blocks,
  caller special variables ($~ and $_),
  metadata of the method being called (e.g. for __callee__),
  DeclarationContext (where to define methods if there is a `def`),
  FrameOnStackMarker (tells whether a frame is still on the stack; needed so break/return can raise an error if it is not),
  self,
  block,
  Are keyword arguments passed (if so they are the last arg),
  *positional arguments,
  keyword_hash (or no element if no kwargs passed)
]
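The layout above can be modelled in a few lines of Ruby. This is only an illustrative sketch: TruffleRuby's real implementation is in Java, and every name here (`HIDDEN_SLOTS`, `pack_arguments`, `positional_count`, `read_optional`) is hypothetical. It shows how a flat array with a "keywords passed" flag lets a reader of an optional parameter compare the number of given arguments against its parameter index.

```ruby
# Illustrative model only, not TruffleRuby code. Slot names mirror the list above.
HIDDEN_SLOTS = %i[
  outer_frame caller_special_variables method_metadata declaration_context
  frame_on_stack_marker self block keywords_passed
].freeze

# Build the flat argument array: hidden slots, then positionals, then the
# keyword Hash (only when keywords were passed).
def pack_arguments(receiver, positional, keywords = nil, block = nil)
  hidden = Array.new(HIDDEN_SLOTS.size) # unused slots stay nil in this model
  hidden[HIDDEN_SLOTS.index(:self)] = receiver
  hidden[HIDDEN_SLOTS.index(:block)] = block
  hidden[HIDDEN_SLOTS.index(:keywords_passed)] = !keywords.nil?
  packed = hidden + positional
  packed << keywords unless keywords.nil?
  packed
end

# Number of positional arguments: everything after the hidden slots,
# minus the trailing keyword Hash when one is present.
def positional_count(packed)
  count = packed.size - HIDDEN_SLOTS.size
  packed[HIDDEN_SLOTS.index(:keywords_passed)] ? count - 1 : count
end

# Read an optional positional parameter, falling back to its default when
# fewer arguments were given than the parameter's index requires.
def read_optional(packed, index, default)
  index < positional_count(packed) ? packed[HIDDEN_SLOTS.size + index] : default
end

packed = pack_arguments(:receiver, [1, 2], { k: 1 })
given   = read_optional(packed, 1, :default) # second positional was passed
missing = read_optional(packed, 2, :default) # third positional was not
```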

It would be great if we did not need as many as 8 hidden arguments, but it seems hard to reduce given all the information to pass through the stack needed by Ruby semantics (gotcha: it's not OK to store these things in thread-local variables instead, because e.g. a block can capture a certain state and only be called later, and what it captured should be respected). I think CRuby might have multiple stacks instead to track things like DeclarationContext/default definee?
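A small sketch of that gotcha at the language level (the `Counter` class and the `orphaned_break` method are hypothetical examples): a block keeps using the `self` and locals it captured even after the defining method has returned, and a `break` from a proc whose defining frame is gone has to raise.

```ruby
class Counter
  attr_reader :count

  def initialize
    @count = 0
  end

  def incrementer
    step = 2
    proc { @count += step } # captures self (for @count) and the local `step`
  end
end

counter   = Counter.new
increment = counter.incrementer # the method frame has already returned here
increment.call
increment.call
total = counter.count # => 4

# `break` inside a proc targets the frame that created the proc. Once that
# frame is no longer on the stack, calling the proc raises LocalJumpError.
def orphaned_break
  proc { break :never }
end

error = begin
  orphaned_break.call
rescue LocalJumpError => e
  e
end
```

The second half is the situation that a marker like FrameOnStackMarker has to detect.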

TruffleRuby also has special handling for methods that need the caller frame such as Kernel#binding, those methods are actually executed in the caller frame. The same technique is used for attr_reader methods so they don't have the overhead of a full Ruby method call.
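To see why `Kernel#binding` needs its caller's frame, consider this minimal sketch (the method name `locals_seen_by_binding` is made up): the returned Binding exposes the locals and `self` of the method that called `binding`, not of `binding` itself.

```ruby
def locals_seen_by_binding
  secret = 42
  here = binding # captures the frame of *this* method, i.e. binding's caller
  [here.local_variable_get(:secret), here.receiver.equal?(self)]
end

value, same_self = locals_seen_by_binding
```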

IIRC CRuby has way too many calling conventions/call types?

There are also a couple of different ways to call for C extensions, as it depends on the arity given to rb_define_method: https://github.com/oracle/truffleruby/blob/00c67fc9529f9dff403718f979fa7ac09433d77f/lib/truffle/truffle/cext_ruby.rb#L23-L32

Regarding Proc destructuring, I don't recall the exact rules, but TruffleRuby's handling of that is here:
https://github.com/oracle/truffleruby/blob/00c67fc9529f9dff403718f979fa7ac09433d77f/src/main/java/org/truffleruby/parser/MethodTranslator.java#L349-L366
https://github.com/oracle/truffleruby/blob/00c67fc9529f9dff403718f979fa7ac09433d77f/src/main/java/org/truffleruby/parser/MethodTranslator.java#L198-L237

There is also the case of def m((a,b), c) where a, b are "nested parameters". IIRC that's handled by coercing the first argument to an array, saving it to a temp variable, and then reading at constant indices out of it.
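A short sketch of nested parameters in action. The `Pair` class is a hypothetical helper added to show the coercion step; it relies on `to_ary`, Ruby's implicit Array conversion.

```ruby
def m((a, b), c)
  [a, b, c]
end

# Hypothetical class that is implicitly convertible to an Array via #to_ary.
class Pair
  def initialize(left, right)
    @left = left
    @right = right
  end

  def to_ary
    [@left, @right]
  end
end

exact   = m([1, 2], 3)            # => [1, 2, 3]
extra   = m([1, 2, 9], 3)         # => [1, 2, 3]    (surplus elements are dropped)
scalar  = m(1, 3)                 # => [1, nil, 3]  (a non-Array fills only `a`)
coerced = m(Pair.new(:l, :r), 3)  # => [:l, :r, 3]  (#to_ary is used for coercion)
```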

maximecb commented 1 year ago

Thanks Benoit, that is very helpful.