Open schneems opened 2 years ago
While the parallel assignment is fewer operations, it also introduces an intermediate array that requires an allocation:
But only because the code you used is:
report = MemoryProfiler.report do
_a, _b = 1, 2
end
Can you try with:
report = MemoryProfiler.report do
_a, _b = 1, 2
nil
end
Then there should be no extra array allocation. And in fact the first benchmark's results almost prove there is no extra array, otherwise we'd see a significant slowdown.
I agree measuring with 2 elements is more representative.
I think the conclusion should be: sequential/parallel assignments are basically the same speed. However, this needs parallel assignment to not be the last expression, or in other words the method/block return value. In that case, the benefit of sequential parallel should be clearly visible.
FWIW, I personally think fast-ruby is not a good resource for "what one should use". It all depends on the Ruby version, implementation, context, how it's used, etc, etc.
Quick numbers:
require 'benchmark/ips'
def parallel
_a, _b = 1, 2
# nil
end
def sequential
_a = 1
_b = 2
# nil
end
Benchmark.ips do |x|
x.report('Parallel Assignment') { parallel }
x.report('Sequential Assignment') { sequential }
x.compare!
end
With the nil
s uncommented:
Parallel Assignment 16.221M (± 1.7%) i/s - 81.308M in 5.014057s
Sequential Assignment
16.024M (± 2.2%) i/s - 80.591M in 5.031993s
Comparison:
Parallel Assignment: 16221113.6 i/s
Sequential Assignment: 16024025.0 i/s - same-ish: difference falls within error
So it's the same speed (and sequential is not any faster for this case, it'd just be noise).
With the nil
s commented:
Parallel Assignment 11.434M (± 0.2%) i/s - 58.219M in 5.091781s
Sequential Assignment
16.122M (± 0.2%) i/s - 80.610M in 5.000155s
Comparison:
Sequential Assignment: 16121590.3 i/s
Parallel Assignment: 11434027.0 i/s - 1.41x (± 0.00) slower
As expected Sequential is much faster here, because Parallel must do an array allocation with the Parallel Assignment as last expression.
And with MemoryProfiler:
$ ruby -rmemory_profiler -e 'MemoryProfiler.report { _a, _b = 1, 2 }.pretty_print'
Total allocated: 40 bytes (1 objects)
Total retained: 40 bytes (1 objects)
...
$ ruby -rmemory_profiler -e 'MemoryProfiler.report { _a, _b = 1, 2; nil }.pretty_print'
Total allocated: 0 bytes (0 objects)
Total retained: 0 bytes (0 objects)
Thanks for catching that. The nil
return nuance is new to me (from this morning even).
I think maybe I associate parallel assignment with array allocation due to this profiling quirk. I'll update my main post to take that bit out.
This run actually has sequential being faster:
I believe that's just noise or too small to matter. I ran it 3 times locally, and the first I got Sequential 1.02x slower, 2nd and 3rd run same-ish: difference falls within error
. The many allocations (BTW, 2 per element since no frozen_string_literal
) completely bury the costs of the assignments, so that makes the benchmark a bit useless because it's mostly measuring those String allocations + GC.
I know you'd like Sequential Assignment to be faster, but it's not true/not that simple in general. It is very true if the Parallel Assignment is the last expression/a return value of a method/block, and this is worth making explicit about this benchmark. In other cases, they are basically the same and it's only a syntactic/bytecode difference. On CRuby Parallel Assignment is a bit faster likely due to bytecode details, but this "a bit faster" is likely too small to ever matter in practice.
For a general Ruby JIT, there should be no significant difference in generated code between both kind of assignments (for the non-last-expression case). For a Ruby JIT with inline and escape analysis, both kind of assignments are exactly the same, even if it's the last expression, as long as the JIT can see the result is unused (inlines enough to see that), as shown here: https://twitter.com/eregontp/status/1564950367485452288
In other cases, they are basically the same and it's only a syntactic/bytecode difference.
To illustrate this point, we can use ruby --dump=insns assign.rb
on the original benchmark:
== disasm: #<ISeq:fast@assign.rb:3 (3,0)-(6,3)> (catch: FALSE)
local table (size: 8, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 8] _a@0 [ 7] _b@1 [ 6] _c@2 [ 5] _d@3 [ 4] _e@4 [ 3] _f@5 [ 2] _g@6 [ 1] _h@7
0000 putobject_INT2FIX_1_ ( 4)[LiCa]
0001 putobject 2
0003 putobject 3
0005 putobject 4
0007 putobject 5
0009 putobject 6
0011 putobject 7
0013 putobject 8
0015 setlocal_WC_0 _h@7
0017 setlocal_WC_0 _g@6
0019 setlocal_WC_0 _f@5
0021 setlocal_WC_0 _e@4
0023 setlocal_WC_0 _d@3
0025 setlocal_WC_0 _c@2
0027 setlocal_WC_0 _b@1
0029 setlocal_WC_0 _a@0
0031 putnil
0032 leave ( 6)[Re]
== disasm: #<ISeq:slow@assign.rb:8 (8,0)-(18,3)> (catch: FALSE)
local table (size: 8, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 8] _a@0 [ 7] _b@1 [ 6] _c@2 [ 5] _d@3 [ 4] _e@4 [ 3] _f@5 [ 2] _g@6 [ 1] _h@7
0000 putobject_INT2FIX_1_ ( 9)[LiCa]
0001 setlocal_WC_0 _a@0
0003 putobject 2 ( 10)[Li]
0005 setlocal_WC_0 _b@1
0007 putobject 3 ( 11)[Li]
0009 setlocal_WC_0 _c@2
0011 putobject 4 ( 12)[Li]
0013 setlocal_WC_0 _d@3
0015 putobject 5 ( 13)[Li]
0017 setlocal_WC_0 _e@4
0019 putobject 6 ( 14)[Li]
0021 setlocal_WC_0 _f@5
0023 putobject 7 ( 15)[Li]
0025 setlocal_WC_0 _g@6
0027 putobject 8 ( 16)[Li]
0029 setlocal_WC_0 _h@7
0031 putnil
0032 leave ( 18)[Re]
So it's literally the same instructions, just in a slightly different order in CRuby 3.1.2. Which then makes it a little bit surprising why we can even see Parallel Assignment faster (maybe some better cache locality or maybe consecutive instructions are done slightly better?).
I know you'd like Sequential Assignment to be faster, but it's not true/not that simple in general.
I understand the limits of the measurements and statistics. I'm able to detect a 1% request speedup with 99% confidence in a Rails app end to end. Via derailed https://github.com/zombocom/derailed_benchmarks/blob/0102612a9915179abca304ad0392aa84ed6bdf0e/lib/derailed_benchmarks/stats_from_dir.rb#L143-L165.
I generally try annotating with things like (but within error)
. But sometimes I miss a few.
If we're talking probability. I would say my prior was strong that parallel assignment allocates, then you came along and made a chip in that prior, then adam came along and told me what I wanted to hear. I'm not being intentionally misleading, but I am very much intentionally trying to prove or disprove that concept to myself.
On CRuby Parallel Assignment is a bit faster likely due to bytecode details, but this "a bit faster" is likely too small to ever matter in practice.
100% agreed here. It's an interesting trivia night question, but won't affect IRL performance.
So it's literally the same instructions, just in a slightly different order in CRuby 3.1.2. Which then makes it a little bit surprising why we can even see Parallel Assignment faster (maybe some better cache locality or maybe consecutive instructions are done slightly better?).
Fascinating 🤷🏻♂️. Thanks for all the extra info
Update: Parallel assignment doesn't allocate unless it's the last statement in a method/proc. There's a benchmark provided below. I'm
I think this benchmark https://github.com/fastruby/fast-ruby/blob/main/code/general/assignment.rb needs to be removed or at least heavily annotated on the readme. It is not representative of real code. In my experience, removing parallel assignments into sequential ones speeds up apps.
One reason is that parallel assignments [can] allocate an intermediate array [if it is the last statement in a method for example], the other is that the benchmark uses MANY assignments and the performance difference seems to scale with assignment numbers. In the wild the most common number is 2.
Also worth noting that by using integers, we're isolating to only the assignment time (mostly) but this distorts the possible impact of the code. If you switch to strings that allocate then the pressure on the GC actually causes sequential assignment to be faster (at least sometimes).
Benchmarks below.
Reasoning
In https://github.com/fastruby/fast-ruby/blob/main/code/general/assignment.rb it states that parallel assignment is much faster than sequential assignment, but if you have only 2 assignments, the difference is negligible:
Note that sequential assignment is faster here (but within error). While the parallel assignment is fewer operations, it also introduces an intermediate array that requires an allocation:
Even in the case of MANY sequential versus parallel assignments, if the function is doing anything non-trivial (such as allocating strings as opposed to using integers which are all singletons, then the effect of the "faster" parallel assignment is pretty much negligible. This run actually has sequential being faster:
The sequential assignment is also faster here (but within error).