fastruby / fast-ruby

:dash: Writing Fast Ruby :heart_eyes: -- Collect Common Ruby idioms.
https://github.com/fastruby/fast-ruby
5.67k stars 376 forks source link

Parallel assignment is equal to sequential assignment #94

Closed radarek closed 1 year ago

radarek commented 8 years ago

This one is probably the most confusing benchmark I've ever seen. Original idea was that sequential assignment is faster than parallel (https://speakerdeck.com/sferik/writing-fast-ruby?slide=45). Later @charliesome found (https://github.com/JuanitoFatas/fast-ruby/pull/50) that this benchmark was incorrect because Ruby creates new array with all variables when parallel assignment is last expression in a method. After recalculation parallel assignment became faster than sequential.

Which one is actually faster?

I claim that none of them. Parallel assignment is actually equal to sequential. In the first comment we can read: "In fact, this is even faster than splitting these assignments out over multiple lines because the compiler does not need to emit a per-line trace instruction for each assignment."

I wonder why nobody broaden this topic. "trace" instruction is emitted for every line of your ruby program. Not only for assignments, but for every line. Does it mean we should write our programs as a oneliners? Absolutely not! "trace" has very low overhead. But suppose you are paranoid... Surprise! Ruby VM (MRI) allows you to disable emitting this instructions completely. Here is the trick:

RubyVM::InstructionSequence.compile_option = {
  trace_instruction: false
}

Put this one in a file "disable_trace.rb" and load it before loading your program (you can't load this in the same script because it's too late, your program is already compiled to bytecode).

$ ruby -r./disable_trace assignment.rb
Calculating -------------------------------------
 Parallel Assignment   149.276k i/100ms
Sequential Assignment
                       149.712k i/100ms
-------------------------------------------------
 Parallel Assignment      7.963M (± 5.2%) i/s -     39.707M
Sequential Assignment
                          7.947M (± 5.2%) i/s -     39.674M

Comparison:
 Parallel Assignment:  7962929.1 i/s
Sequential Assignment:  7946966.7 i/s - 1.00x slower

This is how your bytecode actually looks like when you disable emitting "trace" instruction:

puts RubyVM::InstructionSequence.compile("a = 1;\nb = 2;", nil, nil, nil, trace_instruction: false).disasm
== disasm: #<ISeq:<compiled>@<compiled>>================================
local table (size: 3, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 3] a          [ 2] b
0000 putobject_OP_INT2FIX_O_1_C_                                      (   1)
0001 setlocal_OP__WC__0 3
0003 putobject        2                                               (   2)
0005 dup
0006 setlocal_OP__WC__0 2
0008 leave

See? No trace instructions.

Summary:

Also, small digression. It would be better to educate people WHY and WHEN something can be slower/faster rather than making such a strong statements like "X is faster than Y".

nateberkopec commented 8 years ago

Thank you for this! I never considered the impact of the trace instruction on microbenchmarks, thank you for putting all of this research together.

At this microbenchmarking scale, perhaps we should be disabling trace instructions more often.

Also, small digression. It would be better to educate people WHY and WHEN something can be slower/faster rather than making such a strong statements like "X is faster than Y".

I agree that this repository could do a better job of this.

I wonder why nobody broaden this topic.

Just a couple of speed freaks working on this in our spare time mate :+1:

nateberkopec commented 8 years ago

After thinking about this some more, I'm not sure why we should disable trace instructions when 99% of production Rubies will have them enabled. Shouldn't we be comparing 1-1 against most people's Ruby production settings? @radarek your thoughts?

radarek commented 8 years ago

I woulnd't disable it too. The point of my comment was that we shouldn't tell people that sequential assignment is slower than parallel. If we would do that then we should tell them also that code in multiple lines is slower than the same code in one line. It would be stupid of course ;-). I think this benchmark should be simply deleted. I don't think any real world code suffers from parallel vs sequential assignment problem or "trace" bytecode instrunction.

radarek commented 8 years ago

One more note about this benchmark. When I did this research I also checked how many local variables are used in "average" method (I used ripper library to do that). I analyzed all 774 gems from ~/.rbenv/gems/2.2.0/gems/ directory (over 40k files). And here is the histogram:

# K: V (K - number of variables, V - number of methods that use K local variables)
 0: 151113
 1: 43408
 2: 18114
 3: 8435
 4: 3902
 5: 2271
 6: 1104
 7: 676
 8: 399
 9: 275
10: 160
11: 115
12: 91
13: 55
14: 22
15: 40
16: 22
17: 15
18: 25
19: 15
20: 4
21: 10
22: 3
23: 4
24: 4
25: 2
26: 4
27: 6
28: 1
29: 4
30: 14
31: 4
35: 1
36: 2

Even if we remove 0 entry (we don't care about method which don't use local variables) we see that 93% of a methods use 4 or less variables. Re-running original benchmark with only 4 local variables I get result:

Comparison:
 Parallel Assignment:  6985531.4 i/s
Sequential Assignment:  6720164.3 i/s - 1.04x slower

(for 8 local variables I get "1.12x slower").

nateberkopec commented 8 years ago

If we would do that then we should tell them also that code in multiple lines is slower than the same code in one line. It would be stupid of course ;-).

Right, now I understand.

I think this benchmark should be simply deleted. I don't think any real world code suffers from parallel vs sequential assignment problem or "trace" bytecode instrunction.

That's true. I think you make a good case, the benchmark should be removed.

We need a better policy on what constitutes a big enough difference to matter, especially, as you've pointed out, several of the benches suffer from simply measuring unrelated things like assignment, number of lines, etc.

mateusdeap commented 1 year ago

Given the arguments put forth, I agree this benchmark should be removed. I will submit a PR shortly.

As for what gets into the repo, we also need to make things more uniform so that the benchmarks are more reliable.

It might be a while before that since we're trying to first clean out the repository of old issues.