fastruby / fast-ruby

:dash: Writing Fast Ruby :heart_eyes: -- Collect Common Ruby idioms.
https://github.com/fastruby/fast-ruby
5.67k stars 376 forks source link

Enumerable#sort_by is not always faster than #sort #120

Open monfresh opened 7 years ago

monfresh commented 7 years ago

Per the Ruby 2.4.0 docs:

The current implementation of sort_by generates an array of tuples containing the original collection element and the mapped value. This makes sort_by fairly expensive when the keysets are simple.

Here's a concrete example showing sort to be 2.70x faster in Ruby 1.9.3, 2.3.3 and 2.4.0:

require 'benchmark/ips'

Benchmark.ips do |x|
  x.time = 5
  x.warmup = 2

  ARRAY = %w{apple pear fig}

  x.report("sort_by") do
    ARRAY.sort_by(&:length)
  end

  x.report("sort") do
    ARRAY.sort { |a, b| a.length <=> b.length}
  end

  x.compare!
end
Warming up --------------------------------------
             sort_by    56.348k i/100ms
                sort   111.946k i/100ms
Calculating -------------------------------------
             sort_by    635.646k (±16.1%) i/s -      3.099M in   5.074036s
                sort      1.713M (±16.3%) i/s -      8.284M in   5.023904s

Comparison:
                sort:  1713232.1 i/s
             sort_by:   635645.9 i/s - 2.70x slower
mblumtritt commented 7 years ago

In your sample you sort the ARRAY twice - it's already sorted when the second sample runs. The results may be therefore wrong…

monfresh commented 7 years ago

I don't follow. ARRAY is not mutated. If you call ARRAY.sort_by(&:length), and then call ARRAY again, you get the original ARRAY. You can verify this by adding .freeze to the end of the ARRAY assignment (ARRAY = %w(apple pear fig).freeze)

I get the same results when I change the order of the samples:

Warming up --------------------------------------
                sort   121.953k i/100ms
             sort_by    59.477k i/100ms
Calculating -------------------------------------
                sort      1.898M (± 3.8%) i/s -      9.512M in   5.018772s
             sort_by    704.695k (±11.5%) i/s -      3.450M in   5.002361s

Comparison:
                sort:  1898186.9 i/s
             sort_by:   704694.9 i/s - 2.69x  slower
monfresh commented 7 years ago

The code example in this repo also does the same thing: it defines the array to be used in the samples as a constant: https://github.com/JuanitoFatas/fast-ruby/blob/master/code/enumerable/sort-vs-sort_by.rb

avellable commented 7 years ago

@monfresh Thanks for pointing this out. There are two ways to resolve into the current example,

  1. Remove the example
  2. Based on the API documentation of Ruby, we should update the benchmark report to have more details.

@JuanitoFatas Any comments about above?

mblumtritt commented 7 years ago

@monfresh Sorry - I used in my local sample sort! and sort_by! and did ignore your code details. Sorry …

abuisman commented 5 years ago

I just ran into a situation where sort_by is not faster, per se, in fact 2.95x slower. But it has something to do with the number of items in the array.

 require 'benchmark/ips'

          records = SomeModel.group(:some_column).sum(:value)

          Benchmark.ips do |x|
            x.report("with #sort") do |n|
              n.times { records.sort { |x, y| y[1] <=> x[1] } }
            end

            x.report("with #sort_by") do |n|
              n.times { records.sort_by(&:last) }
            end

            # Compare the iterations per second of the various reports!
            x.compare!
          end

Yields

Warming up --------------------------------------
          with #sort    95.145k i/100ms
       with #sort_by    41.329k i/100ms
Calculating -------------------------------------
          with #sort      1.350M (±17.9%) i/s -      6.280M in   5.040178s
       with #sort_by    457.236k (±27.8%) i/s -      2.025M in   5.034366s

Comparison:
          with #sort:  1350318.7 i/s
       with #sort_by:   457235.7 i/s - 2.95x  slower

This is with 2 items in the result. When I bump this to 100:

Warming up --------------------------------------
          with #sort     1.029k i/100ms
       with #sort_by     1.365k i/100ms
Calculating -------------------------------------
          with #sort     11.344k (±14.4%) i/s -     55.566k in   5.046672s
       with #sort_by     14.750k (±14.4%) i/s -     72.345k in   5.023649s

Comparison:
       with #sort_by:    14750.3 i/s
          with #sort:    11344.5 i/s - same-ish: difference falls within error

Apparently there is a correlation between the number of items and the performance difference, which makes sense of course. Adding items to the array brings us closer to the 'true' performance difference between sort and sort_by.

So lets bump to 1000:

Calculating -------------------------------------
          with #sort    667.572  (±16.6%) i/s -      3.245k in   5.017742s
       with #sort_by    953.283  (±15.1%) i/s -      4.692k in   5.043402s

Comparison:
       with #sort_by:      953.3 i/s
          with #sort:      667.6 i/s - 1.43x  slower

So at 1000 items sort_by becomes faster.

I don't know how we'd go about this, but adding a note about the expected number of items in the array would make sense.

luxangel777 commented 4 years ago

🙏