fastruby / fast-ruby

:dash: Writing Fast Ruby :heart_eyes: -- Collect Common Ruby idioms.
https://github.com/fastruby/fast-ruby
5.67k stars 376 forks source link

Set#include is faster than Array#include #85

Closed nateberkopec closed 5 years ago

nateberkopec commented 8 years ago

May want to include a warning that for most other methods (each and other iterators, union/difference), Array is faster.

ixti commented 8 years ago

May want to include a warning that for most other methods (each and other iterators, union/difference), Array is faster.

I'm not sure about most of the methods. But creation of Sets is minimum 5 times slower. That's why union/merge will be 5 times slower as well. So, I guess it worth to mention that, yeah ;))

Arcovion commented 8 years ago

Under the hood Set is implemented as a Hash; how about including an example of this too?

require 'benchmark/ips'
require 'set'

ARRAY = [*1..1000].shuffle
SET = ARRAY.to_set
SORTED_SET = SortedSet.new(ARRAY)
HASH = ARRAY.each_with_object(Hash.new){ |i, h| h[i] = true }

def slow
  ARRAY.include?(750)
end

def fast
  SET.include?(750)
end

def also_fast
  SORTED_SET.include?(750)
end

def fastest
  HASH.include?(750)
end

Benchmark.ips do |bm|
  bm.report('Hash#include?') { fastest }
  bm.report('Set#include?') { fast }
  bm.report('SortedSet#include?') { also_fast }
  bm.report('Array#include?') { slow }
  bm.compare!
end
Calculating -------------------------------------
       Hash#include?    86.889k i/100ms
        Set#include?    78.141k i/100ms
  SortedSet#include?   108.497k i/100ms
      Array#include?    25.663k i/100ms
-------------------------------------------------
       Hash#include?      8.752M (± 3.0%) i/s -     43.705M
        Set#include?      6.349M (± 2.6%) i/s -     31.725M
  SortedSet#include?      6.385M (± 2.3%) i/s -     31.898M
      Array#include?    290.163k (± 2.6%) i/s -      1.463M

Comparison:
       Hash#include?:  8752135.4 i/s
  SortedSet#include?:  6384910.8 i/s - 1.37x slower
        Set#include?:  6348659.4 i/s - 1.38x slower
      Array#include?:   290162.9 i/s - 30.16x slower
nateberkopec commented 8 years ago

I think it depends on whether or not you think people will actually be swapping Arrays with Hashes for the include? performance. Sets are conceptually similar to Arrays, but Hashes of course are not. I can't see myself ever using a Hash with all true values in place of an Array, that just feels...sinful.

ixti commented 8 years ago

@Arcovion is absolutely right. Set is implemented using Hash under the hood. But I don't think we should add Hash#include? in here. It's a bit out of scope of this particular case IMO.

ixti commented 8 years ago

@Arcovion I think it's better to add info that Set is utilizing Hash internally rather than add Hash#include? in here. Because sets can be implemented in more than one way with Hash. ;))

Arcovion commented 8 years ago

Depends whether micro-optimizations are ok to be included here, could also add gems to compare against which implement prefix trees or whatnot.

Indeed, it should at least be noted somewhere that there are better algorithms for arrays of simple strings and that Set is just a Hash with true values under the hood.

There probably needs to be a whole new section for explanations where people can write code snippets, see algorithm examples or gems for people to discover/use, examine what micro-optimisations are available for certain problems etc.