Closed nateberkopec closed 5 years ago
May want to include a warning that for most other methods (each and other iterators, union/difference), Array is faster.
I'm not sure about most of the methods. But creation of Set
s is minimum 5 times slower. That's why union/merge will be 5 times slower as well. So, I guess it worth to mention that, yeah ;))
Under the hood Set
is implemented as a Hash
; how about including an example of this too?
require 'benchmark/ips'
require 'set'
ARRAY = [*1..1000].shuffle
SET = ARRAY.to_set
SORTED_SET = SortedSet.new(ARRAY)
HASH = ARRAY.each_with_object(Hash.new){ |i, h| h[i] = true }
def slow
ARRAY.include?(750)
end
def fast
SET.include?(750)
end
def also_fast
SORTED_SET.include?(750)
end
def fastest
HASH.include?(750)
end
Benchmark.ips do |bm|
bm.report('Hash#include?') { fastest }
bm.report('Set#include?') { fast }
bm.report('SortedSet#include?') { also_fast }
bm.report('Array#include?') { slow }
bm.compare!
end
Calculating -------------------------------------
Hash#include? 86.889k i/100ms
Set#include? 78.141k i/100ms
SortedSet#include? 108.497k i/100ms
Array#include? 25.663k i/100ms
-------------------------------------------------
Hash#include? 8.752M (± 3.0%) i/s - 43.705M
Set#include? 6.349M (± 2.6%) i/s - 31.725M
SortedSet#include? 6.385M (± 2.3%) i/s - 31.898M
Array#include? 290.163k (± 2.6%) i/s - 1.463M
Comparison:
Hash#include?: 8752135.4 i/s
SortedSet#include?: 6384910.8 i/s - 1.37x slower
Set#include?: 6348659.4 i/s - 1.38x slower
Array#include?: 290162.9 i/s - 30.16x slower
I think it depends on whether or not you think people will actually be swapping Arrays with Hashes for the include?
performance. Set
s are conceptually similar to Arrays, but Hash
es of course are not. I can't see myself ever using a Hash with all true
values in place of an Array, that just feels...sinful.
@Arcovion is absolutely right. Set is implemented using Hash under the hood. But I don't think we should add Hash#include?
in here. It's a bit out of scope of this particular case IMO.
@Arcovion I think it's better to add info that Set is utilizing Hash internally rather than add Hash#include?
in here. Because sets can be implemented in more than one way with Hash. ;))
Depends whether micro-optimizations are ok to be included here, could also add gems to compare against which implement prefix trees or whatnot.
Indeed, it should at least be noted somewhere that there are better algorithms for arrays of simple strings and that Set
is just a Hash
with true
values under the hood.
There probably needs to be a whole new section for explanations where people can write code snippets, see algorithm examples or gems for people to discover/use, examine what micro-optimisations are available for certain problems etc.
May want to include a warning that for most other methods (each and other iterators, union/difference), Array is faster.