Closed mrThe closed 2 years ago
Entirely depends on the speed of the sort:
DATA = [*0..1_000_000].shuffle
# Calculating -------------------------------------
# find 1.000 i/100ms
# sort.bsearch 1.000 i/100ms
# bsearch 72.382k i/100ms
# -------------------------------------------------
# find 16.799 (± 0.0%) i/s - 84.000
# sort.bsearch 6.734 (± 0.0%) i/s - 34.000
# bsearch 1.033M (± 1.8%) i/s - 5.212M
# Comparison:
# bsearch: 1033428.3 i/s
# find: 16.8 i/s - 61518.15x slower
# sort.bsearch: 6.7 i/s - 153468.41x slower
Binary searching only works on sorted data anyway, as mentioned in the readme.
@Arcovion
require 'benchmark/ips'
DATA = [*0..1_000_000]
DATA_RANDOM = [*0..1_000_000].shuffle
Benchmark.ips do |x|
x.report('data') { DATA.sort }
x.report('data_random') { DATA_RANDOM.sort }
x.compare!
end
Calculating -------------------------------------
data 8.000 i/100ms
data_random 1.000 i/100ms
-------------------------------------------------
data 91.713 (±10.9%) i/s - 456.000
data_random 4.554 (± 0.0%) i/s - 23.000
Comparison:
data: 91.7 i/s
data_random: 4.6 i/s - 20.14x slower
So, if
require 'benchmark/ips'
DATA = [*0..1_000_000]
DATA_RANDOM = [*0..1_000_000].shuffle
def slowest(data)
data.find { |number| number > 77_777_777 }
end
def slow(data)
data.sort.bsearch { |number| number > 77_777_777 }
end
def fastest(data)
data.bsearch { |number| number > 77_777_777 }
end
Benchmark.ips do |x|
x.report('find') { slowest(DATA) }
x.report('sort.bsearch') { slow(DATA) }
x.report('bsearch') { fastest(DATA) }
x.compare!
end
Benchmark.ips do |x|
x.report('find') { slowest(DATA_RANDOM) }
x.report('sort.find') { DATA_RANDOM.sort.find { |number| number > 77_777_777 } }
x.report('sort.bsearch') { slow(DATA_RANDOM) }
x.compare!
end
Then
✗ ruby -v code/array/bsearch-vs-find.rb
ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-darwin13]
Calculating -------------------------------------
find 1.000 i/100ms
sort.bsearch 7.000 i/100ms
bsearch 37.411k i/100ms
-------------------------------------------------
find 11.322 (±17.7%) i/s - 56.000
sort.bsearch 90.577 (±14.4%) i/s - 448.000
bsearch 712.016k (± 7.1%) i/s - 3.554M
Comparison:
bsearch: 712015.6 i/s
sort.bsearch: 90.6 i/s - 7860.84x slower
find: 11.3 i/s - 62886.82x slower
Calculating -------------------------------------
find 1.000 i/100ms
sort.find 1.000 i/100ms
sort.bsearch 1.000 i/100ms
-------------------------------------------------
find 11.216 (±17.8%) i/s - 55.000
sort.find 3.358 (± 0.0%) i/s - 17.000
sort.bsearch 4.543 (± 0.0%) i/s - 23.000
Comparison:
find: 11.2 i/s
sort.bsearch: 4.5 i/s - 2.47x slower
sort.find: 3.4 i/s - 3.34x slower
tl;dr: find is ~2x faster on unsorted array.
And, looks like my pull-request is useless in this case. Or, maybe anyone have an idea how can i improve it?
77_777_777
isn't between 0..1_000_000
so the entire array is iterated over. The other two are slower due to sorting.
require 'benchmark/ips'
DATA = Array(0..1_000_000)
REVERSE_DATA = Array(0..1_000_000).reverse
def bsearch
DATA.bsearch{ |number| number > 777_777 }
end
def find
DATA.find{ |number| number > 777_777 }
end
def reverse_find
REVERSE_DATA.find{ |number| number > 777_777 }
end
def index
DATA[DATA.index{ |number| number > 777_777 }]
end
def reverse_index
REVERSE_DATA[REVERSE_DATA.index{ |number| number > 777_777 }]
end
def rindex
DATA[DATA.rindex{ |number| number > 777_777 }]
end
p bsearch, find, reverse_find, index, reverse_index, rindex
Benchmark.ips do |bm|
bm.report('bsearch'){ bsearch }
bm.report('find'){ find }
bm.report('reverse_find'){ reverse_find }
bm.report('index'){ index }
bm.report('reverse_index'){ reverse_index }
bm.report('rindex'){ rindex }
bm.compare!
end
This gives some meaningful results:
777778
777778
1000000
777778
1000000
1000000
Calculating -------------------------------------
bsearch 68.426k i/100ms
find 2.000 i/100ms
reverse_find 51.144k i/100ms
index 3.000 i/100ms
reverse_index 159.802k i/100ms
rindex 159.604k i/100ms
-------------------------------------------------
bsearch 938.018k (± 1.8%) i/s - 4.721M
find 21.614 (± 0.0%) i/s - 110.000
reverse_find 697.638k (± 1.6%) i/s - 3.529M
index 31.105 (± 0.0%) i/s - 156.000
reverse_index 5.299M (± 1.1%) i/s - 26.527M
rindex 5.381M (± 1.4%) i/s - 26.973M
Comparison:
rindex: 5380568.4 i/s
reverse_index: 5299118.5 i/s - 1.02x slower
bsearch: 938017.9 i/s - 5.74x slower
reverse_find: 697638.3 i/s - 7.71x slower
index: 31.1 i/s - 172982.81x slower
find: 21.6 i/s - 248943.46x slower
The first number in REVERSE_DATA
is 1_000_000
, greater than 777_777
, so reverse_find
should immediately return that. bsearch
somehow manages to be faster, however rindex
wins doing the same thing as reverse_find
. The really slow ones are iterating 777777 times until they find a match.
Basically, if the data is sorted then binary searching is always fastest. Otherwise, it just matters where the element you want is positioned in the array. If you're indexing the data a lot, it's worth sorting once in advance of many binary searches.
TL;DR: No use comparing bsearch
and find
as it depends whether the data is sorted. find
vs index
is worth comparing though - I might make a PR for that myself.
I would agree that comparing find
and bsearch
is not very useful actually, given that bsearch
only works for sorted array.
Maybe we should also remove that from the README? @JuanitoFatas?
Agree that find
and index
would be more interesting.
With
Array#sort
it looks not so fast, but still faster than find.