Closed gbalbuena closed 8 years ago
Definitely need to add compact
Here you have it, @nateberkopec good catch!
$ ruby code/array/array-reject-vs-delete-vs-delete_if-vs-subtract-vs-compact.rb
Warming up --------------------------------------
A.compact 65.961k i/100ms
A.reject{|o|o.nil?} 1.771k i/100ms
A.delete(nil) 144.815k i/100ms
A.delete_if(&:nil?) 138.235k i/100ms
A - [nil] 89.647k i/100ms
Calculating -------------------------------------
A.compact 6.211M (± 8.5%) i/s - 30.804M
A.reject{|o|o.nil?} 6.625M (±11.1%) i/s - 32.415M
A.delete(nil) 7.677M (± 7.9%) i/s - 38.231M
A.delete_if(&:nil?) 6.587M (± 6.4%) i/s - 32.900M
A - [nil] 1.930M (± 5.2%) i/s - 9.682M
Comparison:
A.delete(nil): 7677121.9 i/s
A.reject{|o|o.nil?}: 6625399.5 i/s - same-ish: difference falls within error
A.delete_if(&:nil?): 6586659.2 i/s - 1.17x slower
A.compact: 6211063.3 i/s - 1.24x slower
A - [nil]: 1929668.5 i/s - 3.98x slower
There are some problems with this:
delete
mutate ARRAY
, invalidating the benchmark as the following runs just process an empty array.±11.1%
which leads incorrect rankings (reject
should not fall within error).nil
, rather than randomly interspersing other values.Symbol#to_proc
is used for delete_if
but not for reject
, which may have some small impact.This should be more accurate, hopefully I didn't overlook anything else:
require 'benchmark/ips'
ARRAY = Array.new(100_000){ rand(2).nonzero? }
ARRAY1 = ARRAY.dup
ARRAY2 = ARRAY.dup
ARRAY3 = ARRAY.dup
ARRAY4 = ARRAY.dup
ARRAY5 = ARRAY.dup
ARRAY6 = ARRAY.dup
ARRAY7 = ARRAY.dup
ARRAY8 = ARRAY.dup
def compact!
ARRAY1.compact!
end
def compact
ARRAY2.compact
end
def reject!
ARRAY3.reject! { |o| o.nil? }
end
def reject
ARRAY4.reject { |o| o.nil? }
end
def delete
ARRAY5.delete(nil)
end
def delete_if
ARRAY6.delete_if { |o| o.nil? }
end
def subtract
ARRAY7 - [nil]
end
def grep_v
ARRAY8.grep_v(nil)
end
Benchmark.ips do |x|
x.report('A.compact!') { compact! }
x.report('A.compact') { compact }
x.report('A.reject{|o|o.nil?}') { reject }
x.report('A.reject!{|o|o.nil?}') { reject! }
x.report('A.delete(nil)') { delete }
x.report('A.delete_if{|o|o.nil?}') { delete_if }
x.report('A - [nil]') { subtract }
x.report('A.grep_v(nil)') { grep_v }
x.compare!
end
Warming up --------------------------------------
A.compact! 3.310k i/100ms
A.compact 335.000 i/100ms
A.reject{|o|o.nil?} 15.000 i/100ms
A.reject!{|o|o.nil?} 47.000 i/100ms
A.delete(nil) 30.000 i/100ms
A.delete_if{|o|o.nil?}
46.000 i/100ms
A - [nil] 43.000 i/100ms
A.grep_v(nil) 14.000 i/100ms
Calculating -------------------------------------
A.compact! 35.754k (± 5.6%) i/s - 178.740k
A.compact 2.346k (± 4.9%) i/s - 11.725k
A.reject{|o|o.nil?} 160.969 (± 3.1%) i/s - 810.000
A.reject!{|o|o.nil?} 461.897 (± 5.8%) i/s - 2.679k
A.delete(nil) 304.818 (± 4.9%) i/s - 1.530k
A.delete_if{|o|o.nil?}
462.620 (± 5.0%) i/s - 2.346k
A - [nil] 424.620 (± 4.7%) i/s - 2.150k
A.grep_v(nil) 151.899 (± 3.3%) i/s - 770.000
Comparison:
A.compact!: 35754.1 i/s
A.compact: 2346.0 i/s - 15.24x slower
A.delete_if{|o|o.nil?}: 462.6 i/s - 77.29x slower
A.reject!{|o|o.nil?}: 461.9 i/s - 77.41x slower
A - [nil]: 424.6 i/s - 84.20x slower
A.delete(nil): 304.8 i/s - 117.30x slower
A.reject{|o|o.nil?}: 161.0 i/s - 222.12x slower
A.grep_v(nil): 151.9 i/s - 235.38x slower
@Arcovion Thanks for your review, will do the changes, this was something I found in my daily work and this was the repo to document my founds.
@Arcovion @ixti
Results after code review:
$ ruby code/array/array-reject-vs-delete-vs-delete_if-vs-subtract-vs-compact.rb
Warming up --------------------------------------
A.compact! 2.904k i/100ms
A.compact 135.000 i/100ms
A.reject(&:nil?) 16.000 i/100ms
A.reject!(&:nil?) 34.000 i/100ms
A.delete(nil) 33.000 i/100ms
A.delete_if{|o|o.nil?}
33.000 i/100ms
A - [nil] 32.000 i/100ms
A.grep_v(nil) 13.000 i/100ms
Calculating -------------------------------------
A.compact! 30.704k (± 4.9%) i/s - 153.912k
A.compact 1.393k (± 3.9%) i/s - 7.020k
A.reject(&:nil?) 160.195 (± 3.7%) i/s - 800.000
A.reject!(&:nil?) 336.717 (± 4.5%) i/s - 1.700k
A.delete(nil) 332.311 (± 3.3%) i/s - 1.683k
A.delete_if{|o|o.nil?}
342.727 (± 4.7%) i/s - 1.716k
A - [nil] 326.369 (± 4.0%) i/s - 1.632k
A.grep_v(nil) 131.573 (± 5.3%) i/s - 663.000
Comparison:
A.compact!: 30704.3 i/s
A.compact: 1392.8 i/s - 22.05x slower
A.delete_if{|o|o.nil?}: 342.7 i/s - 89.59x slower
A.reject!(&:nil?): 336.7 i/s - 91.19x slower
A.delete(nil): 332.3 i/s - 92.40x slower
A - [nil]: 326.4 i/s - 94.08x slower
A.reject(&:nil?): 160.2 i/s - 191.67x slower
A.grep_v(nil): 131.6 i/s - 233.36x slower
I'm thinking about (&:nil?)
instead of {|o| o.nil?}
that will force a to_proc
call, would it be better to use the traditional notation for that and skips the casting
Thoughts?
Thanks! LGTM. Would you mind adding the benchmark results to README.md like the others?
After thinking a bit, I'm not sure how "good" comparison with bang versions are.
What I mean is in case of [1, 2, 3].compact!
versus [1, 2, 3].compact
we don't actually test anything but creation of new array in case of bang-less version.
Perhaps we could do two separate benchmarks in the same file - one comparing the bang/modify-in-place versions, one without?
I was thinking about that too. But after first iteration compact!
will be doing nothing...
@ixti Oh wow, I completely forgot that. For some reason I was thinking that only mattered between each run of #report
, so I made 8 arrays... lol.
How about:
require 'benchmark/ips'
ARRAY = Array.new(1000){ rand(2).nonzero? }
def compact
ARRAY.compact
end
def reject
ARRAY.reject(&:nil?)
end
def delete
ARRAY.dup.delete(nil)
end
def delete_if
ARRAY.dup.delete_if(&:nil?)
end
def subtract
ARRAY - [nil]
end
def grep_v
ARRAY.grep_v(nil)
end
Benchmark.ips do |x|
x.report('A.compact') { compact }
x.report('A.reject(&:nil?)') { reject }
x.report('A.dup.delete(nil)') { delete }
x.report('A.dup.delete_if(&:nil?)') { delete_if }
x.report('A - [nil]') { subtract }
x.report('A.grep_v(nil)') { grep_v }
x.compare!
end
Warming up --------------------------------------
A.compact 32.003k i/100ms
A.reject(&:nil?) 1.708k i/100ms
A.dup.delete(nil) 2.285k i/100ms
A.dup.delete_if(&:nil?)
1.566k i/100ms
A - [nil] 3.983k i/100ms
A.grep_v(nil) 1.452k i/100ms
Calculating -------------------------------------
A.compact 447.529k (± 7.3%) i/s - 2.240M
A.reject(&:nil?) 17.602k (± 5.2%) i/s - 88.816k
A.dup.delete(nil) 24.027k (± 5.7%) i/s - 121.105k
A.dup.delete_if(&:nil?)
17.840k (± 3.4%) i/s - 89.262k
A - [nil] 40.462k (± 6.0%) i/s - 207.116k
A.grep_v(nil) 15.068k (± 3.7%) i/s - 75.504k
Comparison:
A.compact: 447529.4 i/s
A - [nil]: 40461.6 i/s - 11.06x slower
A.dup.delete(nil): 24027.1 i/s - 18.63x slower
A.dup.delete_if(&:nil?): 17840.2 i/s - 25.09x slower
A.reject(&:nil?): 17601.9 i/s - 25.43x slower
A.grep_v(nil): 15068.4 i/s - 29.70x slower
Yeah. Looks good.
Hi guys - I have two remarks:
#dup
for #delete
and #delete_if
too. That's not fair.Array
in place and the second creates a complete new Array
(with different content). I think that should not be mixed.I tend to agree that probably it worth to avoid #dup
+ (#delete
|#delete_if
) from the benchmarks.
require 'benchmark/ips'
ARRAY = Array.new(1000){ rand(2).nonzero? }
def compact
ARRAY.compact
end
def reject
ARRAY.reject(&:nil?)
end
def subtract
ARRAY - [nil]
end
Benchmark.ips do |x|
x.report('A.compact') { compact }
x.report('A.reject(&:nil?)') { reject }
x.report('A - [nil]') { subtract }
x.compare!
end
devbox:fast-ruby GBalbuena$ ruby code/array/compact-vs-reject-vs-subtract.rb
Warming up --------------------------------------
A.compact 33.598k i/100ms
A.reject(&:nil?) 1.596k i/100ms
A - [nil] 3.281k i/100ms
Calculating -------------------------------------
A.compact 346.781k (±20.0%) i/s - 1.680M
A.reject(&:nil?) 15.690k (± 6.3%) i/s - 78.204k
A - [nil] 33.054k (± 4.6%) i/s - 167.331k
Comparison:
A.compact: 346781.0 i/s
A - [nil]: 33054.3 i/s - 10.49x slower
A.reject(&:nil?): 15689.7 i/s - 22.10x slower
require 'benchmark/ips'
ARRAY = Array.new(1000){ rand(2).nonzero? }
ARRAY1 = ARRAY.dup
ARRAY2 = ARRAY.dup
ARRAY3 = ARRAY.dup
def grep_v
ARRAY1.grep_v(nil)
end
def delete
ARRAY2.delete(nil)
end
def delete_if
ARRAY3.delete_if(&:nil?)
end
Benchmark.ips do |x|
x.report('A.grep_v(nil)') { grep_v }
x.report('A.delete(nil)') { delete }
x.report('A.delete_if(&:nil?)') { delete_if }
x.compare!
end
devbox:fast-ruby GBalbuena$ ruby code/array/grep-vs-delete-vs-delete_if.rb
Warming up --------------------------------------
A.grep_v(nil) 1.327k i/100ms
A.delete(nil) 3.190k i/100ms
A.delete_if(&:nil?) 3.383k i/100ms
Calculating -------------------------------------
A.grep_v(nil) 13.916k (± 2.3%) i/s - 70.331k
A.delete(nil) 32.228k (± 3.2%) i/s - 162.690k
A.delete_if(&:nil?) 35.120k (± 3.4%) i/s - 175.916k
Comparison:
A.delete_if(&:nil?): 35119.5 i/s
A.delete(nil): 32227.6 i/s - 1.09x slower
A.grep_v(nil): 13916.3 i/s - 2.52x slower
@gbalbuena I'm sorry you've had to make so many changes for what was such a simple PR.
Thing is, we have no way to test in-place methods without #dup
. The second benchmark doesn't work; as @ixti pointed out ARRAY.delete(nil)
will remove all nils on the first iteration making the test useless.
I wondered if we could test stuff by adding #dup
to every test (even if the test doesn't use in-place methods) as if it would cancel out. But my local testing shows it sort of equalises the results and removes a lot of the variance, leading me to believe there is a large overhead for #dup
.
What we know for sure is that #compact!
will always be the fastest way to remove nils, but why focus on nils?
I tried making a generic benchmark that works for removing any item, for me the most surprising result in all these benchmarks is the speed of subtracting with ARRAY - [item]
.
If you do just want to delete items from an array, I honestly can't tell if ARRAY.delete(item)
is faster than ARRAY - [item]
, I have to just assume that if you're not using #dup
and don't mind mutation then ARRAY.delete(item)
is faster.
Most people would probably use ARRAY.reject{ |x| x == item }
, but did you know that ARRAY.dup.reject!{ |x| x == item }
is faster if you benchmark it this way? Madness.
We also saw that ARRAY.dup.delete(nil)
is faster than ARRAY.reject(&nil?)
, that sort of information is still nice to keep on the benchmark.
Anyway, I'm still just going to use ARRAY.grep_v(item)
...
What have we learned here? Really? :D
I think having a benchmark for removing nils is a bit pointless to be honest, #compact!
is C code written for that exact purpose. As far as removing any one item from an array, my results are kinda inconclusive. We have no way of testing in-place methods at all.
Anyone else got any thoughts or ideas? Help me out...
nils
was the use case that led me to the findings, that can be changed to a different kind of benchmark and target different cases, for example testing functionals methods over array for example: select, collect, map, reduce, etc.
But I think the cleanup of nils
in an array case is a common problem, so if its out of the scope of the benchmarks lets focus on other thing and update the PR.
I think having a benchmark for removing nils is a bit pointless to be honest, #compact! is C code written for that exact purpose.
To be fair, a good portion of the benchmarks we have with Enumerable boil down to "use this C-optimized method instead of compact!
.
It's also good to not rely too heavily on MRI's internals. Those assumptions naturally do not hold for other Ruby implementations and possibly might not hold for future versions of MRI.
I was disappointed to see this closed without resolution, so I created a gem to help with these kind of benchmarks: benchmark-inputs. (Feedback welcome!)
Here is the benchmark code comparing all cases:
require 'benchmark/inputs'
NO_NILS = [1] * 100
SOME_NILS = [*1..9, nil] * 10
JUST_NIL = [nil]
Benchmark.inputs([NO_NILS, SOME_NILS]) do |x|
x.dup_inputs = true # enabled for destructive operations
x.report('A.compact') {|a| a.compact }
x.report('A.compact!') {|a| a.compact! }
x.report('A.reject(&:nil?)') {|a| a.reject(&:nil?) }
x.report('A.reject!(&:nil?)') {|a| a.reject!(&:nil?) }
x.report('A.delete(nil)') {|a| a.delete(nil) }
x.report('A.delete_if(&:nil?)') {|a| a.delete_if(&:nil?) }
x.report('A - [nil]'){|a| a - JUST_NIL }
#x.report('A.grep_v(nil)'){|a| a.grep_v(nil) } # only in Ruby 2.3+
x.compare!
end
And here are the results I get:
A.compact
955052.5 i/s (±0.83%)
A.compact!
1415897.0 i/s (±3.93%)
A.reject(&:nil?)
73797.2 i/s (±0.30%)
A.reject!(&:nil?)
108205.2 i/s (±0.29%)
A.delete(nil)
60839.9 i/s (±0.97%)
A.delete_if(&:nil?)
106545.5 i/s (±0.46%)
A - [nil]
103218.4 i/s (±0.41%)
Comparison:
A.compact!: 1415897.0 i/s
A.compact: 955052.5 i/s - 1.48x slower
A.reject!(&:nil?): 108205.2 i/s - 13.09x slower
A.delete_if(&:nil?): 106545.5 i/s - 13.29x slower
A - [nil]: 103218.4 i/s - 13.72x slower
A.reject(&:nil?): 73797.2 i/s - 19.19x slower
A.delete(nil): 60839.9 i/s - 23.27x slower
The standard deviation of compact!
is surprising, but I think tolerable? The worse performance of all the &:nil?
methods makes sense, because a Proc
is created on each call. (I thought about storing :nil?.to_proc
in a constant, but that wouldn't align with real-world usage.) The astonishing observation is how slow delete(nil)
is. So slow that I question its correctness, but it is consistent, and using benchmark-ips shows a similar result (accounting for the overhead of dup
):
Benchmark.ips do |x|
x.report('A.dup.compact!') { SOME_NILS.dup.compact! }
x.report('A.dup.delete(nil)') { SOME_NILS.dup.delete(nil) }
x.compare!
end
Warming up --------------------------------------
A.dup.compact! 36.997k i/100ms
A.dup.delete(nil) 5.451k i/100ms
Calculating -------------------------------------
A.dup.compact! 592.561k (± 3.4%) i/s - 2.960M in 5.000114s
A.dup.delete(nil) 56.339k (± 1.8%) i/s - 283.452k in 5.032793s
Comparison:
A.dup.compact!: 592561.2 i/s
A.dup.delete(nil): 56339.1 i/s - 10.52x slower
When you need to cleanup the array and extract all the nils there are at least 4 different ways
These are the results: