Comparing array subset checks

fastruby / fast-ruby

:dash: Writing Fast Ruby :heart_eyes: -- Collect Common Ruby idioms.

https://github.com/fastruby/fast-ruby

5.67k stars 376 forks source link

Comparing array subset checks #125

Open gabteles opened 7 years ago

mblumtritt commented 7 years ago

Your code above is massively dependent to used sample arrays. Using this samples

ARRAY1 = [*(400..500)].freeze
ARRAY2 = [*(1..500)].freeze

will change your results completely. I would prefer to use a more real-world sample for the test to avoid misleadings…

Arcovion commented 7 years ago

Maybe (ARRAY1 & ARRAY2).size == ARRAY1.size is faster - also try the set library; convert both #to_set and check with #subset?

gabteles commented 7 years ago

@mblumtritt What you suggest as a real world example? It would be nice if I generate a random-filled array of integers and let it static in the test or something like that?

@Arcovion Nice, I'll add both.

Thank you guys. (:

mblumtritt commented 7 years ago

@gabteles Do you mean something like this?

ARRAY1 = [*(400..500)].shuffle.freeze
ARRAY2 = [*(1..500)].shuffle.freeze

Then we have to other problem: how is the weight between the set and the subset 🤔 Or do you prefer a complete random set to compare?…

In "real world" I would check my chances to be faster with a complete different algorithm. For sample what about sorting first descending? This might be faster if both sets already nearly sorted…

I'm unsure to give a "general advice" for this subset problem. In "real world" it always depends… ;)

gabteles commented 7 years ago

@mblumtritt Yeah, I agree that the weight between set/subset is a big problem to benchmark. I'm not sure how to handle it.

Before I meant something like this:

Generate random arrays (generated with small size to better readability in this example, test could have bigger arrays):

$ irb
2.4.1 :001 > 10.times.map { rand(100) }
=> [20, 83, 86, 53, 63, 52, 50, 43, 18, 94] 
2.4.1 :002 > 10.times.map { rand(100) }
=> [27, 93, 84, 86, 71, 89, 38, 48, 37, 22]

Statically use it in code:

ARRAY1 = [20, 83, 86, 53, 63, 52, 50, 43, 18, 94].freeze
ARRAY2 = [27, 93, 84, 86, 71, 89, 38, 48, 37, 22].freeze

Using it statically grants that we won't have performance improved/degraded by randomness effect of shuffle or rand in runtime. But it also suffers of the weight problem.