Closed p-salido closed 7 years ago
@reneklacan added
I used following simple dummy script to get an approximate improvement in accuracy
require 'zlib'
RAND_BASE = (2**32 - 1) / 100.0
def impl_new(user_id, percentage)
Zlib.crc32(user_id.to_s) < RAND_BASE * percentage
end
def impl_old(user_id, percentage)
Zlib.crc32(user_id.to_s) % 100_000 < percentage * 1_000
end
COUNT = 1_000_000
dataset_1 = (1..COUNT).to_a
dataset_2 = COUNT.times.inject([7]) { |memo, i| memo + [memo[-1] + 7] }
dataset_3 = COUNT.times.inject([7]) { |memo, i| memo + [memo[-1] + rand(1..30)] }
test_cases = [
{ user_ids: dataset_1, percentage: 25 },
{ user_ids: dataset_1, percentage: 50 },
{ user_ids: dataset_1, percentage: 75 },
{ user_ids: dataset_2, percentage: 17 },
{ user_ids: dataset_2, percentage: 39 },
{ user_ids: dataset_2, percentage: 71 },
{ user_ids: dataset_3, percentage: 21 },
{ user_ids: dataset_3, percentage: 47 },
{ user_ids: dataset_3, percentage: 82 },
]
errors_old = []
errors_new = []
test_cases.each_with_index do |test_case, i|
user_ids = test_case.fetch(:user_ids)
percentage = test_case.fetch(:percentage)
real_percentage_old = user_ids.select { |uid| impl_old(uid, percentage) }.count.to_f / user_ids.count * 100
real_percentage_new = user_ids.select { |uid| impl_new(uid, percentage) }.count.to_f / user_ids.count * 100
errors_old << (real_percentage_old - percentage).abs
errors_new << (real_percentage_new - percentage).abs
puts
puts "Test Case #{i}:"
puts "- Expected: #{percentage}"
puts "- Real Old: #{real_percentage_old}"
puts "- Real New: #{real_percentage_new}"
end
puts
puts "---"
puts
puts "Average Old Error: #{errors_old.reduce(&:+)/errors_old.count}"
puts "Average New Error: #{errors_new.reduce(&:+)/errors_new.count}"
puts
puts "Total Old Error: #{errors_old.reduce(&:+)}"
puts "Total New Error: #{errors_new.reduce(&:+)}"
And the output is:
Test Case 0:
- Expected: 25
- Real Old: 24.9711
- Real New: 24.9995
Test Case 1:
- Expected: 50
- Real Old: 49.9973
- Real New: 49.9989
Test Case 2:
- Expected: 75
- Real Old: 75.00489999999999
- Real New: 74.9995
Test Case 3:
- Expected: 17
- Real Old: 16.931783068216934
- Real New: 17.000182999817
Test Case 4:
- Expected: 39
- Real Old: 38.93256106743893
- Real New: 39.003360996639
Test Case 5:
- Expected: 71
- Real Old: 70.92442907557093
- Real New: 71.01422898577101
Test Case 6:
- Expected: 21
- Real Old: 20.985379014620985
- Real New: 21.039978960021042
Test Case 7:
- Expected: 47
- Real Old: 47.019352980647014
- Real New: 47.02295297704702
Test Case 8:
- Expected: 82
- Real Old: 82.01721798278201
- Real New: 82.03641796358202
---
Average Old Error: 0.03321319306458213
Average New Error: 0.013246986986344843
Total Old Error: 0.2989187375812392
Total New Error: 0.11922288287710359
So quite a nice improvement.
@p-salido Thank you for your contribution!
Small improvement for #58