evilmartians / evil-seed

A Gem for creating partial anonymized dumps of your database using your app model relations.
MIT License
447 stars 19 forks source link

Unnamed implementation 1 #1

Closed Envek closed 7 years ago

Envek commented 7 years ago

Can't figure out good name for this PR :-)

What is good:

  1. It works
  2. It have README

What is not:

  1. Test coverage is very poor
  2. The EvilSeed::RelationDumper class which does all the power-lifting needs to be refactored
  3. Dumping HABTM is not ideal yet

Please review

Envek commented 7 years ago
  1. Will add dump-restore test case in few days.
  2. I do not resolve association cycles: I will traverse them infinitely while there are records to load. Duplicates are handled by loaded_map hash which stores primary keys for every table. Record and it's associations will not be loaded if its primary key exists there.
  3. &block is slower, you can see comparison at http://stackoverflow.com/a/40608104/338859 . I will look where I can replace &block to yield because most of the blocks are data transformers and being called for each tuple of data.
Envek commented 7 years ago

On yield vs &block.

The only place where &block can be changed to yield and it's called frequently (for every dumped record) is lib/evil_seed/configuration.rb:26:

diff --git a/lib/evil_seed/configuration.rb b/lib/evil_seed/configuration.rb
index 0979964..4f91712 100644
--- a/lib/evil_seed/configuration.rb
+++ b/lib/evil_seed/configuration.rb
@@ -25,5 +25,5 @@ module EvilSeed

-    def customize(model_class, &block)
-      raise(ArgumentError, "You must provide block for #{__method__} method") unless block
-      customizers[model_class.to_s] << ->(attrs) { attrs.tap(&block) } # Ensure that we're returning attrs from it
+    def customize(model_class)
+      raise(ArgumentError, "You must provide block for #{__method__} method") unless block_given?
+      customizers[model_class.to_s] << ->(attrs) { attrs.tap { yield attrs } } # Ensure that we're returning attrs
     end

Benchmark for (I believe) identical code:

require 'securerandom'

def build_c1
  ->(attrs) { attrs.tap { yield attrs } }
end
def build_c2(&block)
  ->(attrs) { attrs.tap(&block) }
end
c1 = build_c1 do |attrs|
  attrs[:foo] = SecureRandom.uuid # Not synthetic test but approx. same as real world usage™
end
c2 = build_c2 do |attrs|
  attrs[:foo] = SecureRandom.uuid # Not synthetic test but approx. same as real world usage™
end

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report("yield") { c1.call({}) }
  x.report("&block") { c2.call({}) }
  x.compare!
end

Results:

Warming up --------------------------------------
               yield    10.846k i/100ms
              &block    10.746k i/100ms
Calculating -------------------------------------
               yield    114.485k (± 8.3%) i/s -    574.838k in   5.056315s
              &block    116.087k (± 7.5%) i/s -    580.284k in   5.028676s

Comparison:
              &block:   116087.0 i/s
               yield:   114485.4 i/s - same-ish: difference falls within error

So there is no real reason to change &block to yield and for me &block feels more readable (and that is more important IMHO).

palkan commented 7 years ago

I guess, these benchmarks show that SecureRandom.uuid takes most of a time; that's why there is no significant difference.

As I've already mentioned, we don't need micro-optimizations here, so let's just leave &block everywhere.