NHSDigital / ndr_import

National Disease Registers import ETL gem
MIT License
5 stars 7 forks source link

deep_merge with standard mappings? #10

Closed timgentry closed 5 years ago

timgentry commented 7 years ago

Without reading the documentation on standard mappings, one would assume standard mappings were deep merged, so that field mappings are merged with the standard mapping field mappings rather than replacing them.

Should we therefore make this change? It would be non-breaking with mappings that currently need to redefine the standard mappings field mapping(s) so they remain.

joshpencheon commented 7 years ago

(See #1 for discussion on mutating standard mappings.)

timgentry commented 7 years ago

We should not mutate standard_mappings. Neither reverse_merge nor deep_merge would work as it only merges nested Hashes (and "mappings" is an Array)

joshpencheon commented 7 years ago

The general approach taken for the "deep clone" previously was to do a roundtrip through some serialisation format, e.g.

clone = YAML.load(YAML.dump(original))

Or it might be possible to use Marshal in a similar way.

I'm not aware of an out-of-the-box way of merging together richer structures.

timgentry commented 7 years ago

I guess Marshal is faster? Given the clearly defined structure of a column mapping I think we'd have to roll our own merge function.

timgentry commented 7 years ago
require 'benchmark'
require 'yaml'

n = 50_000
original = YAML.load <<-YML
  - column: surname
    rawtext_name: surname
    mappings:
    - field: surname
      clean: :name
  - column: forename
    rawtext_name: forenames
    mappings:
    - field: forenames
      clean: :name
  - column: sex
    rawtext_name: sex
    mappings:
    - field: sex
      clean: :sex
  - column: nhs_no
    rawtext_name: nhsnumber
    mappings:
    - field: nhsnumber
      clean: :nhsnumber
YML

Benchmark.bm do |x|
  x.report("Marshal:") do
    n.times { Marshal.load(Marshal.dump(original)) }
  end
  x.report("YAML:") do
    n.times { YAML.load(YAML.dump(original)) }
  end
end
       user     system      total        real
Marshal:  2.450000   0.020000   2.470000 (  2.561565)
YAML: 37.170000   0.160000  37.330000 ( 37.833326)
joshpencheon commented 5 years ago

@timgentry do you think there is more to work on here? I'm conscious that as of 582ce42b (and indeed, prior to it), additional mappings supplied alongside a standard_mapping are merged in, rather than replacing any mappings that the standard_mapping predefines. All other properties are replaced.

timgentry commented 5 years ago

I think it is now a non-issue, we can always reopen it if necessary.