timgentry closed this 5 years ago
(See #1 for discussion on mutating standard mappings.)
We should not mutate standard_mappings. Neither reverse_merge nor deep_merge would work, as deep_merge only merges nested Hashes (and "mappings" is an Array).
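To illustrate the Array problem without pulling in ActiveSupport, here is a minimal sketch of a Hash-only deep merge. `hash_only_deep_merge` is a hypothetical stand-in written for this example, not ActiveSupport's implementation, and it assumes that deep_merge only recurses when both values are Hashes. The sample data is made up.

```ruby
# Hypothetical stand-in for a Hash-only deep merge (not ActiveSupport's code):
# recurse when both sides are Hashes, otherwise the incoming value wins.
def hash_only_deep_merge(base, other)
  base.merge(other) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      hash_only_deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

standard = {
  'column'   => 'surname',
  'mappings' => [{ 'field' => 'surname', 'clean' => :name }]
}
override = { 'mappings' => [{ 'field' => 'previous_surname' }] }

merged = hash_only_deep_merge(standard, override)

# "mappings" is an Array, so it is replaced wholesale rather than combined.
p merged['mappings']
```

The standard field mapping is lost from the result, which is why an Array-aware merge would be needed here.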
The general approach taken for the "deep clone" previously was to do a roundtrip through some serialisation format, e.g.
clone = YAML.load(YAML.dump(original))
Or it might be possible to use Marshal in a similar way.
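A rough sketch of the Marshal variant, compared against a plain `dup` to show why a shallow copy is not enough. The sample data is made up for illustration. Marshal cannot dump everything (Procs and IO objects, for example), so this assumes the mappings only contain plain Hashes, Arrays, Strings and Symbols.

```ruby
original = [
  { 'column' => 'surname', 'mappings' => [{ 'field' => 'surname', 'clean' => :name }] }
]

shallow = original.dup                      # copies the outer Array only
deep    = Marshal.load(Marshal.dump(original)) # copies every nested object

# Editing the deep copy leaves the original alone.
deep[0]['mappings'] << { 'field' => 'extra' }
size_after_deep_edit = original[0]['mappings'].size

# Editing the shallow copy leaks through, because the inner Hash is shared.
shallow[0]['mappings'] << { 'field' => 'leaked' }
size_after_shallow_edit = original[0]['mappings'].size

puts "after deep edit: #{size_after_deep_edit}, after shallow edit: #{size_after_shallow_edit}"
```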
I'm not aware of an out-of-the-box way of merging together richer structures.
I guess Marshal is faster? Given the clearly defined structure of a column mapping, I think we'd have to roll our own merge function.
require 'benchmark'
require 'yaml'

n = 50_000

# A small, representative set of column mappings.
original = YAML.load <<-YML
- column: surname
  rawtext_name: surname
  mappings:
  - field: surname
    clean: :name
- column: forename
  rawtext_name: forenames
  mappings:
  - field: forenames
    clean: :name
- column: sex
  rawtext_name: sex
  mappings:
  - field: sex
    clean: :sex
- column: nhs_no
  rawtext_name: nhsnumber
  mappings:
  - field: nhsnumber
    clean: :nhsnumber
YML

# Time n deep clones via each serialisation roundtrip.
Benchmark.bm do |x|
  x.report("Marshal:") do
    n.times { Marshal.load(Marshal.dump(original)) }
  end
  x.report("YAML:") do
    n.times { YAML.load(YAML.dump(original)) }
  end
end
               user     system      total        real
Marshal:   2.450000   0.020000   2.470000 (  2.561565)
YAML:     37.170000   0.160000  37.330000 ( 37.833326)
@timgentry do you think there is more to work on here? I'm conscious that as of 582ce42b (and indeed, prior to it), additional mappings supplied alongside a standard_mapping are merged in, rather than replacing any mappings that the standard_mapping predefines. All other properties are replaced.
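For anyone skimming, the behaviour described above could be sketched roughly as follows. This is not the code from 582ce42b: `merge_column_mapping` and `deep_copy` are hypothetical names, and appending the extra mappings after the standard ones is an assumption about ordering.

```ruby
# Hypothetical helpers; names and append order are assumptions for illustration.
def deep_copy(object)
  Marshal.load(Marshal.dump(object))
end

def merge_column_mapping(standard, column)
  result = deep_copy(standard) # work on a copy so the standard mapping is never mutated

  column.each do |key, value|
    if key == 'mappings'
      # Field mappings are combined: standard ones first, then the extras.
      result['mappings'] = Array(result['mappings']) + deep_copy(value)
    else
      # Every other property simply replaces the standard value.
      result[key] = deep_copy(value)
    end
  end

  result
end

standard = {
  'column'       => 'surname',
  'rawtext_name' => 'surname',
  'mappings'     => [{ 'field' => 'surname', 'clean' => :name }]
}
column = {
  'rawtext_name' => 'patient_surname',
  'mappings'     => [{ 'field' => 'surname_upcase' }]
}

merged = merge_column_mapping(standard, column)
p merged
```

Here `rawtext_name` is overridden, both field mappings end up in `merged['mappings']`, and `standard` itself is left unchanged.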
I think it is now a non-issue; we can always reopen it if necessary.
Without reading the documentation on standard mappings, one would assume standard mappings were deep merged, so that a column's field mappings are merged with the standard mapping's field mappings rather than replacing them.
Should we therefore make this change? It would be non-breaking: mappings that currently have to redefine the standard mapping's field mapping(s) in order to keep them would continue to work.