AbsaOSS / enceladus

Dynamic Conformance Engine
Apache License 2.0

IDGenerationStrategy setting is ignored for Standardization #2163

Closed dk1844 closed 1 year ago

dk1844 commented 1 year ago

Describe the bug

The IDGenerationStrategy setting enceladus.recordId.generation.strategy, with values uuid, stableHashId, and none (uuid being the default), is available in Enceladus and is expected to apply to both Standardization and Conformance.

The default (uuid) works as expected. However, when a non-default value (e.g. stableHashId) is set, it appears to be respected only in the Conformance phase; Standardization still runs with the default uuid strategy, effectively ignoring the setting.

To Reproduce

Steps to reproduce the behavior OR commands run:

  1. set enceladus.recordId.generation.strategy=stableHashId in config or via -D
  2. run std & conf
  3. observe that the Std phase generates UUIDs instead of the expected stableHashId values (integers)
  4. (note that std/conf do not overwrite existing enceladus_record_id column)
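
To make step 3 easier to check, here is a minimal Scala sketch that classifies a value read from the enceladus_record_id column as UUID-shaped or integer-shaped. The object RecordIdShape and its members are hypothetical helper names, not part of Enceladus. The sketch assumes the column value has already been rendered as a non-null string.

```scala
import scala.util.Try

// Hypothetical helper for inspecting enceladus_record_id values; not Enceladus code.
object RecordIdShape {
  // Canonical 8-4-4-4-12 hexadecimal UUID layout.
  private val UuidPattern =
    "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$".r

  sealed trait Shape
  case object UuidLike extends Shape    // what the uuid strategy is expected to produce
  case object IntegerLike extends Shape // what stableHashId is expected to produce (per this issue)
  case object Unknown extends Shape

  // Classifies a non-null string representation of a record ID.
  def classify(value: String): Shape =
    if (UuidPattern.pattern.matcher(value).matches()) UuidLike
    else if (Try(value.trim.toLong).isSuccess) IntegerLike
    else Unknown
}
```

With stableHashId configured, a sample of Standardization output classified as UuidLike would indicate the bug described here.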

Expected behavior

A non-default enceladus.recordId.generation.strategy setting should be respected in both the std and conf phases.

Additional context

This regression may have been introduced when the spark-data-standardization library was extracted. Thus consider:

  1. removing any leftover code duplication (RecordIdGeneration)
  2. adding tests that prevent such a regression in the future
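
Both suggestions could be approached roughly as in the sketch below: a single resolver for the setting that both phases call, plus a check that a non-default value reaches each of them. Everything except the config key and its three values is an assumption made for illustration: the names IdStrategySketch, resolve, standardizationStrategy and conformanceStrategy are hypothetical, settings are modelled as a plain Map rather than the real configuration object, and the case-insensitive matching is a choice of this sketch, not confirmed Enceladus behavior.

```scala
import java.util.Locale

// Hypothetical sketch; the real Enceladus / spark-data-standardization code may differ.
object IdStrategySketch {
  sealed trait IdStrategy
  case object Uuid extends IdStrategy
  case object StableHashId extends IdStrategy
  case object NoId extends IdStrategy

  val ConfigKey = "enceladus.recordId.generation.strategy"

  // Single place where the setting is interpreted; uuid is the default when the key is absent.
  def resolve(settings: Map[String, String]): IdStrategy =
    settings.get(ConfigKey).map(_.trim.toLowerCase(Locale.ROOT)) match {
      case None | Some("uuid")  => Uuid
      case Some("stablehashid") => StableHashId
      case Some("none")         => NoId
      case Some(other) =>
        throw new IllegalArgumentException(s"Unknown value '$other' for $ConfigKey")
    }

  // Stand-ins for the two jobs: both delegate to the same resolver,
  // so neither can silently fall back to its own default.
  def standardizationStrategy(settings: Map[String, String]): IdStrategy = resolve(settings)
  def conformanceStrategy(settings: Map[String, String]): IdStrategy = resolve(settings)
}
```

A regression test along these lines would pass Map(ConfigKey -> "stableHashId") to both phase entry points and assert that each returns StableHashId, in addition to asserting that an empty map yields Uuid.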