AbsaOSS / enceladus

Dynamic Conformance Engine
Apache License 2.0

Std/Conf/Std+Conf jobs cannot be run twice as a lib #2165

Closed: dk1844 closed this issue 1 year ago

dk1844 commented 1 year ago

Describe the ~bug~ problem

The current implementation of the Enceladus Spark jobs initializes Atum internally but never explicitly disables Atum's control measurement tracking. Instead, it relies on tracking being disabled implicitly when the Spark session ends.

However, when one of these Spark jobs is run from other code (i.e., used as a library), Atum's disabling of control measurement (CM) tracking is never called. Running a job a second time therefore raises the exception "Control framework tracking is already initialized."

To Reproduce

Steps to reproduce the behavior OR commands run:

  1. Run StandardizationJob.main() from other code more than once.
  2. See the error "Control framework tracking is already initialized."

Expected behavior

Running the Spark jobs multiple times from other code should work.

Additional context

If this usage is to be supported, Atum's spark.disableControlMeasuresTracking() must be called explicitly at the end of every Enceladus Spark job.

Temporary Workaround

Until a fix is released, anyone using the Enceladus Spark jobs as a library can explicitly call

spark.disableControlMeasuresTracking()

between individual jobs.
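The workaround can be illustrated with a small, self-contained Scala sketch. It does not use Spark or Atum: MockControlTracking, MockJob, and RunJobsAsLibrary are hypothetical stand-ins that only mimic the behaviour described above (a second initialization without a disable in between fails). In real code, the disable() call would correspond to spark.disableControlMeasuresTracking(), and MockJob.main would be a job such as StandardizationJob.main().

```scala
// Hypothetical stand-in for Atum's control measurement tracking state.
// It models only the behaviour from this issue: enabling twice without
// disabling in between throws.
object MockControlTracking {
  private var initialized = false

  def enable(): Unit = {
    if (initialized)
      throw new IllegalStateException("Control framework tracking is already initialized.")
    initialized = true
  }

  // In real code this corresponds to spark.disableControlMeasuresTracking().
  def disable(): Unit = { initialized = false }

  def isInitialized: Boolean = initialized
}

// Hypothetical job standing in for an Enceladus job's main(): like the
// current implementation, it enables tracking but never disables it.
object MockJob {
  def main(args: Array[String]): Unit = {
    MockControlTracking.enable()
    // ... the actual Spark processing would happen here ...
  }
}

object RunJobsAsLibrary {
  // Workaround from the caller's side: always disable tracking after a job,
  // even if the job throws, so the next job can initialize it again.
  def runJob(args: Array[String]): Unit =
    try MockJob.main(args)
    finally MockControlTracking.disable()

  def main(args: Array[String]): Unit = {
    runJob(Array.empty[String])
    runJob(Array.empty[String]) // succeeds because tracking was disabled after the first run
    println("both runs completed")
  }
}
```

Wrapping the job call in try/finally is one possible design choice; it ensures tracking is disabled even when a job fails partway through, which a plain call placed after the job would not guarantee.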

benedeki commented 1 year ago

I am not sure it's a bug; it's rather an improvement (after all, it was never intended to run this way), but we can keep the designation... 😉