AbsaOSS / enceladus

Dynamic Conformance Engine
Apache License 2.0

Streaming Conformance too slow when just 20 rules are used #1345

Open yruslan opened 4 years ago

yruslan commented 4 years ago

Describe the bug

Streaming Conformance takes way too much time to warm up (20 minutes) and to process a single micro-batch (8 minutes).

This is way too slow.

This happens due to the Catalyst issue reported to Spark: https://issues.apache.org/jira/browse/SPARK-28090

We have a workaround for batch: #190, #413

The job hangs completely if no workarounds are used: #1306

To Reproduce

Steps to reproduce the behavior OR commands run:

  1. Create a schema with nested arrays of structs.
  2. Create a dataset with 20 conformance rules or so. Some of the rules should operate inside an array.
  3. Run streaming conformance.
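
For illustration, the kind of input described in steps 1 and 2 could look roughly like the sketch below. It builds a schema with two levels of nested arrays of structs in Spark's JSON schema representation, plus 20 placeholder rules that target a field inside the inner array. All column names and the rule dictionaries are hypothetical; in particular, the rule format here is not Enceladus's actual rule definition format, it only shows the shape of the problem.

```python
import json


def field(name, data_type, nullable=True):
    """One field entry in Spark's JSON schema representation."""
    return {"name": name, "type": data_type, "nullable": nullable, "metadata": {}}


def struct(*fields):
    """A struct type made of the given fields."""
    return {"type": "struct", "fields": list(fields)}


def array_of(element_type, contains_null=True):
    """An array type with the given element type."""
    return {"type": "array", "elementType": element_type, "containsNull": contains_null}


# Hypothetical column names; only the nesting shape matters.
schema = struct(
    field("id", "long"),
    field("orders", array_of(struct(
        field("order_id", "string"),
        field("items", array_of(struct(
            field("sku", "string"),
            field("qty", "integer"),
        ))),
    ))),
)

# Hypothetical rule descriptions (NOT the real Enceladus rule format):
# 20 rules that each read a field nested inside two arrays.
rules = [
    {
        "order": i,
        "kind": "uppercase",
        "input": "orders.items.sku",
        "output": f"orders.items.sku_upper_{i}",
    }
    for i in range(1, 21)
]


def array_depth(node, path):
    """Count how many array levels a dotted column path crosses."""
    depth = 0
    for name in path.split("."):
        # Unwrap any arrays before looking the field up in the struct.
        while node["type"] == "array":
            node = node["elementType"]
            depth += 1
        node = next(f["type"] for f in node["fields"] if f["name"] == name)
    return depth


if __name__ == "__main__":
    print(json.dumps(schema, indent=2))
    for rule in rules:
        print(rule["order"], rule["input"], array_depth(schema, rule["input"]))
```

The printed JSON should be loadable as a Spark schema (for example via `StructType.fromJson` in PySpark), which gives a starting point for generating test data with the same nesting.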

Expected behaviour

Since all transformations are just projections, the performance of a conformance job should be good.

Additional context

Ideas of a solution:

yruslan commented 4 years ago

Seems this is not so critical since only startup time is affected.