AbsaOSS / hyperdrive

Extensible streaming ingestion pipeline on top of Apache Spark
Apache License 2.0

Create deterministic logical plan #265

Closed: kevinwallimann closed this issue 2 years ago

kevinwallimann commented 2 years ago

Currently, ConfluentAvroDecodingTransformer uses randomly generated UUIDs as column names to prevent column name collisions.

This introduces randomness into the generated logical plan: plans can differ across runs even when they are logically identical. That can be a problem for tools that rely on the plan, e.g. Spline.

A static UUID should be enough to avoid column name collisions while keeping the logical plan deterministic.
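To illustrate the difference, here is a minimal, language-neutral sketch in plain Python. It is not the Hyperdrive implementation: the constant `STATIC_SUFFIX`, the helper names, and the string used as a stand-in for a logical plan are all hypothetical and only serve to show why a fixed UUID makes repeated runs comparable.

```python
import uuid

# Hypothetical fixed suffix (assumption): any constant UUID would do.
# This value is made up for the example and is not taken from Hyperdrive.
STATIC_SUFFIX = "1d8e2f6a-5b3c-4e7f-9a0b-c2d4e6f8a1b3"


def random_column_name(prefix: str) -> str:
    """Current behaviour as described above: a fresh UUID on every call."""
    return f"{prefix}-{uuid.uuid4()}"


def static_column_name(prefix: str) -> str:
    """Proposed behaviour: the same UUID suffix on every call."""
    return f"{prefix}-{STATIC_SUFFIX}"


def describe_plan(column_name: str) -> str:
    """Stand-in for a logical plan: a string that embeds the temporary column name."""
    return f"Project [decode(value) AS {column_name}]"


# Simulate two runs of the same pipeline and collect the distinct plans.
random_plans = {describe_plan(random_column_name("data")) for _ in range(2)}
static_plans = {describe_plan(static_column_name("data")) for _ in range(2)}

print("distinct plans with random UUIDs:", len(random_plans))
print("distinct plans with a static UUID:", len(static_plans))
```

With random names, each run yields a textually different plan, so anything that compares or fingerprints plans sees a change. With the static suffix the plans are identical, and the unusual suffix still makes a clash with a real column name very unlikely.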