Nike-Inc / koheesio

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
https://engineering.nike.com/koheesio/
Apache License 2.0
599 stars, 19 forks

[FEATURE] Quarantine transformation #35

Open mikita-sakalouski opened 4 months ago

mikita-sakalouski commented 4 months ago

Is your feature request related to a problem? Please describe.

A quarantine transformation can be used to split incoming data based on custom logic: the "bad" part is written to a separate table, while the "good" part is propagated to the next koheesio Step.

Describe the solution you'd like

This should be a fairly simple but customizable PySpark transformation that contains the splitting logic.

Describe alternatives you've considered

We could use the Delta Live Tables feature for the same purpose; it provides more functionality (metrics, stats, etc.).