IBM / data-prep-kit

Open source project for data preparation of LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0
171 stars, 111 forks

[Feature] Enable pure python transforms in new spark runtime. #586

Open daw3rd opened 2 weeks ago

daw3rd commented 2 weeks ago

Search before asking

Component

Transforms/Other

Feature

A recent PR to the Spark runtime added checkpointing and the ability to run pure-Python transforms in Spark. We should now begin enabling the transforms listed below to run in the Spark runtime. Largely, this means creating the boilerplate main() and the associated Spark configuration classes for each transform. See the new filter transform for an example.
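For anyone picking up one of these transforms, here is a rough sketch of the shape of that boilerplate: a small Spark-side configuration class that wraps the existing pure-Python transform configuration, plus a main() that hands it to a launcher. Every class name here (`UppercaseTransform`, `PythonTransformConfig`, `SparkRuntimeConfig`, `LocalLauncher`) is a hypothetical stand-in and not the kit's actual API, and the launcher runs partitions in-process instead of on Spark so that the example is self-contained. The filter transform remains the real reference.

```python
from dataclasses import dataclass, field
from typing import Any


class UppercaseTransform:
    """Hypothetical pure-Python transform: upper-cases one column of each row."""

    def __init__(self, params: dict[str, Any]):
        self.column = params.get("column", "text")

    def transform(self, rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
        return [{**row, self.column: str(row[self.column]).upper()} for row in rows]


@dataclass
class PythonTransformConfig:
    """Hypothetical runtime-agnostic description of a transform."""

    name: str
    transform_class: type
    params: dict[str, Any] = field(default_factory=dict)


class SparkRuntimeConfig:
    """Hypothetical Spark-side wrapper that reuses the pure-Python configuration."""

    def __init__(self, transform_config: PythonTransformConfig):
        self.transform_config = transform_config


class UppercaseSparkConfig(SparkRuntimeConfig):
    """The per-transform boilerplate: bind the existing config to the Spark runtime."""

    def __init__(self):
        super().__init__(
            PythonTransformConfig("uppercase", UppercaseTransform, {"column": "text"})
        )


class LocalLauncher:
    """Stand-in for a Spark launcher (assumption): applies the transform to each
    partition sequentially in this process instead of on Spark executors."""

    def __init__(self, runtime_config: SparkRuntimeConfig):
        self.runtime_config = runtime_config

    def launch(self, partitions: list[list[dict[str, Any]]]) -> list[list[dict[str, Any]]]:
        cfg = self.runtime_config.transform_config
        transform = cfg.transform_class(cfg.params)
        return [transform.transform(partition) for partition in partitions]


def main() -> list[list[dict[str, Any]]]:
    # The boilerplate main(): build the Spark config, hand it to the launcher, run.
    launcher = LocalLauncher(UppercaseSparkConfig())
    return launcher.launch([[{"text": "a"}], [{"text": "b"}, {"text": "c"}]])


if __name__ == "__main__":
    print(main())
```

The point of this layout is that the transform logic itself stays untouched. Only the thin configuration subclass and main() are new for each transform, which is why the work should be mostly mechanical.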

The transforms suggested for this work are as follows:

Universal

Code

Language

Are you willing to submit a PR?