IBM / data-prep-kit

Open source project for data preparation of LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0

[Feature] Enable transform() to terminate all processing of documents across all instances #437

Closed daw3rd closed 1 month ago

daw3rd commented 3 months ago

Component

Library/core

Feature

Today we allow/expect exceptions to be thrown in a transform's init() and transform(). In the former case, the exception terminates all processing. In the latter case, it only causes the file currently being processed to be skipped. Per issue #430, resize needs to be able to stop processing of all files if a schema mismatch is encountered. This could be done by throwing a special exception from transform() that causes the runtime to shut down all further processing. This likely involves signalling the (Ray) orchestrator to shut down. Not sure how, or whether, this can be handled in the Spark and Python runtimes.
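
A minimal single-process sketch of the proposed behavior, not the project's actual implementation. The names `UnrecoverableTransformError`, `run_files`, and `demo_transform` are hypothetical and used only for illustration. Ordinary exceptions skip the current file, while the special exception stops the whole run. Signalling other workers or an orchestrator is outside the scope of this sketch.

```python
class UnrecoverableTransformError(Exception):
    """Hypothetical signal: stop processing all remaining files."""


def run_files(files, transform):
    """Apply transform to each file; return (processed, skipped, aborted)."""
    processed, skipped = [], []
    for name in files:
        try:
            processed.append(transform(name))
        except UnrecoverableTransformError:
            # Fatal: give up on this file and on everything after it.
            return processed, skipped, True
        except Exception:
            # Existing behavior described above: skip only this file.
            skipped.append(name)
    return processed, skipped, False


def demo_transform(name):
    """Hypothetical transform that fails in two different ways."""
    if name == "bad.parquet":
        raise ValueError("unreadable file")
    if name == "mismatch.parquet":
        raise UnrecoverableTransformError("schema mismatch")
    return name.upper()


result = run_files(
    ["a.parquet", "bad.parquet", "b.parquet", "mismatch.parquet", "c.parquet"],
    demo_transform,
)
print(result)
```

Here `bad.parquet` is skipped and processing continues, but `mismatch.parquet` aborts the run, so `c.parquet` is never touched. In a distributed runtime the abort flag would additionally have to reach every worker, which is the open question raised above.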

blublinsky commented 3 months ago

Done

daw3rd commented 1 month ago

This is fixed in releases after 0.2.0 by the new UnrecoverableException.