IBM / data-prep-kit

Open source project for data preparation of LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0

[Feature] Dynamic reading and writing to avoid failures due to network/IO system overload #307

Open dhirajjoshi16 opened 4 months ago

dhirajjoshi16 commented 4 months ago

Search before asking

Component

Library/core

Feature

Long-running jobs are often killed by read/write failures caused by I/O overload. Reads and writes are also constrained by network limits such as available bandwidth.

To reduce the number of long-running jobs killed this way, I am requesting a feature for dynamic reading and writing, including:

Are you willing to submit a PR?

blublinsky commented 4 months ago

"Random backoff mechanism to relieve I/O pressure (I/O spread factor)": We already have two-level retries.

"Readjusting read/write rates based on COS/file system response": This is far more complex. I am not sure how realistic it is.
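For illustration only, here is a minimal Python sketch of the two ideas discussed in this thread: retrying with a randomized backoff delay, and adjusting the I/O rate based on how storage responds. This is not data-prep-kit code. The names `retry_with_jitter` and `AimdThrottle`, their parameters, and the choice of `OSError` as the failure signal are all assumptions made for the example. The rate adjustment uses additive increase / multiplicative decrease (AIMD), which is one possible policy and not something the thread specifies.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_jitter(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(); on OSError wait a random delay and try again.

    Hypothetical helper, not part of data-prep-kit.
    """
    for attempt in range(max_attempts):
        try:
            return op()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The cap doubles with each attempt; the actual delay is drawn
            # uniformly below it so that many workers do not retry at once.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0.0, cap))
    raise ValueError("max_attempts must be at least 1")


class AimdThrottle:
    """Hypothetical AIMD limit on the number of concurrent I/O requests."""

    def __init__(self, start: int = 8, minimum: int = 1, maximum: int = 64) -> None:
        self.minimum = minimum
        self.maximum = maximum
        self.limit = max(minimum, min(start, maximum))

    def on_success(self) -> None:
        # Storage responded normally: probe for slightly more throughput.
        self.limit = min(self.maximum, self.limit + 1)

    def on_overload(self) -> None:
        # Storage signalled overload (for example a timeout): halve the limit.
        self.limit = max(self.minimum, self.limit // 2)
```

A caller would wrap each read or write in `retry_with_jitter` and consult `AimdThrottle.limit` before scheduling more concurrent requests. How "overload" is detected from COS or file system responses is the hard part noted above and is left open here.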