IBM / data-prep-kit

Open source project for data preparation of LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0

Run KFPv1 pipeline on a large real cluster with large datasets #203

Closed by shahrokhDaijavad 4 months ago

shahrokhDaijavad commented 5 months ago

Component

Library/kfp

Feature

Test KFPv1 on a real cluster, such as fmc-preprocessing, that can handle a large number of datasets.

shahrokhDaijavad commented 5 months ago

This is about testing the scalability of KFPv1 on a real, large cluster like fmc-preprocessing.

shahrokhDaijavad commented 5 months ago

I learned from Yuan-Chi and Hamid about the ongoing runs our team is conducting on the fmc-preprocessing cluster. I then submitted an ededup run on that cluster with a dataset of about 600 files, and it completed successfully in about one hour.