IBM / data-prep-kit

Open source project for data preparation of LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0
45 stars 22 forks

Support transform specific TransformStatistics aggregator #234

Open sapthasurendran opened 4 weeks ago

sapthasurendran commented 4 weeks ago

Component

Other

Feature

We need support for returning dictionary metadata from the transform function. We also need a mechanism to override TransformStatistics so that it can iterate over the dictionary metadata and write the aggregated results back to S3.
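
To make the request concrete, here is a minimal sketch of the kind of aggregation being asked for, assuming each transform call returns a (possibly nested) dictionary of numeric counts. The function name `merge_metadata` and the sample keys are hypothetical and are not part of data-prep-kit; the S3 upload itself is left out, and only the JSON payload that would be written is produced.

```python
import copy
import json
from typing import Any


def merge_metadata(total: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Merge `update` into `total` in place and return `total`.

    Numbers are summed and nested dictionaries are merged recursively.
    (Hypothetical helper, not a data-prep-kit API.)
    """
    for key, value in update.items():
        if key not in total:
            # Deep-copy so later merges never mutate the caller's metadata.
            total[key] = copy.deepcopy(value)
        elif isinstance(value, dict) and isinstance(total[key], dict):
            merge_metadata(total[key], value)
        elif isinstance(value, (int, float)) and isinstance(total[key], (int, float)):
            total[key] += value
        else:
            raise TypeError(f"cannot merge values for key {key!r}")
    return total


# Example: metadata returned for two files (made-up values).
per_file_metadata = [
    {"docs": 10, "languages": {"en": 7, "de": 3}},
    {"docs": 5, "languages": {"en": 4, "fr": 1}},
]

aggregated: dict[str, Any] = {}
for metadata in per_file_metadata:
    merge_metadata(aggregated, metadata)

# JSON payload that could then be written to object storage such as S3.
report = json.dumps(aggregated, sort_keys=True)
```

Raising on mismatched types is a design choice in this sketch; a real aggregator might instead keep both values or log a warning.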

Bytes-Explorer commented 3 weeks ago

@daw3rd We want to prioritise this. Let's chat when you are back from vacation.

blublinsky commented 1 week ago

What exactly is the end goal here? What are we trying to do?

shivdeep-singh-ibm commented 1 week ago

Currently, TransformStatistics values are plain numbers, so custom aggregation is not supported.

However, it is possible to extend TransformRuntime to run additional actors that collect this info.
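
A rough sketch of that idea, in plain Python so it runs without a cluster: the runtime creates an extra collector, hands it to the transforms through their configuration, and folds the collected result into the final statistics. Every class and method name here (`MetadataCollector`, `CustomRuntime`, `get_transform_config`, `compute_execution_stats`) is a hypothetical stand-in and should not be read as the actual TransformRuntime interface; in a distributed run the collector would be a remote actor rather than a local object.

```python
from collections import Counter
from typing import Any


class MetadataCollector:
    """Hypothetical stand-in for an additional actor that gathers metadata."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def add(self, metadata: dict[str, int]) -> None:
        # Counter.update adds counts instead of replacing them.
        self._counts.update(metadata)

    def result(self) -> dict[str, int]:
        return dict(self._counts)


class CustomRuntime:
    """Hypothetical runtime extension that owns the extra collector."""

    def __init__(self) -> None:
        self.collector = MetadataCollector()

    def get_transform_config(self, base: dict[str, Any]) -> dict[str, Any]:
        # Pass the collector to each transform alongside its normal config.
        return {**base, "collector": self.collector}

    def compute_execution_stats(self, stats: dict[str, Any]) -> dict[str, Any]:
        # Fold the custom aggregation into the numeric statistics.
        return {**stats, "custom": self.collector.result()}


runtime = CustomRuntime()
config = runtime.get_transform_config({"threshold": 0.5})

# Each transform call reports its own dictionary metadata (made-up values).
for file_metadata in ({"en": 2, "de": 1}, {"en": 3}):
    config["collector"].add(file_metadata)

final_stats = runtime.compute_execution_stats({"source_files": 2})
```

Keeping the custom aggregation in a separate collector leaves the numeric statistics untouched, which matches the approach of extending the runtime rather than changing TransformStatistics.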

shivdeep-singh-ibm commented 1 week ago

Closing this issue, since we will extend TransformRuntime for the transform instead.