Search before asking
[X] I searched the issues and found no similar issues.
Component
Transforms/Other
Feature
The goal is to add a new transform that takes extracted text and splits it into chunks. The input will be Parquet files in which each document is stored in one row. The output will contain the chunks, with each chunk stored in its own row. Chunk size should be a parameter exposed to the user.
This new transform should be added alongside the other language modules here: https://github.com/IBM/data-prep-kit/tree/dev/transforms/language
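To make the requested behaviour concrete, here is a minimal sketch of the row-level logic, not a proposed implementation. Everything named in it is an assumption for illustration: the column names `document_id`, `contents`, and `chunk_id`, the helper names `chunk_text` and `chunk_rows`, and the choice to measure chunk size in characters (tokens or sentences would be equally valid). Reading and writing the Parquet files, and integration with the data-prep-kit transform interfaces, are left out.

```python
from typing import Dict, Iterable, Iterator


def chunk_text(text: str, chunk_size: int) -> Iterator[str]:
    """Yield consecutive slices of `text`, each at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def chunk_rows(
    rows: Iterable[Dict[str, str]], chunk_size: int
) -> Iterator[Dict[str, object]]:
    """Turn one-document-per-row input into one-chunk-per-row output.

    Column names are hypothetical; the real schema may differ.
    """
    for row in rows:
        for chunk_id, chunk in enumerate(chunk_text(row["contents"], chunk_size)):
            yield {
                "document_id": row["document_id"],  # links the chunk to its source row
                "chunk_id": chunk_id,               # position of the chunk in the document
                "contents": chunk,
            }


# Two input rows, one document each.
documents = [
    {"document_id": "doc-1", "contents": "abcdefghij"},
    {"document_id": "doc-2", "contents": "xyz"},
]

# chunk_size is the user-facing parameter described above.
chunks = list(chunk_rows(documents, chunk_size=4))
```

With `chunk_size=4`, the ten-character document becomes three rows and the three-character document becomes one. Keeping `document_id` and `chunk_id` on every output row is a design choice in this sketch so that chunks can be traced back to, and reassembled into, their source document.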
Are you willing to submit a PR?