IBM / data-prep-kit

Open source project for data preparation of LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0

Build a new transform to automate crawling and then convert to parquet #751

Open shahrokhDaijavad opened 1 week ago

shahrokhDaijavad commented 1 week ago

Component

Transforms/Other

Feature

This feature builds on the DPK-connector lib in the repo (https://github.com/IBM/data-prep-kit/tree/dev/data-connector-lib), which is now available as a stand-alone pip install. The goal is to wrap it as a "data ingestion" transform whose parquet output can easily be fed into other DPK transforms.
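
A minimal sketch of what such an ingestion step could look like, assuming the crawler hands each downloaded page to a callback as a URL, raw bytes, and a content type. The `PageCollector` class, its column names, and `write_parquet` are hypothetical and not part of the DPK-connector API. The parquet write assumes `pyarrow` is installed and imports it lazily, so the buffering logic can be tried without it.

```python
from typing import Dict


class PageCollector:
    """Buffers crawled pages as columns so they can be written to parquet."""

    def __init__(self) -> None:
        # Hypothetical column names, chosen for illustration only.
        self.columns: Dict[str, list] = {
            "url": [],
            "contents": [],
            "content_type": [],
            "size": [],
        }

    def on_downloaded(self, url: str, body: bytes, content_type: str) -> None:
        """Callback a crawler would invoke once per page (assumed shape)."""
        self.columns["url"].append(url)
        self.columns["contents"].append(body)
        self.columns["content_type"].append(content_type)
        self.columns["size"].append(len(body))

    def __len__(self) -> int:
        return len(self.columns["url"])


def write_parquet(collector: PageCollector, path: str) -> None:
    """Write all buffered pages to a single parquet file (needs pyarrow)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table(collector.columns)
    pq.write_table(table, path)


# Simulate one downloaded page instead of running a real crawl.
collector = PageCollector()
collector.on_downloaded("https://example.com/", b"<html>hi</html>", "text/html")
```

Calling `write_parquet(collector, "crawl.parquet")` afterwards would produce a file with one row per page, which is the shape downstream DPK transforms would read.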

touma-I commented 1 day ago

@touma-I will get the code from the developers, adapt it to a DPK transform, and submit a PR.

Bytes-Explorer commented 23 hours ago

Request for consideration as we build this transform:

  1. Simple API call with minimal lines
  2. Document the API parameters and how to use them in the README
  3. Request from Sujee on storing data
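
One way the first two points might look in practice is a single parameter object whose fields are all documented in one place. The `CrawlParams` dataclass and every field below are hypothetical placeholders for discussion, not an existing DPK API. The sketch only validates inputs and does not perform a crawl.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CrawlParams:
    """Hypothetical parameter set for the proposed transform."""

    seed_urls: List[str]       # starting points for the crawl
    output_folder: str         # where parquet files would be written
    depth_limit: int = 1       # how many links away from a seed to follow
    max_downloads: int = 100   # stop after this many pages

    def __post_init__(self) -> None:
        # Fail early with a clear message instead of partway through a crawl.
        if not self.seed_urls:
            raise ValueError("seed_urls must contain at least one URL")
        if self.depth_limit < 0:
            raise ValueError("depth_limit must be non-negative")
        if self.max_downloads <= 0:
            raise ValueError("max_downloads must be positive")


# Only the two required fields are given; the rest use defaults.
params = CrawlParams(seed_urls=["https://example.com/"], output_folder="output")
```

Keeping the defaults in the dataclass means a README table of parameters can be checked against a single definition.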
