IBM / data-prep-kit

Open source project for data preparation for LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0

Crawler transform #797

Closed (touma-I closed 6 days ago)

touma-I commented 1 week ago

Why are these changes needed?

Implement a crawler transform using the dpk-connector API. This is based on the work done by the data sift, but it also adds a CLI so the transform can be integrated with the Python runtime. The implementation uses the new transform layout, with the module name dpk_web2parquet.
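
As a rough illustration of what the CLI layer mentioned above might look like, here is a minimal sketch that turns command-line arguments into a crawl configuration. It is not the code from this PR: the flag names (`--urls`, `--depth`, `--output_folder`) and the helper functions `build_parser` and `parse_config` are hypothetical, and the sketch deliberately makes no calls into the connector library or the runtime.

```python
import argparse


def build_parser():
    """Build an argument parser. All flag names here are hypothetical."""
    parser = argparse.ArgumentParser(description="Sketch of a crawler transform CLI")
    parser.add_argument("--urls", required=True,
                        help="comma-separated seed URLs (hypothetical flag)")
    parser.add_argument("--depth", type=int, default=1,
                        help="maximum crawl depth (hypothetical flag)")
    parser.add_argument("--output_folder", default="output",
                        help="folder for crawled output (hypothetical flag)")
    return parser


def parse_config(argv=None):
    """Parse CLI arguments into a plain dict that a transform could consume."""
    args = build_parser().parse_args(argv)
    # Split the comma-separated list and drop empty entries and whitespace.
    seeds = [u.strip() for u in args.urls.split(",") if u.strip()]
    return {
        "urls": seeds,
        "depth": args.depth,
        "output_folder": args.output_folder,
    }
```

Keeping the argument parsing separate from the crawl itself means the same configuration dict could be built either from the command line or directly from Python code.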

Related issue number (if any).

https://github.com/IBM/data-prep-kit/issues/751