IBM / data-prep-kit

Open source project for data preparation of LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0

first cut at refactoring dpk_pdf2parquet #813

Open touma-I opened 3 days ago

touma-I commented 3 days ago

Why are these changes needed?

This is the first in a series of restructuring changes so that each transform is built as its own module (e.g. dpk_pdf2parquet) with a ray submodule (dpk_pdf2parquet.ray).

- Remove the python and ray folders, but keep Dockerfile.python and Dockerfile.ray
- Remove pyproject.toml and the Makefiles
- Move the python code under dpk_pdf2parquet
- Move the ray code under dpk_pdf2parquet/ray
- Change import statements to include the module name
- Replace the recursive Makefile and use targets from .make.cicd.targets
- Adapt kfp_ray/Makefile and other make targets
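To illustrate what "change import statement to include module name" could look like, here is a minimal sketch. It builds a throwaway copy of the proposed layout in a temporary directory and imports from it using package-qualified names. Only the package names dpk_pdf2parquet and dpk_pdf2parquet.ray come from this PR; the file name transform.py and the RUNTIME constant are hypothetical placeholders, not the actual contents of the transform.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical files: only the package names dpk_pdf2parquet and
# dpk_pdf2parquet.ray are taken from the PR description.
LAYOUT = {
    "dpk_pdf2parquet/__init__.py": "",
    "dpk_pdf2parquet/transform.py": "RUNTIME = 'python'\n",
    "dpk_pdf2parquet/ray/__init__.py": "",
    # The ray submodule imports the python code through the package name,
    # instead of relying on a flat, top-level module name.
    "dpk_pdf2parquet/ray/transform.py": (
        "from dpk_pdf2parquet.transform import RUNTIME as BASE_RUNTIME\n"
        "RUNTIME = 'ray'\n"
    ),
}


def build_layout(root: Path) -> None:
    """Write the sketch package tree under root."""
    for rel_path, source in LAYOUT.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)


def load_runtimes() -> tuple[str, str, str]:
    """Import both modules by their package-qualified names."""
    with tempfile.TemporaryDirectory() as tmp:
        build_layout(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            py_mod = importlib.import_module("dpk_pdf2parquet.transform")
            ray_mod = importlib.import_module("dpk_pdf2parquet.ray.transform")
            return py_mod.RUNTIME, ray_mod.RUNTIME, ray_mod.BASE_RUNTIME
        finally:
            # Undo the path change and drop the throwaway modules.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules
                         if n == "dpk_pdf2parquet"
                         or n.startswith("dpk_pdf2parquet.")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(load_runtimes())
```

With this layout, code that previously imported a flat module name would instead import through the dpk_pdf2parquet package, and the ray variant would be reached as dpk_pdf2parquet.ray.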

Related issue number (if any).

https://github.com/IBM/data-prep-kit/issues/774