IBM / data-prep-kit

Open source project for data preparation for LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0

code2parquet fixes on domain/snapshot and document_id #347

Closed. daw3rd closed this 1 week ago.

daw3rd commented 1 week ago

Restore creation of the domain and snapshot columns. Set the document_id column back to a UUID (it had been changed to a hash). Update the README section on configuration.
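To illustrate the difference between the two document_id schemes, here is a minimal sketch. It is not the code2parquet implementation: the helper names (hash_document_id, uuid_document_id, add_metadata), the use of SHA-256, and the sample domain and snapshot values are all assumptions made for illustration, and rows are modeled as plain dicts rather than a Parquet table.

```python
import hashlib
import uuid


def hash_document_id(contents: str) -> str:
    """Content-derived ID (hypothetical): identical contents give identical IDs."""
    return hashlib.sha256(contents.encode("utf-8")).hexdigest()


def uuid_document_id() -> str:
    """Random ID (hypothetical): unique per row, independent of the contents."""
    return str(uuid.uuid4())


def add_metadata(rows, domain, snapshot):
    """Return copies of the rows with document_id, domain, and snapshot added."""
    return [
        {
            **row,
            "document_id": uuid_document_id(),
            "domain": domain,
            "snapshot": snapshot,
        }
        for row in rows
    ]


# Two rows with identical contents; the domain and snapshot values are made up.
rows = [{"contents": "print('hello')"}, {"contents": "print('hello')"}]
enriched = add_metadata(rows, domain="code", snapshot="github")
```

With a hash, the two rows above would share one document_id because their contents match; with a UUID, each row gets its own ID.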

Why are these changes needed?

Restore behavior equivalent to tools/ingest2parquet.

Related issue number (if any).